Inferring Multiple Graph Structures

               Julien Chiquet (1), Yves Grandvalet (2), Christophe Ambroise (1)

               (1) Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
               (2) Heudiasyc, CNRS & Université de Technologie de Compiègne


                                         NeMo, 21 June 2010


         Chiquet, Grandvalet, Ambroise, arXiv preprint.
         Inferring multiple Gaussian graphical structures.

         Chiquet, Grasseau, Charbonnier and Ambroise, R-package SIMoNe.
         http://stat.genopole.cnrs.fr/~jchiquet/fr/softwares/simone



Problem




   few arrays ⇔ few examples
   lots of genes ⇔ high dimension
   interactions ⇔ very high dimension

   Inference: which interactions?

     The main difficulty is the small sample size in a high-dimensional setting.
     Our main hope is to benefit from sparsity: few genes interact.


Handling the scarcity of data
  Merge several experimental conditions
  experiment 1           experiment 2     experiment 3




Handling the scarcity of data
  Inferring each graph independently does not help
   experiment 1: $(X_1^{(1)}, \dots, X_{n_1}^{(1)})$ → inference
   experiment 2: $(X_1^{(2)}, \dots, X_{n_2}^{(2)})$ → inference
   experiment 3: $(X_1^{(3)}, \dots, X_{n_3}^{(3)})$ → inference




Handling the scarcity of data
  By pooling all the available data
   experiment 1, experiment 2, experiment 3
   pooled sample: $(X_1, \dots, X_n)$, $n = n_1 + n_2 + n_3$ → inference




Handling the scarcity of data
  By breaking the separability
   experiment 1: $(X_1^{(1)}, \dots, X_{n_1}^{(1)})$ → inference
   experiment 2: $(X_1^{(2)}, \dots, X_{n_2}^{(2)})$ → inference
   experiment 3: $(X_1^{(3)}, \dots, X_{n_3}^{(3)})$ → inference




Outline



  Statistical model


  Multi-task learning


  Algorithms and methods


  Model selection


  Experiments




Gaussian graphical modeling

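  Let
          $X = (X_1, \dots, X_p) \sim \mathcal{N}(0_p, \Sigma)$ and assume $n$ i.i.d. copies of $X$,
          $\mathbf{X}$ be the $n \times p$ matrix whose $k$th row is $X_k$,
          $\Theta = (\theta_{ij})_{i,j \in P} \triangleq \Sigma^{-1}$ be the concentration matrix.

  Graphical interpretation
  Since $\mathrm{cor}_{ij \mid P \setminus \{i,j\}} = -\theta_{ij} / \sqrt{\theta_{ii}\theta_{jj}}$ for $i \neq j$,

  $$X_i \perp\!\!\!\perp X_j \mid X_{P \setminus \{i,j\}} \iff \theta_{ij} = 0 \ \text{ or, equivalently, edge } (i,j) \notin \text{network}.$$

  The nonzero entries of $\Theta$ describe the graph structure.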
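
  As a small illustration of the graphical interpretation, the sketch below reads partial correlations and edges off a concentration matrix. The 4-variable chain matrix and the helper names `partial_correlations` and `edges` are assumptions made for this example, not part of the talk or of SIMoNe.

```python
import numpy as np

def partial_correlations(theta):
    """Return -theta_ij / sqrt(theta_ii * theta_jj), with a unit diagonal."""
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def edges(theta, tol=1e-10):
    """List the pairs (i, j), i < j, whose entry theta_ij is nonzero."""
    p = theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(theta[i, j]) > tol]

# Hypothetical concentration matrix of a chain X1 - X2 - X3 - X4
# (indices are 0-based in the code).
theta = np.array([[ 2.0, -1.0,  0.0,  0.0],
                  [-1.0,  2.0, -1.0,  0.0],
                  [ 0.0, -1.0,  2.0, -1.0],
                  [ 0.0,  0.0, -1.0,  2.0]])
sigma = np.linalg.inv(theta)

print(edges(theta))                        # [(0, 1), (1, 2), (2, 3)]
print(partial_correlations(theta)[0, 1])   # 0.5
print(sigma[0, 3])                         # nonzero although theta[0, 3] == 0
```

  The last line shows why the graph is read from $\Theta$ rather than from $\Sigma$: the two ends of the chain are marginally correlated but conditionally independent.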

The model likelihood

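  Let $S = n^{-1} \mathbf{X}^\top \mathbf{X}$ be the empirical variance-covariance matrix: $S$
  is a sufficient statistic for $\Theta$ $\Rightarrow$ $L(\Theta; \mathbf{X}) = L(\Theta; S)$


  The log-likelihood
  $$L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \operatorname{trace}(S\Theta) - \frac{n}{2} \log(2\pi).$$


  The MLE of $\Theta$ is $S^{-1}$
          not defined for $n < p$
          not sparse $\Rightarrow$ fully connected graph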
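
  The two drawbacks of the MLE can be checked numerically. The sketch below is only an illustration on simulated data; the dimensions, the seed and the helper `empirical_covariance` are assumptions of this example. The true covariance is the identity, so the true graph is empty.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10

def empirical_covariance(X):
    """S = X^T X / n for centered data whose rows are observations."""
    return X.T @ X / X.shape[0]

# Case n < p: S has rank at most n, so it cannot be inverted.
X_small = rng.standard_normal((5, p))
S_small = empirical_covariance(X_small)
rank_small = np.linalg.matrix_rank(S_small)
print(rank_small, "<", p)

# Case n > p: S is invertible, but its inverse has no exact zeros,
# so the estimated graph is fully connected.
X_large = rng.standard_normal((200, p))
theta_mle = np.linalg.inv(empirical_covariance(X_large))
print(np.count_nonzero(theta_mle), "nonzero entries out of", p * p)
```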



Penalized Approaches

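  Penalized Likelihood (Banerjee et al., 2008)
  $$\max_{\Theta \in \mathcal{S}_+} L(\Theta; S) - \lambda \|\Theta\|_1$$

          well defined for $n < p$
          sparse ⇒ sensible graph
          SDP of size $O(p^2)$ (solved by Friedman et al., 2007)

  Neighborhood Selection (Meinshausen & Bühlmann, 2006)
  $$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p-1}} \frac{1}{n} \left\| \mathbf{X}_j - \mathbf{X}_{\setminus j}\, \beta \right\|_2^2 + \lambda \|\beta\|_1$$
  where $\mathbf{X}_j$ is the $j$th column of $\mathbf{X}$ and $\mathbf{X}_{\setminus j}$ is $\mathbf{X}$ deprived of $\mathbf{X}_j$

          not symmetric, not positive-definite
          $p$ independent LASSO problems of size $(p - 1)$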
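
  A minimal sketch of neighborhood selection follows, assuming a plain cyclic coordinate-descent LASSO solver written for this example. The function names, the simulated chain graph and the value of `lam` are hypothetical, and SIMoNe may proceed differently. Because the p regressions are solved independently, the selected neighborhoods need not be symmetric; the AND and OR rules at the end are the usual ways to symmetrize them.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(Z, y, lam, n_iter=200):
    """Minimize (1/n) ||y - Z b||_2^2 + lam ||b||_1 by cyclic coordinate descent."""
    n, q = Z.shape
    b = np.zeros(q)
    col_sq = (Z ** 2).sum(axis=0)
    r = y.copy()                      # current residual y - Z b
    for _ in range(n_iter):
        for k in range(q):
            r += Z[:, k] * b[k]       # take coordinate k out of the fit
            b[k] = soft_threshold(Z[:, k] @ r, n * lam / 2.0) / col_sq[k]
            r -= Z[:, k] * b[k]       # put the updated coordinate back
    return b

def neighborhood_selection(X, lam):
    """Boolean p x p matrix; row j marks the selected neighbors of variable j."""
    n, p = X.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd(X[:, others], X[:, j], lam)
        selected[j, others] = beta != 0
    return selected

# Simulated data from a hypothetical chain graph X1 - X2 - X3 - X4.
theta = np.array([[ 2.0, -1.0,  0.0,  0.0],
                  [-1.0,  2.0, -1.0,  0.0],
                  [ 0.0, -1.0,  2.0, -1.0],
                  [ 0.0,  0.0, -1.0,  2.0]])
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(theta), size=500)

sel = neighborhood_selection(X, lam=0.2)
graph_and = sel & sel.T   # keep an edge only if both regressions select it
graph_or = sel | sel.T    # keep an edge if either regression selects it
print(graph_and.astype(int))
```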

Neighborhood vs. Likelihood

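  Pseudo-likelihood (Besag, 1975)
  $$P(X_1, \dots, X_p) \simeq \prod_{j=1}^{p} P\big(X_j \mid \{X_k\}_{k \neq j}\big)$$

  $$\tilde{L}(\Theta; S) = \frac{n}{2} \log\det(D) - \frac{n}{2} \operatorname{trace}\big(S D^{-1} \Theta^2\big) - \frac{n}{2} \log(2\pi)$$
  $$L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \operatorname{trace}(S\Theta) - \frac{n}{2} \log(2\pi)$$
  with $D = \operatorname{diag}(\Theta)$.

  Proposition (Ambroise, Chiquet, Matias, 2008)
  Neighborhood selection leads to the graph maximizing the
  penalized pseudo-log-likelihood
  Proof: $\hat{\beta}_i = -\theta_{ij} / \theta_{jj}$, where $\Theta = \arg\max_{\Theta} \tilde{L}(\Theta; S) - \lambda \|\Theta\|_1$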
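
  The link between regression coefficients and the concentration matrix used in the proof can be checked at the population level, without any penalty. The sketch below assumes a hypothetical 4-variable chain and a helper `regression_coefficients` introduced only for this check: regressing $X_j$ on the other variables gives coefficients $-\theta_{ij}/\theta_{jj}$.

```python
import numpy as np

def regression_coefficients(sigma, j):
    """Population coefficients of the regression of X_j on all other variables."""
    p = sigma.shape[0]
    others = [k for k in range(p) if k != j]
    beta = np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, j])
    return others, beta

# Hypothetical concentration matrix (chain graph) and its covariance.
theta = np.array([[ 2.0, -1.0,  0.0,  0.0],
                  [-1.0,  2.0, -1.0,  0.0],
                  [ 0.0, -1.0,  2.0, -1.0],
                  [ 0.0,  0.0, -1.0,  2.0]])
sigma = np.linalg.inv(theta)

for j in range(theta.shape[0]):
    others, beta = regression_coefficients(sigma, j)
    # beta_i = -theta_ij / theta_jj, so zeros of beta match zeros of theta
    print(j, np.allclose(beta, -theta[others, j] / theta[j, j]))
```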


Multi-task learning
  We have $T$ samples (experimental conditions) of the same variables

          $\mathbf{X}^{(t)}$ is the $t$th data matrix, $S^{(t)}$ is the empirical covariance
          examples are assumed to be drawn from $\mathcal{N}(0, \Sigma^{(t)})$
  Ignoring the relationships between the tasks leads to separable
  objectives

  $$\max_{\Theta^{(t)} \in \mathbb{R}^{p \times p},\ t = 1, \dots, T} L(\Theta^{(t)}; S^{(t)}) - \lambda \|\Theta^{(t)}\|_1$$

  Multi-task learning = solving the $T$ tasks jointly
  We may couple the objectives
          through the fitting term,
          through the penalty term.

Coupling through the fitting term

  Intertwined LASSO

  $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} L(\Theta^{(t)}; \tilde{S}^{(t)}) - \lambda \|\Theta^{(t)}\|_1$$

          $\bar{S} = \frac{1}{n} \sum_{t=1}^{T} n_t S^{(t)}$ is the “pooled-tasks” covariance matrix.
          $\tilde{S}^{(t)} = \alpha S^{(t)} + (1 - \alpha) \bar{S}$ is a mixture between specific and
          pooled covariance matrices.


          $\alpha = 0$ pools the data sets and infers a single graph
          $\alpha = 1$ separates the data sets and infers $T$ graphs
          independently
          $\alpha = 1/2$ in all our experiments
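
  The mixed covariance matrices are cheap to build before any solver is called. The sketch below assumes centered data matrices and a helper name `intertwined_covariances` chosen for this example; the simulated task sizes are arbitrary.

```python
import numpy as np

def intertwined_covariances(X_list, alpha=0.5):
    """Return the mixed matrices alpha * S_t + (1 - alpha) * S_bar and S_bar."""
    n_t = np.array([X.shape[0] for X in X_list])
    S_list = [X.T @ X / X.shape[0] for X in X_list]    # task covariances
    S_bar = sum(n * S for n, S in zip(n_t, S_list)) / n_t.sum()
    S_tilde = [alpha * S + (1.0 - alpha) * S_bar for S in S_list]
    return S_tilde, S_bar

# Three hypothetical experimental conditions with different sample sizes.
rng = np.random.default_rng(2)
X_list = [rng.standard_normal((n, 4)) for n in (10, 20, 30)]

S_tilde, S_bar = intertwined_covariances(X_list, alpha=0.5)

# The pooled matrix equals the covariance of the stacked data set.
X_all = np.vstack(X_list)
print(np.allclose(S_bar, X_all.T @ X_all / X_all.shape[0]))   # True
```

  With `alpha=0.0` every returned matrix equals `S_bar`, and with `alpha=1.0` each one equals its own task covariance, which matches the two limiting cases listed above.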

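Coupling through penalties: group-LASSO

We group parameters by sets of
corresponding edges across graphs:

[Figure: graph on the nodes X1, X2, X3, X4]

  Graphical group-LASSO

  $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} L\big(\Theta^{(t)}; S^{(t)}\big) - \lambda \sum_{i \neq j} \left( \sum_{t=1}^{T} \big(\theta_{ij}^{(t)}\big)^2 \right)^{1/2}$$

          Sparsity pattern shared between graphs
          Identical graphs across tasks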
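
  To make the grouping concrete, the sketch below evaluates the group penalty for two hypothetical 2 x 2 concentration matrices and compares it with the sum of separate l1 penalties on the off-diagonal entries. The function names are assumptions of this example.

```python
import numpy as np

def group_lasso_penalty(thetas):
    """Sum over i != j of the l2 norm of (theta_ij^(1), ..., theta_ij^(T))."""
    stack = np.stack(thetas)                       # shape (T, p, p)
    norms = np.sqrt((stack ** 2).sum(axis=0))      # one norm per pair (i, j)
    off_diag = ~np.eye(stack.shape[1], dtype=bool)
    return norms[off_diag].sum()

def separate_l1_penalty(thetas):
    """Sum over tasks of the l1 norm of the off-diagonal entries."""
    stack = np.stack(thetas)
    off_diag = ~np.eye(stack.shape[1], dtype=bool)
    return np.abs(stack[:, off_diag]).sum()

# Two hypothetical tasks sharing the single edge (1, 2).
theta_1 = np.array([[1.0, 3.0], [3.0, 1.0]])
theta_2 = np.array([[1.0, 4.0], [4.0, 1.0]])

print(group_lasso_penalty([theta_1, theta_2]))   # 10.0 = 2 * sqrt(3^2 + 4^2)
print(separate_l1_penalty([theta_1, theta_2]))   # 14.0 = 2 * (3 + 4)
```

  Each pair (i, j) is penalized through the norm of its whole group, so an edge that is already active in one task costs little extra to activate in another, which is what encourages a shared sparsity pattern.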

Coupling through penalties: cooperative-LASSO

        Same grouping, and bet that correlations are likely to be sign consistent
        Gene interactions are either inhibitory or activating across assays

[Figure: graph on the nodes X1, X2, X3, X4]

  Graphical cooperative-LASSO

  $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} L\big(\Theta^{(t)}; S^{(t)}\big) - \lambda \sum_{i \neq j} \left\{ \left( \sum_{t=1}^{T} \big[\theta_{ij}^{(t)}\big]_+^2 \right)^{1/2} + \left( \sum_{t=1}^{T} \big[\theta_{ij}^{(t)}\big]_-^2 \right)^{1/2} \right\}$$

  where $[u]_+ = \max(0, u)$ and $[u]_- = \min(0, u)$.

          Plausible in many other situations
          Sparsity pattern shared between graphs, which may differ
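
  The effect of splitting each group into positive and negative parts can be seen on a toy case. The sketch below uses hypothetical 2 x 2 matrices and function names chosen for this example; it compares the cooperative penalty with the group penalty when the shared edge keeps or changes its sign across tasks.

```python
import numpy as np

def group_lasso_penalty(thetas):
    """Sum over i != j of the l2 norm of the group (theta_ij^(t))_t."""
    stack = np.stack(thetas)
    off_diag = ~np.eye(stack.shape[1], dtype=bool)
    return np.sqrt((stack ** 2).sum(axis=0))[off_diag].sum()

def coop_lasso_penalty(thetas):
    """Sum over i != j of the l2 norms of the positive and negative parts."""
    stack = np.stack(thetas)
    off_diag = ~np.eye(stack.shape[1], dtype=bool)
    pos = np.sqrt((np.maximum(stack, 0.0) ** 2).sum(axis=0))   # [u]_+ part
    neg = np.sqrt((np.minimum(stack, 0.0) ** 2).sum(axis=0))   # [u]_- part
    return (pos + neg)[off_diag].sum()

theta_1 = np.array([[1.0, 3.0], [3.0, 1.0]])
same_sign = np.array([[1.0, 4.0], [4.0, 1.0]])
flipped = np.array([[1.0, -4.0], [-4.0, 1.0]])

# Sign-consistent edge: both penalties agree.
print(coop_lasso_penalty([theta_1, same_sign]),
      group_lasso_penalty([theta_1, same_sign]))    # 10.0 10.0

# Sign-inconsistent edge: the cooperative penalty is larger.
print(coop_lasso_penalty([theta_1, flipped]),
      group_lasso_penalty([theta_1, flipped]))      # 14.0 10.0
```

  A sign flip across tasks is therefore charged as two separate groups, which is how the penalty encodes the bet that interactions are sign consistent.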

A Geometric View of Sparsity
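 Constrained Optimization

  $$\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda \Omega(\beta_1, \beta_2)
  \quad \iff \quad
  \max_{\beta_1, \beta_2} L(\beta_1, \beta_2) \ \ \text{s.t.} \ \ \Omega(\beta_1, \beta_2) \le c$$

 [Figure: surface of L(β1, β2) plotted over the (β1, β2) plane]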
Inferring Multiple Graph Structures                                             16
A Geometric View of Sparsity
 Constrained Optimization




                                          max L(β1 , β2 ) − λΩ(β1 , β2 )
                                          β1 ,β2

                                              max L(β1 , β2 )
                                      ⇔       β1 ,β2
                                              s.t.     Ω(β1 , β2 ) ≤ c
 β2




                      β1

Inferring Multiple Graph Structures                                        16
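
The equivalence between the penalized and the constrained problem can be checked numerically on a toy case. The sketch below assumes a quadratic criterion L(β) = −½‖β − z‖² and Ω = ℓ1 norm (both are illustrative choices, not the likelihood used in the talk): the penalized maximizer is then soft-thresholding, and the constrained maximizer is the Euclidean projection onto the ℓ1 ball of radius c = Ω(penalized solution).

```python
import numpy as np

def soft_threshold(z, lam):
    """Maximizer of -0.5*||b - z||^2 - lam*||b||_1 (closed form)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def project_l1_ball(z, c):
    """Maximizer of -0.5*||b - z||^2 subject to ||b||_1 <= c, assuming c > 0.

    This is the Euclidean projection of z onto the l1 ball of radius c,
    computed with the usual sort-based rule.
    """
    if np.sum(np.abs(z)) <= c:
        return z.copy()
    u = np.sort(np.abs(z))[::-1]               # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - c)[0][-1]   # last index satisfying the rule
    theta = (css[rho] - c) / (rho + 1.0)       # induced threshold
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

z = np.array([1.5, 0.4])
lam = 0.5
b_pen = soft_threshold(z, lam)     # penalized solution
c = np.sum(np.abs(b_pen))          # matching constraint level
b_con = project_l1_ball(z, c)      # constrained solution
print(b_pen, c, b_con)
```

Both solutions coincide at (1, 0): the second coordinate is set exactly to zero, which is the sparsity effect the geometric view explains.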
A Geometric View of Sparsity
 Supporting Hyperplane

  A hyperplane supports a set iff
          the set is contained in one half-space
          the set has at least one point on the hyperplane

  [Figure: three panels in the (β1, β2) plane]

     There are supporting hyperplanes at all boundary points of convex sets:
                        they generalize tangents

A Geometric View of Sparsity
 Dual Cone

                                      Generalizes normals

  [Figure: three panels in the (β1, β2) plane]

                         Shape of dual cones ⇒ sparsity pattern

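
One way to make the link between cones and sparsity concrete, assuming the "dual cone" of the slide is what convex analysis usually calls the normal cone $N_C$ of the admissible set $C$ (this reading, and the ℓ1 example, are assumptions added here), and assuming $L$ is concave and differentiable:

```latex
% Normal cone of a convex set C at a point beta of C
N_C(\beta) = \{\, g : \langle g, \beta' - \beta \rangle \le 0 \ \ \forall \beta' \in C \,\}

% Optimality for the constrained problem
\beta^\star \in \arg\max_{\beta \in C} L(\beta)
\iff \nabla L(\beta^\star) \in N_C(\beta^\star)

% Example: C = \{\beta : |\beta_1| + |\beta_2| \le c\}
% at the vertex (c, 0), a full wedge of gradient directions is admissible:
N_C\big((c, 0)\big) = \{\, g : g_1 \ge |g_2| \,\}
% on the open edge, e.g. at (c/2, c/2), only one direction is admissible:
N_C\big((c/2, c/2)\big) = \{\, s\,(1, 1) : s \ge 0 \,\}
```

The cone at a vertex is wide, so many gradients lead to a solution with $\beta_2 = 0$; the cone on an edge is a single ray, so a dense solution requires the gradient to point exactly along it.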
Group-LASSO balls

Admissible set
   2 tasks (T = 2)
   2 coefficients (p = 2)

Unit ball
$$
\sum_{i=1}^{2} \Big( \sum_{t=1}^{2} \big(\beta_i^{(t)}\big)^2 \Big)^{1/2} \le 1
$$

[Figure: 2 × 2 panels showing slices of the unit ball in the (β1^(1), β2^(1)) plane, axes from −1 to 1, for β1^(2) ∈ {0, 0.3} (rows) and β2^(2) ∈ {0, 0.3} (columns)]

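
The slices in the figure can be worked out by hand from the unit-ball inequality. The two cases below are a sketch; the numerical bounds are computed here and are not read off the original plots.

```latex
% Slice \beta_1^{(2)} = \beta_2^{(2)} = 0: the usual l1 diamond
|\beta_1^{(1)}| + |\beta_2^{(1)}| \le 1

% Slice \beta_1^{(2)} = 0.3,\ \beta_2^{(2)} = 0
\sqrt{\big(\beta_1^{(1)}\big)^2 + 0.09} + |\beta_2^{(1)}| \le 1

% Extent along each axis in that slice
\beta_2^{(1)} = 0 \ \Rightarrow\ |\beta_1^{(1)}| \le \sqrt{0.91} \approx 0.954
\qquad
\beta_1^{(1)} = 0 \ \Rightarrow\ |\beta_2^{(1)}| \le 0.7
```

Once a group is active in the other task (here group 1), the boundary becomes smooth in that group's coordinate, while the kink at $\beta_2^{(1)} = 0$ remains: the penalty favors zeroing a whole group rather than individual entries.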
Cooperative-LASSO balls

Admissible set
   2 tasks (T = 2)
   2 coefficients (p = 2)

Unit ball
$$
\sum_{j=1}^{2} \Big( \sum_{t=1}^{2} \big[\beta_j^{(t)}\big]_+^2 \Big)^{1/2}
+ \sum_{j=1}^{2} \Big( \sum_{t=1}^{2} \big[-\beta_j^{(t)}\big]_+^2 \Big)^{1/2} \le 1
$$

[Figure: 2 × 2 panels showing slices of the unit ball in the (β1^(1), β2^(1)) plane, axes from −1 to 1, for β1^(2) ∈ {0, 0.3} (rows) and β2^(2) ∈ {0, 0.3} (columns)]

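
For the cooperative-LASSO ball, the contribution of a group depends on whether its entries agree in sign. As a worked sketch (values computed here, with $[u]_+ = \max(0, u)$), take the slice $\beta_1^{(2)} = 0.3$, $\beta_2^{(2)} = 0$:

```latex
% Contribution of group j = 1 when \beta_1^{(2)} = 0.3
\Big( \sum_{t=1}^{2} \big[\beta_1^{(t)}\big]_+^2 \Big)^{1/2}
+ \Big( \sum_{t=1}^{2} \big[-\beta_1^{(t)}\big]_+^2 \Big)^{1/2}
=
\begin{cases}
\sqrt{\big(\beta_1^{(1)}\big)^2 + 0.09} & \text{if } \beta_1^{(1)} \ge 0 \\[4pt]
|\beta_1^{(1)}| + 0.3 & \text{if } \beta_1^{(1)} < 0
\end{cases}

% Extent of the slice along \beta_2^{(1)} = 0
-0.7 \le \beta_1^{(1)} \le \sqrt{0.91} \approx 0.954
```

The slice is therefore asymmetric: it extends further on the side where $\beta_1^{(1)}$ has the same sign as $\beta_1^{(2)}$, which is how the penalty discourages sign conflicts across tasks.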
Decomposition strategy
 Estimate the j-th neighborhood of the T graphs

 $$
 \max_{K^{(t)},\ t=1,\dots,T} \; \sum_{t=1}^{T} \tilde{L}\big(K^{(t)}; S^{(t)}\big) - \lambda\, \Omega\big(K^{(t)}\big)
 $$

  decomposes into p convex optimization problems, each over $\beta \in \mathbb{R}^{T \times (p-1)}$:

 $$
 \beta_j = \operatorname*{argmin}_{\beta \in \mathbb{R}^{T \times (p-1)}} f_j(\beta) + \lambda\, \Omega(\beta)
 $$

  where $\beta_j$ is a minimizer iff $0 \in \nabla_\beta f_j(\beta) + \lambda\, \partial_\beta \Omega(\beta)$

  Group-LASSO:
  $$
  \Omega(\beta) = \sum_{i=1}^{p-1} \big\| \beta_i^{[1:T]} \big\|_2
  $$

  Coop-LASSO:
  $$
  \Omega(\beta) = \sum_{i=1}^{p-1} \Big( \big\| \big[\beta_i^{[1:T]}\big]_+ \big\|_2 + \big\| \big[-\beta_i^{[1:T]}\big]_+ \big\|_2 \Big)
  $$

  where $\beta_i^{[1:T]}$ is the vector corresponding to the edges (i, j) across
  graphs

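
The optimality condition 0 ∈ ∇f + λ∂Ω can be tested group by group. The sketch below does this for the group-LASSO penalty, assuming a toy smooth term f(β) = ½‖β − z‖² (an illustrative stand-in for f_j) and arrays stored with one row per candidate edge i and one column per task t. The function names are hypothetical.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Minimizer of 0.5*||b - z||_F^2 + lam * sum_i ||b[i, :]||_2."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * z

def group_lasso_violation(grad, beta, lam):
    """Per-group distance from 0 to grad_i + lam * (subdifferential of ||beta_i||_2).

    A value of 0 for every group means the optimality condition holds.
    """
    v = np.empty(beta.shape[0])
    for i in range(beta.shape[0]):
        nrm = np.linalg.norm(beta[i])
        if nrm > 0:
            # Differentiable case: the subdifferential is the single point beta_i / ||beta_i||.
            v[i] = np.linalg.norm(grad[i] + lam * beta[i] / nrm)
        else:
            # At zero the subdifferential is the unit l2 ball.
            v[i] = max(np.linalg.norm(grad[i]) - lam, 0.0)
    return v

z = np.array([[3.0, 4.0], [0.3, -0.4], [-1.0, 0.0]])
lam = 1.0
beta_hat = group_soft_threshold(z, lam)      # only the first group survives
grad_at_hat = beta_hat - z                   # gradient of 0.5*||b - z||^2
print(group_lasso_violation(grad_at_hat, beta_hat, lam))        # all zeros
print(group_lasso_violation(-z, np.zeros_like(z), lam))         # first group violates at beta = 0
```

At β = 0 only the first group violates the condition (its gradient norm, 5, exceeds λ = 1), which is the kind of information an active-set method uses to decide which group to activate.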
Active set algorithm:                                  yellow belt
  // 0. INITIALIZATION
  β ← 0, A ← ∅
  while 0 ∈ ∂β L(β) do
          /
      // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β A
      Find a solution h to the smooth problem

                         h f (β A     + h) + λ∂h Ω(β A + h) = 0,      where ∂h Ω = {   h Ω}   .

         βA ← βA + h
        // 2. IDENTIFY NEWLY ZEROED VARIABLES

        A ← A{i}

        // 3. IDENTIFY NEW NON-ZERO VARIABLES
        // Select a candidate i ∈ Ac

                                                      ∂f (β)
        i ← arg max vj , where vj =           min      ∂βj
                                                               + λν
               j∈Ac                          ν∈∂β Ω
                                                 j




  end
Inferring Multiple Graph Structures                                                               22
Active set algorithm:                                  orange belt
  // 0. INITIALIZATION
  β ← 0, A ← ∅
  while 0 ∈ ∂β L(β) do
          /
      // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β A
      Find a solution h to the smooth problem

                         h f (β A     + h) + λ∂h Ω(β A + h) = 0,      where ∂h Ω = {   h Ω}   .

         βA ← βA + h
        // 2. IDENTIFY NEWLY ZEROED VARIABLES

        A ← A{i}

        // 3. IDENTIFY NEW NON-ZERO VARIABLES
        // Select a candidate i ∈ Ac which violates the more the optimality
        conditions
                                                      ∂f (β)
        i ← arg max vj , where vj =           min      ∂βj
                                                               + λν
               j∈Ac                          ν∈∂β Ω
                                                 j
        if it exists such an i then
               A ← A ∪ {i}
        else
               Stop and return β, which is optimal
        end
  end
Inferring Multiple Graph Structures                                                               22
Active set algorithm: green belt
  // 0. INITIALIZATION
  β ← 0, A ← ∅
  while 0 ∉ ∂_β L(β) do
      // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β_A
      Find a solution h to the smooth problem

          $\nabla_h f(\beta_A + h) + \lambda\,\partial_h \Omega(\beta_A + h) = 0$, where $\partial_h \Omega = \{\nabla_h \Omega\}$.

      β_A ← β_A + h
      // 2. IDENTIFY NEWLY ZEROED VARIABLES
      while ∃ i ∈ A : β_i = 0 and $\min_{\nu \in \partial_{\beta_i} \Omega} \left| \frac{\partial f(\beta)}{\partial \beta_i} + \lambda\nu \right| = 0$ do
          A ← A \ {i}
      end
      // 3. IDENTIFY NEW NON-ZERO VARIABLES
      // Select the candidate i ∈ A^c for which an infinitesimal change of β_i
      // provides the highest reduction of L
      $i \leftarrow \arg\max_{j \in A^c} v_j$, where $v_j = \min_{\nu \in \partial_{\beta_j} \Omega} \left| \frac{\partial f(\beta)}{\partial \beta_j} + \lambda\nu \right|$
      if v_i ≠ 0 then
          A ← A ∪ {i}
      else
          Stop and return β, which is optimal
      end
  end

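The three steps of the active set algorithm can be illustrated on the simplest special case, the Lasso, where f(β) = ½‖y − Xβ‖² and Ω(β) = ‖β‖₁. There the subdifferential of Ω at β_j = 0 is [−1, 1], so v_j reduces to max(0, |∂f/∂β_j| − λ). The sketch below is only an illustration under these assumptions: the function name `active_set_lasso`, the tolerances, and the use of coordinate descent for the master problem (in place of the smooth solve described on the slide) are choices made here, not the SIMoNe implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def active_set_lasso(X, y, lam, tol=1e-6, inner_tol=1e-10,
                     max_outer=1000, max_inner=10000):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 with an active set loop."""
    p = X.shape[1]
    beta = np.zeros(p)          # 0. initialization: beta = 0, A = empty set
    active = []
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(max_outer):
        # 1. Master problem: optimize over the active coordinates only
        #    (coordinate descent is an assumption made for this sketch).
        for _ in range(max_inner):
            max_change = 0.0
            for j in active:
                partial_resid = y - X @ beta + X[:, j] * beta[j]
                new = soft_threshold(X[:, j] @ partial_resid, lam) / col_sq[j]
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < inner_tol:
                break
        # 2. Identify newly zeroed variables and drop them from A.
        active = [j for j in active if beta[j] != 0.0]
        # 3. Identify a new non-zero variable: v_j = max(0, |grad_j| - lam).
        grad = -X.T @ (y - X @ beta)
        v = np.maximum(np.abs(grad) - lam, 0.0)
        v[np.array(active, dtype=int)] = 0.0   # only candidates outside A
        i = int(np.argmax(v))
        if v[i] > tol:
            active.append(i)
        else:
            return beta          # optimality conditions hold up to tol
    return beta
```

On exit, every coordinate satisfies |X_jᵀ(y − Xβ)| ≤ λ up to the tolerance, which is the Lasso form of the condition 0 ∈ ∂_β L(β).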
Outline

  Statistical model
  Multi-task learning
  Algorithms and methods
  Model selection
  Experiments

Tuning the penalty parameter
 What does the literature say?

  Theory-based penalty choices
    1. Optimal order of the penalty in the p ≫ n framework: $\sqrt{n \log p}$
       (Bunea et al. 2007, Bickel et al. 2009)
    2. Control of the probability of connecting two distinct
       connectivity sets
       (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009)

          In practice, much too conservative

  Cross-validation
          Optimal in terms of prediction, not in terms of selection
          Problematic with small samples:
          holding out data reduces the sample size, which changes the sparsity constraint

Tuning the penalty parameter
 BIC / AIC

  Theorem (Zou et al. 2008)

          $\mathrm{df}(\hat\beta^{\mathrm{lasso}}_\lambda) = \|\hat\beta^{\mathrm{lasso}}_\lambda\|_0$

  Straightforward extensions to the graphical framework

          $\mathrm{BIC}(\lambda) = L(\hat\Theta_\lambda; X) - \mathrm{df}(\hat\Theta_\lambda)\,\frac{\log n}{2}$

          $\mathrm{AIC}(\lambda) = L(\hat\Theta_\lambda; X) - \mathrm{df}(\hat\Theta_\lambda)$

  Both rely on asymptotic approximations, but remain relevant for small
  data sets

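As a rough illustration of how the two criteria could be evaluated for a candidate concentration matrix, here is a minimal Python sketch. It makes two assumptions not stated on the slide: L is taken to be the Gaussian log-likelihood up to an additive constant, (n/2)(log det Θ − tr(SΘ)) with S the empirical covariance, and df is taken to be the number of non-zero entries in the strict upper triangle of Θ (one edge, one parameter). All function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(theta, X):
    """Gaussian log-likelihood of a concentration matrix, up to a constant."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # empirical covariance
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        raise ValueError("theta must be positive definite")
    return 0.5 * n * (logdet - np.trace(S @ theta))


def degrees_of_freedom(theta, tol=1e-10):
    """Assumed df: number of non-zero entries above the diagonal."""
    iu = np.triu_indices_from(theta, k=1)
    return int(np.sum(np.abs(theta[iu]) > tol))


def bic(theta, X):
    """BIC(lambda) = L - df * log(n) / 2, to be maximized over lambda."""
    n = X.shape[0]
    return gaussian_loglik(theta, X) - degrees_of_freedom(theta) * np.log(n) / 2.0


def aic(theta, X):
    """AIC(lambda) = L - df, to be maximized over lambda."""
    return gaussian_loglik(theta, X) - degrees_of_freedom(theta)
```

With the sign convention of the slide (log-likelihood minus a complexity term), the selected penalty is the one whose estimate maximizes the criterion, for example `max(candidates, key=lambda th: bic(th, X))` over a list of estimates obtained along the penalty path.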
Data Generation

  We set

          the number of nodes p
          the number of edges K
          the number of examples n

  Process
    1. Generate a random adjacency matrix with 2K nonzero
       off-diagonal terms (K undirected edges)
    2. Compute the normalized Laplacian L
    3. Generate a symmetric matrix of random signs R
    4. Compute the concentration matrix, with entries K_ij = L_ij R_ij
       (here K denotes the matrix, not the number of edges)
    5. Compute Σ by pseudo-inversion of K
    6. Generate correlated Gaussian data ∼ N(0, Σ)

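The six steps of the data-generation process can be sketched in Python with NumPy as follows. This is an illustrative reading of the slide, not the SIMoNe code: the function names are hypothetical, the normalized Laplacian is taken as I − D^{-1/2} A D^{-1/2} (so isolated nodes get a unit diagonal entry), and the pseudo-inverse uses an explicit cutoff `rcond=1e-10` to discard numerically zero eigenvalues.

```python
import numpy as np


def random_adjacency(p, n_edges, rng):
    """Step 1: symmetric 0/1 adjacency matrix with n_edges undirected edges."""
    iu, ju = np.triu_indices(p, k=1)
    chosen = rng.choice(iu.size, size=n_edges, replace=False)
    A = np.zeros((p, p))
    A[iu[chosen], ju[chosen]] = 1.0
    return A + A.T


def concentration_from_adjacency(A, rng):
    """Steps 2-4: normalized Laplacian, random signs, entrywise product."""
    p = A.shape[0]
    deg = A.sum(axis=1)
    # Assumed convention: 1/sqrt(degree) is 0 for isolated nodes.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1.0)), 0.0)
    L = np.eye(p) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    signs = np.triu(rng.choice([-1.0, 1.0], size=(p, p)), k=1)
    R = signs + signs.T + np.eye(p)         # symmetric signs, +1 on diagonal
    return L * R                            # K_ij = L_ij * R_ij


def sample_gaussian(K, n, rng):
    """Steps 5-6: Sigma by pseudo-inversion of K, then n draws from N(0, Sigma)."""
    Sigma = np.linalg.pinv(K, rcond=1e-10, hermitian=True)
    Sigma = (Sigma + Sigma.T) / 2.0         # remove round-off asymmetry
    return rng.multivariate_normal(np.zeros(K.shape[0]), Sigma, size=n)
```

Flipping the signs of off-diagonal Laplacian entries keeps the matrix positive semidefinite, since each edge contributes a term of the form (x_i ± x_j)² to the quadratic form; this is why the pseudo-inverse is a valid (possibly singular) covariance matrix.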
Simulating Related Tasks

  Generate
   1. an “ancestor” with p = 20 nodes and K = 20 edges
    2. T = 4 children by adding and deleting δ edges
    3. T = 4 Gaussian samples




                Figure: ancestor and children with δ = 2 perturbations
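The perturbation step that turns the ancestor into a child can be sketched as below. The slide does not say exactly how "adding and deleting δ edges" is carried out; this sketch assumes that δ existing edges are deleted and δ edges absent from the ancestor are added, so that every child keeps K edges. The function name `perturb_graph` is hypothetical.

```python
import numpy as np


def perturb_graph(A, delta, rng):
    """Return a child adjacency matrix: delta edges removed, delta edges added."""
    p = A.shape[0]
    iu, ju = np.triu_indices(p, k=1)
    is_edge = A[iu, ju] != 0
    edges = np.flatnonzero(is_edge)         # positions of ancestor edges
    non_edges = np.flatnonzero(~is_edge)    # positions of ancestor non-edges
    removed = rng.choice(edges, size=delta, replace=False)
    added = rng.choice(non_edges, size=delta, replace=False)
    child = A.copy()
    child[iu[removed], ju[removed]] = 0.0
    child[ju[removed], iu[removed]] = 0.0
    child[iu[added], ju[added]] = 1.0
    child[ju[added], iu[added]] = 1.0
    return child
```

Calling `perturb_graph(ancestor, 2, rng)` four times would give T = 4 children as in the figure; each child's adjacency matrix can then be turned into a concentration matrix and sampled as in the data-generation process.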



Simulation results

  Precision/Recall curve
          precision = TP/(TP+FP)
          recall = TP/P (power)

  ROC curve
          fallout = FP/N (type I error)
          recall = TP/P (power)

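For concreteness, the three quantities can be computed from a true and an estimated adjacency matrix as in the sketch below, where each unordered pair of nodes counts once. The function name `edge_metrics` and the conventions for empty denominators (precision set to 1 when no edge is predicted, recall and fallout set to 0 when P or N is 0) are assumptions made for this illustration.

```python
import numpy as np


def edge_metrics(A_true, A_hat):
    """Return (precision, recall, fallout) for edge detection."""
    iu = np.triu_indices_from(A_true, k=1)  # each node pair counted once
    truth = A_true[iu] != 0
    pred = A_hat[iu] != 0
    tp = int(np.sum(truth & pred))          # true positives
    fp = int(np.sum(~truth & pred))         # false positives
    n_pos = int(np.sum(truth))              # P: number of true edges
    n_neg = int(np.sum(~truth))             # N: number of true non-edges
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / n_pos if n_pos > 0 else 0.0
    fallout = fp / n_neg if n_neg > 0 else 0.0
    return precision, recall, fallout
```

Evaluating these quantities for the estimates obtained along a decreasing sequence of penalties traces out one precision/recall curve and one ROC curve per method.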
Simulation results
 large sample size

  [Figure: n_t = 100, δ = 1. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 large sample size

  [Figure: n_t = 100, δ = 3. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 large sample size

  [Figure: n_t = 100, δ = 5. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 medium sample size

  [Figure: n_t = 50, δ = 1. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 medium sample size

  [Figure: n_t = 50, δ = 3. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 medium sample size

  [Figure: n_t = 50, δ = 5. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 small sample size

  [Figure: n_t = 25, δ = 1. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 small sample size

  [Figure: n_t = 25, δ = 3. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Simulation results
 small sample size

  [Figure: n_t = 25, δ = 5. Left panel: precision/recall curve (precision vs. recall); right panel: ROC curve (recall vs. fallout); both traced as the penalty goes from λmax to 0. Methods: CoopLasso, GroupLasso, Intertwined, Independent, Pooled.]

Breast Cancer
 Prediction of the outcome of preoperative chemotherapy




  Two types of patients
  Patient response can be classified either as
    1. pathologic complete response (PCR)
    2. residual disease (not PCR)



  Gene expression data
          133 patients (99 not PCR, 34 PCR)
          26 identified genes (differential analysis)




Package Demo




                                      cancer data: Coop-Lasso




Conclusions
  To sum up
          Clarified the links between neighborhood selection and the graphical
          Lasso
          Identified the relevance of multi-task learning in network
          inference
          First methods for inferring multiple Gaussian graphical models
          Consistent improvements upon the available baseline solutions
          Available in the R package SIMoNe


  Perspectives
          Explore model-selection capabilities
          Other applications of the Cooperative-Lasso
          Theoretical analysis (uniqueness, selection consistency)

Common fixed point for two weakly compatible pairs ...
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in Statistics
 
sada_pres
sada_pressada_pres
sada_pres
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
Csr2011 june14 14_00_agrawal
Csr2011 june14 14_00_agrawalCsr2011 june14 14_00_agrawal
Csr2011 june14 14_00_agrawal
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
 
BAYSM'14, Wien, Austria
BAYSM'14, Wien, AustriaBAYSM'14, Wien, Austria
BAYSM'14, Wien, Austria
 
ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2
 
Discrete Models in Computer Vision
Discrete Models in Computer VisionDiscrete Models in Computer Vision
Discrete Models in Computer Vision
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Introduction to Grid Generation
Introduction to Grid GenerationIntroduction to Grid Generation
Introduction to Grid Generation
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learning
 
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
 
Topological Inference via Meshing
Topological Inference via MeshingTopological Inference via Meshing
Topological Inference via Meshing
 
Symmetrical2
Symmetrical2Symmetrical2
Symmetrical2
 

Plus de Laboratoire Statistique et génome

Plus de Laboratoire Statistique et génome (6)

Structured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical modelStructured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical model
 
Sparsity by worst-case quadratic penalties
Sparsity by worst-case quadratic penaltiesSparsity by worst-case quadratic penalties
Sparsity by worst-case quadratic penalties
 
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-LassoSparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
 
Weighted Lasso for Network inference
Weighted Lasso for Network inferenceWeighted Lasso for Network inference
Weighted Lasso for Network inference
 
Gaussian Graphical Models with latent structure
Gaussian Graphical Models with latent structureGaussian Graphical Models with latent structure
Gaussian Graphical Models with latent structure
 
SIMoNe: Statistical Iference for MOdular NEtworks
SIMoNe: Statistical Iference for MOdular NEtworksSIMoNe: Statistical Iference for MOdular NEtworks
SIMoNe: Statistical Iference for MOdular NEtworks
 

Multitask learning for GGM

  • 1. Inferring Multiple Graph Structures. Julien Chiquet (1), Yves Grandvalet (2), Christophe Ambroise (1). (1) Statistique et Génome, CNRS & Université d'Évry Val d'Essonne; (2) Heudiasyc, CNRS & Université de Technologie de Compiègne. NeMo, 21 June 2010.
      • Chiquet, Grandvalet, Ambroise, arXiv preprint: "Inferring multiple Gaussian graphical structures".
      • Chiquet, Grasseau, Charbonnier and Ambroise, R package SIMoNe: http://stat.genopole.cnrs.fr/~jchiquet/fr/softwares/simone
  • 2. Problem. Inference: which interactions?
      • few arrays ⇔ few examples
      • lots of genes ⇔ high dimension
      • interactions ⇔ very high dimension
     The main difficulty is the low-sample-size, high-dimensional setting. Our main hope is to benefit from sparsity: few genes interact.
  • 3. Handling the scarcity of data. Merge several experimental conditions (experiment 1, experiment 2, experiment 3).
  • 4. Handling the scarcity of data. Inferring each graph independently does not help: each experiment $t = 1, 2, 3$ has its own sample $(X_1^{(t)}, \dots, X_{n_t}^{(t)})$ and its own inference.
  • 5. Handling the scarcity of data. By pooling all the available data: a single inference from $(X_1, \dots, X_n)$, with $n = n_1 + n_2 + n_3$.
  • 7. Handling the scarcity of data. By breaking the separability: each experiment keeps its own sample $(X_1^{(t)}, \dots, X_{n_t}^{(t)})$, but the three inferences are no longer separate.
  • 9. Outline: Statistical model; Multi-task learning; Algorithms and methods; Model selection; Experiments.
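  • 11. Gaussian graphical modeling. Let $X = (X_1, \dots, X_p) \sim \mathcal{N}(0_p, \Sigma)$ and assume $n$ i.i.d. copies of $X$. Let $\mathbf{X}$ be the $n \times p$ matrix whose $k$th row is $X_k$, and let $\Theta = (\theta_{ij})_{i,j \in \mathcal{P}} = \Sigma^{-1}$ be the concentration matrix.
     Graphical interpretation: since $\mathrm{cor}_{ij \mid \mathcal{P} \setminus \{i,j\}} = -\theta_{ij} / \sqrt{\theta_{ii}\theta_{jj}}$ for $i \neq j$,
     $$X_i \perp\!\!\!\perp X_j \mid X_{\mathcal{P} \setminus \{i,j\}} \iff \theta_{ij} = 0 \iff \text{edge } (i,j) \notin \text{network}.$$
     The non-zero entries of $\Theta$ describe the graph structure.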
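The link between the concentration matrix and the graph can be illustrated with a short sketch. It assumes NumPy is available; the function names `partial_correlations` and `graph_edges`, the tolerance, and the 3-node chain used as input are illustrative choices, not part of the talk.

```python
import numpy as np

def partial_correlations(theta):
    """Partial correlations -theta_ij / sqrt(theta_ii * theta_jj), with 1 on the diagonal."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def graph_edges(theta, tol=1e-10):
    """Edges (i, j), i < j, read off the non-zero off-diagonal entries of theta."""
    p = theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(theta[i, j]) > tol]

# Hypothetical concentration matrix of a chain 0 - 1 - 2 (no direct 0 - 2 edge).
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
rho = partial_correlations(theta)
edges = graph_edges(theta)
```

Here nodes 0 and 2 are marginally correlated (the inverse of `theta` has no zero entry), yet their partial correlation is zero, so the graph has no edge between them.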
  • 12. The model likelihood. Let $S = n^{-1} \mathbf{X}^\top \mathbf{X}$ be the empirical variance-covariance matrix. $S$ is a sufficient statistic for $\Theta$, so $L(\Theta; \mathbf{X}) = L(\Theta; S)$.
     The log-likelihood:
     $$L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \mathrm{trace}(S\Theta) - \frac{n}{2} \log(2\pi).$$
     The MLE of $\Theta$ is $S^{-1}$, which is
      • not defined for $n < p$,
      • not sparse ⇒ fully connected graph.
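A minimal numerical sketch of why the MLE breaks down when $n < p$, assuming NumPy. The helper `gaussian_loglik` evaluates the log-likelihood only up to its additive constant, and the sizes `n = 5`, `p = 10` are arbitrary illustrative values.

```python
import numpy as np

def gaussian_loglik(theta, S, n):
    """Log-likelihood up to its additive constant: n/2 * (log det(theta) - trace(S theta))."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        return -np.inf  # theta is not positive definite
    return 0.5 * n * (logdet - np.trace(S @ theta))

rng = np.random.default_rng(0)
n, p = 5, 10                      # fewer examples than variables
X = rng.standard_normal((n, p))
S = X.T @ X / n                   # empirical covariance (data assumed centred)
rank_S = np.linalg.matrix_rank(S)
```

Since `S` has rank at most `n`, it is singular whenever `n < p`, so `S^{-1}` does not exist and some form of regularization is needed.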
  • 13. Penalized approaches.
     Penalized likelihood (Banerjee et al., 2008):
     $$\max_{\Theta \in \mathcal{S}_+} L(\Theta; S) - \lambda \|\Theta\|_1$$
      • well defined for $n < p$
      • sparse ⇒ sensible graph
      • SDP of size $O(p^2)$ (solved by Friedman et al., 2007)
     Neighborhood selection (Meinshausen & Bühlmann, 2006):
     $$\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p-1}} \frac{1}{n} \|\mathbf{X}_j - \mathbf{X}_{\setminus j}\,\beta\|_2^2 + \lambda \|\beta\|_1,$$
     where $\mathbf{X}_j$ is the $j$th column of $\mathbf{X}$ and $\mathbf{X}_{\setminus j}$ is $\mathbf{X}$ deprived of $\mathbf{X}_j$.
      • not symmetric, not positive-definite
      • $p$ independent LASSO problems of size $(p - 1)$
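Neighborhood selection can be sketched as $p$ separate LASSO regressions, here solved with a plain coordinate descent written in NumPy. This is an illustrative sketch, not the SIMoNe implementation: the function names, the fixed iteration count and the final "OR" symmetrization rule are assumptions made for the example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/n) * ||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(p):
            r = y - X @ beta + X[:, k] * beta[k]      # partial residual without column k
            z = X[:, k] @ r
            beta[k] = np.sign(z) * max(abs(z) - n * lam / 2, 0.0) / sq[k]
    return beta

def neighborhood_selection(X, lam):
    """Column j of B holds the LASSO coefficients of X_j regressed on the other columns."""
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        B[others, j] = lasso_cd(X[:, others], X[:, j], lam)
    return B

def or_rule(B):
    """Symmetrize the support: keep edge (i, j) if either regression selects it."""
    nz = B != 0
    return nz | nz.T

# Toy data: variables 0 and 1 are strongly related, variable 2 is independent.
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X = np.column_stack([z, z + 0.1 * rng.standard_normal(200), rng.standard_normal(200)])
adjacency = or_rule(neighborhood_selection(X, lam=1.0))
```

The matrix `B` is in general neither symmetric nor positive-definite, which is why a post-hoc rule such as `or_rule` is needed to read a graph off it.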
  • 15. Neighborhood vs. likelihood.
     Pseudo-likelihood (Besag, 1975):
     $$P(X_1, \dots, X_p) \simeq \prod_{j=1}^{p} P(X_j \mid \{X_k\}_{k \neq j})$$
     $$\tilde{L}(\Theta; S) = \frac{n}{2} \log\det(D) - \frac{n}{2} \mathrm{trace}\left(S D^{-1} \Theta^2\right) - \frac{n}{2} \log(2\pi)$$
     $$L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \mathrm{trace}(S\Theta) - \frac{n}{2} \log(2\pi)$$
     with $D = \mathrm{diag}(\Theta)$.
     Proposition (Ambroise, Chiquet, Matias, 2008): neighborhood selection leads to the graph maximizing the penalized pseudo-log-likelihood.
     Proof: $\hat{\beta}_i = -\tilde{\theta}_{ij} / \tilde{\theta}_{jj}$, where $\tilde{\Theta} = \operatorname*{argmax}_{\Theta} \tilde{L}(\Theta; S) - \lambda \|\Theta\|_1$.
  • 18. Multi-task learning. We have $T$ samples (experimental conditions) of the same variables:
      • $\mathbf{X}^{(t)}$ is the $t$th data matrix and $S^{(t)}$ is its empirical covariance,
      • examples are assumed to be drawn from $\mathcal{N}(0, \Sigma^{(t)})$.
     Ignoring the relationships between the tasks leads to separable objectives:
     $$\max_{\Theta^{(t)} \in \mathbb{R}^{p \times p},\ t = 1, \dots, T} L(\Theta^{(t)}; S^{(t)}) - \lambda \|\Theta^{(t)}\|_1$$
     Multi-task learning means solving the $T$ tasks jointly. We may couple the objectives
      • through the fitting term,
      • through the penalty term.
  • 20. Coupling through the fitting term.
     Intertwined LASSO:
     $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} \left[ L(\Theta^{(t)}; \tilde{S}^{(t)}) - \lambda \|\Theta^{(t)}\|_1 \right]$$
      • $\bar{S} = \frac{1}{n} \sum_{t=1}^{T} n_t S^{(t)}$ is the "pooled-tasks" covariance matrix.
      • $\tilde{S}^{(t)} = \alpha S^{(t)} + (1 - \alpha) \bar{S}$ is a mixture between the task-specific and pooled covariance matrices.
      • $\alpha = 0$ pools the data sets and infers a single graph.
      • $\alpha = 1$ separates the data sets and infers $T$ graphs independently.
      • $\alpha = 1/2$ in all our experiments.
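The intertwined covariances are a simple convex combination, sketched below with NumPy. The function name `intertwined_covariances` and the toy inputs are illustrative assumptions; only the two formulas come from the slide.

```python
import numpy as np

def intertwined_covariances(S_list, n_list, alpha=0.5):
    """Return the list of alpha * S_t + (1 - alpha) * S_bar, one matrix per task."""
    n = sum(n_list)
    # Pooled covariance: sample-size-weighted average of the task covariances.
    S_bar = sum(n_t * S_t for n_t, S_t in zip(n_list, S_list)) / n
    return [alpha * S_t + (1 - alpha) * S_bar for S_t in S_list]

# Two hypothetical tasks with covariances I and 3I and sample sizes 1 and 3.
S_list = [np.eye(2), 3.0 * np.eye(2)]
n_list = [1, 3]
S_tilde = intertwined_covariances(S_list, n_list, alpha=0.5)
```

With `alpha=1.0` the function returns the task covariances unchanged, and with `alpha=0.0` every task receives the pooled matrix, matching the two limiting cases listed above.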
  • 21. Coupling through penalties: group-LASSO. We group parameters by sets of corresponding edges across graphs. [Figure: graphs on the four nodes X1, X2, X3, X4, one per task.]
     Graphical group-LASSO:
     $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{\substack{i,j \\ i \neq j}} \left( \sum_{t=1}^{T} \left(\theta_{ij}^{(t)}\right)^2 \right)^{1/2}$$
      • Sparsity pattern shared between graphs
      • Identical graphs across tasks
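The group penalty can be evaluated directly from a stack of concentration matrices. The sketch below assumes NumPy and an input of shape `(T, p, p)`; the function name and the toy values are illustrative.

```python
import numpy as np

def group_lasso_penalty(thetas):
    """Sum over i != j of the Euclidean norm of (theta_ij^(1), ..., theta_ij^(T))."""
    thetas = np.asarray(thetas, dtype=float)      # shape (T, p, p)
    norms = np.sqrt((thetas ** 2).sum(axis=0))    # one norm per (i, j) pair
    off_diagonal = ~np.eye(norms.shape[0], dtype=bool)
    return norms[off_diagonal].sum()

# Two hypothetical tasks on p = 2 nodes; the single edge has weights 3 and 4.
thetas = [[[1.0, 3.0], [3.0, 1.0]],
          [[1.0, 4.0], [4.0, 1.0]]]
penalty = group_lasso_penalty(thetas)  # the (1,2) and (2,1) entries each contribute 5
```

Because the norm of a group vanishes only when all of its entries do, an edge is removed from every task at once, which is what produces the shared sparsity pattern.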
  • 26. Coupling through penalties: cooperative-LASSO. Same grouping, and bet that correlations are likely to be sign-consistent: gene interactions are either inhibitory or activating across assays. [Figure: graphs on the four nodes X1, X2, X3, X4, one per task.]
     Graphical cooperative-LASSO:
     $$\max_{\Theta^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{\substack{i,j \\ i \neq j}} \left\{ \left( \sum_{t=1}^{T} \left[\theta_{ij}^{(t)}\right]_+^2 \right)^{1/2} + \left( \sum_{t=1}^{T} \left[\theta_{ij}^{(t)}\right]_-^2 \right)^{1/2} \right\}$$
     where $[u]_+ = \max(0, u)$ and $[u]_- = \min(0, u)$.
      • Plausible in many other situations
      • Sparsity pattern shared between graphs, which may differ
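A small sketch of the cooperative penalty, assuming NumPy and a stack of matrices of shape `(T, p, p)`. The function name and the two toy configurations are illustrative; they are chosen to show how sign disagreement across tasks is charged more.

```python
import numpy as np

def coop_lasso_penalty(thetas):
    """Sum over i != j of the norm of the positive parts plus the norm of the negative parts."""
    thetas = np.asarray(thetas, dtype=float)                       # shape (T, p, p)
    pos = np.sqrt((np.maximum(thetas, 0.0) ** 2).sum(axis=0))      # norm of [theta]_+
    neg = np.sqrt((np.minimum(thetas, 0.0) ** 2).sum(axis=0))      # norm of [theta]_-
    off_diagonal = ~np.eye(thetas.shape[1], dtype=bool)
    return (pos + neg)[off_diagonal].sum()

# Same edge magnitudes (3 and 4) in two tasks, with agreeing and disagreeing signs.
same_sign = [[[1.0, 3.0], [3.0, 1.0]], [[1.0, 4.0], [4.0, 1.0]]]
opposite_sign = [[[1.0, 3.0], [3.0, 1.0]], [[1.0, -4.0], [-4.0, 1.0]]]
penalty_same = coop_lasso_penalty(same_sign)          # 2 * sqrt(9 + 16) = 10
penalty_opposite = coop_lasso_penalty(opposite_sign)  # 2 * (3 + 4) = 14
```

When all tasks agree on the sign, the penalty coincides with the group-LASSO value; when they disagree, positive and negative parts are penalized as two separate groups, so one of them can be set to zero without the other.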
  • 34. A geometric view of sparsity: constrained optimization.
     $$\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda \Omega(\beta_1, \beta_2) \iff \max_{\beta_1, \beta_2} L(\beta_1, \beta_2) \ \text{ s.t. } \ \Omega(\beta_1, \beta_2) \leq c$$
     [Figure: $L(\beta_1, \beta_2)$ shown over the $(\beta_1, \beta_2)$ plane.]
  • 37. A geometric view of sparsity: supporting hyperplane. A hyperplane supports a set iff
      • the set is contained in one half-space,
      • the set has at least one point on the hyperplane.
     There is a supporting hyperplane at every boundary point of a convex set: supporting hyperplanes generalize tangents. [Figures: examples in the $(\beta_1, \beta_2)$ plane.]
  • 42. A geometric view of sparsity: dual cone. The dual cone generalizes normals. The shape of the dual cones ⇒ sparsity pattern. [Figures: examples in the $(\beta_1, \beta_2)$ plane.]
  • 46. Group-LASSO balls. Admissible set for 2 tasks ($T = 2$) and 2 coefficients ($p = 2$). Unit ball:
     $$\sum_{i=1}^{2} \left( \sum_{t=1}^{2} \left(\beta_i^{(t)}\right)^2 \right)^{1/2} \leq 1$$
     [Figure: cross-sections of the unit ball in the $(\beta_1^{(1)}, \beta_2^{(1)})$ plane, both axes from $-1$ to $1$, for $\beta_1^{(2)} \in \{0, 0.3\}$ (rows) and $\beta_2^{(2)} \in \{0, 0.3\}$ (columns).]
  • 50. Cooperative-LASSO balls. Admissible set for 2 tasks ($T = 2$) and 2 coefficients ($p = 2$). Unit ball:
     $$\sum_{j=1}^{2} \left( \sum_{t=1}^{2} \left[\beta_j^{(t)}\right]_+^2 \right)^{1/2} + \sum_{j=1}^{2} \left( \sum_{t=1}^{2} \left[-\beta_j^{(t)}\right]_+^2 \right)^{1/2} \leq 1$$
     [Figure: cross-sections of the unit ball in the $(\beta_1^{(1)}, \beta_2^{(1)})$ plane, both axes from $-1$ to $1$, for $\beta_1^{(2)} \in \{0, 0.3\}$ (rows) and $\beta_2^{(2)} \in \{0, 0.3\}$ (columns).]
  • 56. Decomposition strategy. Estimate the $j$th neighborhood of the $T$ graphs. The problem
     $$\max_{K^{(t)},\ t = 1, \dots, T} \sum_{t=1}^{T} \tilde{L}(K^{(t)}; S^{(t)}) - \lambda\, \Omega(K^{(t)})$$
     decomposes into $p$ convex optimization problems
     $$\hat{\beta}_j = \operatorname*{argmin}_{\beta \in \mathbb{R}^{T \times (p-1)}} f_j(\beta) + \lambda\, \Omega(\beta),$$
     where $\hat{\beta}_j$ is a minimizer iff $0 \in \nabla_\beta f_j(\beta) + \lambda\, \partial_\beta \Omega(\beta)$.
      • Group-LASSO: $\Omega(\beta) = \sum_{i=1}^{p-1} \left\| \beta_i^{[1:T]} \right\|_2$
      • Coop-LASSO: $\Omega(\beta) = \sum_{i=1}^{p-1} \left( \left\| \left[\beta_i^{[1:T]}\right]_+ \right\|_2 + \left\| \left[-\beta_i^{[1:T]}\right]_+ \right\|_2 \right)$
     where $\beta_i^{[1:T]}$ is the vector corresponding to the edges $(i, j)$ across graphs.
  • 59. Active set algorithm. The slides build the algorithm up in three versions ("yellow belt", "orange belt", "green belt"); the complete green-belt version reads:
       // 0. INITIALIZATION
       β ← 0, A ← ∅
       while 0 ∉ ∂_β L(β) do
         // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β_A
         Find a solution h to the smooth problem
           ∇_h f(β_A + h) + λ ∂_h Ω(β_A + h) = 0, where ∂_h Ω = {∇_h Ω}
         β_A ← β_A + h
         // 2. IDENTIFY NEWLY ZEROED VARIABLES
         while ∃ i ∈ A : β_i = 0 and min_{ν ∈ ∂_{β_i} Ω} |∂f(β)/∂β_i + λν| = 0 do
           A ← A \ {i}
         end
         // 3. IDENTIFY NEW NON-ZERO VARIABLES
         // Select the candidate i ∈ A^c that most violates the optimality conditions,
         // i.e. such that an infinitesimal change of β_i provides the highest reduction of L
         i ← argmax_{j ∈ A^c} v_j, where v_j = min_{ν ∈ ∂_{β_j} Ω} |∂f(β)/∂β_j + λν|
         if v_i ≠ 0 then
           A ← A ∪ {i}
         else
           Stop and return β, which is optimal
         end
       end
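The three steps of the active-set scheme can be sketched on the simplest possible penalty. The code below assumes NumPy and swaps in the plain $\ell_1$ penalty with a squared loss, $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, rather than the group or cooperative penalties of the talk; the master problem is solved by coordinate descent, which is one possible choice among others. All names and tolerances are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_active_set(X, y, lam, tol=1e-8, max_outer=100, max_inner=1000):
    """Active-set sketch for 0.5/n * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)            # 0. initialization: beta = 0, empty active set
    active = []
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(max_outer):
        # 1. Master problem: optimize over the active coordinates only.
        for _ in range(max_inner):
            max_change = 0.0
            for j in active:
                r = y - X @ beta + X[:, j] * beta[j]
                new = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < tol:
                break
        # 2. Remove variables that were set back to zero.
        active = [j for j in active if beta[j] != 0.0]
        # 3. Optimality violation of inactive variables: for beta_j = 0 the
        #    subdifferential of |beta_j| is [-1, 1], so v_j = max(|grad_j| - lam, 0).
        grad = -X.T @ (y - X @ beta) / n
        viol = np.maximum(np.abs(grad) - lam, 0.0)
        if active:
            viol[active] = 0.0
        j = int(np.argmax(viol))
        if viol[j] <= tol:
            return beta           # no violation left: beta is optimal
        active.append(j)
    return beta

# Orthogonal toy design, where the solution is a soft-thresholding of y.
X = np.eye(3)
y = np.array([3.0, 0.3, -3.0])
beta_hat = lasso_active_set(X, y, lam=0.5)
```

On this toy design the solution is `soft_threshold(y_j, n * lam)` coordinate by coordinate, and the second variable never enters the active set, so the master problem is only ever solved in dimension two.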
  • 63. Tuning the penalty parameter. What does the literature say?
     Theory-based penalty choices:
      1. Optimal order of penalty in the $p \gg n$ framework: $\sqrt{n \log p}$ (Bunea et al. 2007, Bickel et al. 2009).
      2. Control on the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009): in practice much too conservative.
     Cross-validation:
      • Optimal in terms of prediction, not in terms of selection.
      • Problematic with small samples: changes the sparsity constraint due to sample size.
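  • 64. Tuning the penalty parameter: BIC / AIC.
     Theorem (Zou et al. 2008): $\mathrm{df}(\hat{\beta}_\lambda^{\text{lasso}}) = \|\hat{\beta}_\lambda^{\text{lasso}}\|_0$.
     Straightforward extensions to the graphical framework:
     $$\mathrm{BIC}(\lambda) = L(\hat{\Theta}_\lambda; \mathbf{X}) - \mathrm{df}(\hat{\Theta}_\lambda) \frac{\log n}{2}$$
     $$\mathrm{AIC}(\lambda) = L(\hat{\Theta}_\lambda; \mathbf{X}) - \mathrm{df}(\hat{\Theta}_\lambda)$$
     These criteria rely on asymptotic approximations, but remain relevant for small data sets.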
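With the sign convention above, both criteria are maximized over $\lambda$. The sketch below is plain Python; the helper names and the candidate triples `(lambda, log-likelihood, df)` are made-up illustrative values, not results from the talk.

```python
import math

def n_nonzero(coefs, tol=1e-10):
    """Degrees of freedom of a LASSO fit, taken as its number of non-zero coefficients."""
    return sum(1 for c in coefs if abs(c) > tol)

def bic(loglik, df, n):
    return loglik - df * math.log(n) / 2

def aic(loglik, df):
    return loglik - df

def select_lambda(candidates, n):
    """Return the lambda whose (lambda, loglik, df) triple maximizes BIC."""
    return max(candidates, key=lambda c: bic(c[1], c[2], n))[0]

# Hypothetical path: a smaller penalty fits better but uses more parameters.
candidates = [(0.1, -50.0, 10), (0.5, -55.0, 4), (1.0, -80.0, 1)]
best_lambda = select_lambda(candidates, n=100)
```

In this toy path the intermediate penalty wins: the densest model pays too much in degrees of freedom, and the sparsest one loses too much likelihood.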
  • 66. Data generation. We set
      • the number of nodes $p$,
      • the number of edges $K$,
      • the number of examples $n$.
     Process:
      1. Generate a random adjacency matrix with $2K$ off-diagonal terms.
      2. Compute the normalized Laplacian $L$.
      3. Generate a symmetric matrix of random signs $R$.
      4. Compute the concentration matrix $K_{ij} = L_{ij} R_{ij}$.
      5. Compute $\Sigma$ by pseudo-inversion of $K$.
      6. Generate correlated Gaussian data $\sim \mathcal{N}(0, \Sigma)$.
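The six steps can be sketched as follows, assuming NumPy. Several details are not specified on the slide and are assumptions of this example: the signs on the diagonal of $R$ are fixed to $+1$, isolated nodes get a unit diagonal entry in the Laplacian, and a tolerance of `1e-10` is used in the pseudo-inverse. The function name `simulate_ggm` is hypothetical.

```python
import numpy as np

def simulate_ggm(p, n_edges, n, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Random symmetric adjacency matrix with 2 * n_edges off-diagonal terms.
    iu, ju = np.triu_indices(p, k=1)
    chosen = rng.choice(len(iu), size=n_edges, replace=False)
    A = np.zeros((p, p))
    A[iu[chosen], ju[chosen]] = 1.0
    A = A + A.T
    # 2. Normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumption: isolated
    #    nodes keep a unit diagonal entry).
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(p)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    L = np.eye(p) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    # 3. Symmetric matrix of random signs (assumption: +1 on the diagonal).
    R = np.triu(rng.choice([-1.0, 1.0], size=(p, p)), k=1)
    R = R + R.T + np.eye(p)
    # 4. Concentration matrix, entry-wise product.
    K = L * R
    # 5. Covariance by pseudo-inversion, symmetrized against round-off.
    Sigma = np.linalg.pinv(K, rcond=1e-10, hermitian=True)
    Sigma = (Sigma + Sigma.T) / 2
    # 6. Correlated Gaussian sample, one example per row.
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return A, K, X

A, K, X = simulate_ggm(p=20, n_edges=20, n=50, seed=1)
```

Flipping the signs of off-diagonal Laplacian entries keeps the matrix positive semi-definite, so the pseudo-inverse is a valid (possibly singular) covariance matrix.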
  • 67. Simulating related tasks. Generate
      1. an "ancestor" with $p = 20$ nodes and $K = 20$ edges,
      2. $T = 4$ children by adding and deleting $\delta$ edges,
      3. $T = 4$ Gaussian samples.
     [Figure: ancestor and children with $\delta = 2$ perturbations.]
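The perturbation step can be sketched in plain Python. The slide does not say exactly how edges are added and deleted; this example assumes that each child removes $\delta$ existing edges and adds $\delta$ new ones, so the number of edges is preserved. The ring-shaped ancestor and the function name `perturb_graph` are illustrative.

```python
import random
from itertools import combinations

def perturb_graph(edges, p, delta, rng):
    """Delete delta existing edges and add delta absent ones (edges are pairs i < j)."""
    edges = set(edges)
    absent = [e for e in combinations(range(p), 2) if e not in edges]
    removed = rng.sample(sorted(edges), delta)
    added = rng.sample(absent, delta)
    return (edges - set(removed)) | set(added)

rng = random.Random(0)
p = 20
# Hypothetical ancestor: a ring on 20 nodes, hence 20 edges.
ancestor = {(i, i + 1) for i in range(p - 1)} | {(0, p - 1)}
children = [perturb_graph(ancestor, p, delta=2, rng=rng) for _ in range(4)]
```

Each child then differs from the ancestor by exactly `2 * delta` edges, which gives a direct handle on how related the tasks are.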
  • 70. Simulation results.
     Precision/recall curve:
      • precision = TP/(TP+FP)
      • recall = TP/P (power)
     ROC curve:
      • fallout = FP/N (type I error)
      • recall = TP/P (power)
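These three quantities can be computed from edge sets as in the plain-Python sketch below. It assumes undirected graphs stored as sets of pairs `(i, j)` with `i < j`, so that P is the number of true edges and N the number of absent pairs; the function name and the conventions for empty denominators are choices made for the example.

```python
def edge_metrics(true_edges, pred_edges, p):
    """Precision, recall and fallout of a predicted edge set on p nodes."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)
    fp = len(pred_edges - true_edges)
    positives = len(true_edges)                  # P: true edges
    negatives = p * (p - 1) // 2 - positives     # N: true non-edges
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / positives if positives else 0.0
    fallout = fp / negatives if negatives else 0.0
    return precision, recall, fallout

# Toy example on 4 nodes: one of two true edges is found, plus one false edge.
precision, recall, fallout = edge_metrics({(0, 1), (1, 2)}, {(0, 1), (2, 3)}, p=4)
```

Sweeping the penalty from its largest value down to zero and recording these triples at each step traces out the precision/recall and ROC curves.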
  • 71. Simulation results, large sample size. Figure (nt = 100, δ = 1): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 72. Simulation results, large sample size. Figure (nt = 100, δ = 3): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 73. Simulation results, large sample size. Figure (nt = 100, δ = 5): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 74. Simulation results, medium sample size. Figure (nt = 50, δ = 1): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 75. Simulation results, medium sample size. Figure (nt = 50, δ = 3): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 76. Simulation results, medium sample size. Figure (nt = 50, δ = 5): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 77. Simulation results, small sample size. Figure (nt = 25, δ = 1): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 78. Simulation results, small sample size. Figure (nt = 25, δ = 3): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
  • 79. Simulation results, small sample size. Figure (nt = 25, δ = 5): precision vs. recall and recall vs. fallout curves, traced as the penalty goes from λmax → 0, for CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
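The simulation figures above report precision, recall and fallout for the inferred edges. As a minimal sketch of how these three quantities can be computed for one inferred undirected graph, assuming the true and estimated graphs are given as symmetric 0/1 adjacency matrices: the function name `edge_metrics`, the toy graphs and the conventions for empty denominators are illustrative assumptions, not taken from the slides or from the SIMoNe package.

```python
import numpy as np

def edge_metrics(true_adj, est_adj):
    """Precision, recall and fallout of an inferred undirected graph.

    Hypothetical helper for illustration; not part of SIMoNe.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    # Count each undirected edge once: keep the strict upper triangle.
    iu = np.triu_indices_from(true_adj, k=1)
    truth, guess = true_adj[iu], est_adj[iu]

    tp = np.sum(truth & guess)    # true edges that were recovered
    fp = np.sum(~truth & guess)   # inferred edges that do not exist
    fn = np.sum(truth & ~guess)   # true edges that were missed
    tn = np.sum(~truth & ~guess)  # non-edges correctly left out

    # Assumed conventions when a denominator is zero (empty graphs).
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 1.0
    fallout = fp / (fp + tn) if fp + tn > 0 else 0.0
    return float(precision), float(recall), float(fallout)

# Toy 4-node graphs (made up for this example).
true_adj = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 1],
                     [0, 0, 1, 0]])
est_adj = np.array([[0, 1, 0, 1],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [1, 0, 1, 0]])
precision, recall, fallout = edge_metrics(true_adj, est_adj)
```

Evaluating this function on the graph inferred at each penalty value, from λmax down to 0, would give one point per value on a precision vs. recall curve and on a recall vs. fallout curve.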
  • 80. Breast Cancer: prediction of the outcome of preoperative chemotherapy. Two types of patients: patient response can be classified either as (1) pathologic complete response (PCR) or (2) residual disease (not PCR). Gene expression data: 133 patients (99 not PCR, 34 PCR); 26 identified genes (differential analysis).
  • 81. Package demo on the cancer data: Coop-Lasso.
  • 82. Conclusions. To sum up: clarified links between neighborhood selection and graphical LASSO; identified the relevance of Multi-Task Learning in network inference; first methods for inferring multiple Gaussian Graphical Models; consistent improvements upon the available baseline solutions; available in the R package SIMoNe. Perspectives: explore model-selection capabilities; other applications of the Cooperative-LASSO; theoretical analysis (uniqueness, selection consistency).