Center for Evolutionary Medicine and Informatics




        Multi-Task Learning:
Theory, Algorithms, and Applications

Jiayu Zhou (1,2), Jianhui Chen (3), Jieping Ye (1,2)
(1) Computer Science and Engineering, Arizona State University, AZ
(2) The Biodesign Institute, Arizona State University, AZ
(3) GE Global Research, NY




              The Twelfth SIAM International Conference on Data Mining, April 27, 2012



                         Tutorial Goals
• Understand the basic concepts in multi-task learning
• Understand different approaches to model task
  relatedness
• Get familiar with different types of multi-task
  learning techniques
• Introduce multi-task learning applications
• Introduce the multi-task learning package: MALSAR







                    Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
    – Incomplete Multi-Source Fusion
    – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






                     Multiple Tasks
o Examination Scores Prediction¹ (Argyriou et al. '08): predicting one
  school's exam scores is one task, e.g.:

  School 1 - Alverno High School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    72981      | 1985       | 95             | ... | 83%            | ... | ?

  School 138 - Jefferson Intermediate School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    31256      | 1986       | 87             | ... | 72%            | ... | ?

  School 139 - Rosemead High School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    12381      | 1986       | 83             | ... | 77%            | ... | ?

  The attributes are student-dependent (birth year, previous score) or
  school-dependent (school ranking).

  ¹The Inner London Education Authority (ILEA)



               Learning Multiple Tasks
o Learning each task independently

  [Figure: the school tasks (task 1, task 138, task 139, ...) are each
  trained separately on their own school's records, yielding one
  independent model per school.]



               Learning Multiple Tasks
o Learning multiple tasks simultaneously

  [Figure: the same school tasks are now trained jointly on all records.]

  ✓ Learn tasks simultaneously
  ✓ Model the relationships among tasks



                     Performance of MTL
o Evaluation on the School data:
  • Predict exam scores for 15,362 students from 139 schools
  • Describe each student by 27 attributes
  • Compare single-task learning approaches (Ridge Regression, Lasso) with
    one multi-task learning approach (trace-norm regularized learning)

  Performance measure: N-MSE = mean squared error / variance(target)

  [Plot: N-MSE vs. index of training ratio for Ridge Regression, Lasso,
  and Trace Norm; the multi-task approach attains the lowest N-MSE.]
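  For concreteness, a minimal Python/NumPy sketch of the N-MSE measure used
  above (the sample targets and predictions are made up):

    import numpy as np

    def n_mse(y_true, y_pred):
        # Normalized MSE: mean squared error divided by the target variance,
        # so a value below 1 beats the trivial predict-the-mean baseline.
        return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

    # Hypothetical exam scores and predictions:
    y_true = np.array([95.0, 87.0, 83.0, 90.0])
    y_pred = np.array([92.0, 85.0, 80.0, 93.0])
    print(n_mse(y_true, y_pred))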



                 Multi-Task Learning
• Multi-task learning differs from single-task learning in the training
  (induction) process.
• Inductions of multiple tasks are performed simultaneously to capture
  intrinsic relatedness.

  [Figure: in single-task learning, each of tasks 1..t has its own training
  data, training run, trained model, and generalization; in multi-task
  learning, the tasks' training data enter one joint training process that
  yields a trained model per task.]






           Web Pages Categorization
                                 Chen et al. 2009 ICML

• Classify documents into categories
• The classification of each category is a task
• The tasks of predicting different categories may be latently related

  [Figure: one classifier per category (Health, Travel, ..., World,
  Politics, US); the classifiers' parameters of different categories are
  latently related.]



        MTL for HIV Therapy Screening
                                 Bickel et al. 2008 ICML
• Hundreds of possible combinations of drugs, some of which use similar
  biochemical mechanisms
• The samples available for each combination are limited
• For a patient, predicting the outcome of one drug combination is a task
• Exploit the similarity information across combinations via multi-task
  learning

  [Figure: patient treatment records for drugs/combinations (ETV, ABC, AZT,
  ..., D4T, DDI, FTC); the response to an untried combination is predicted.]



                   Other Applications
• Portfolio selection [Ghosn and Bengio, NIPS'97]

• Collaborative ordinal regression [Yu et al. NIPS'06]

• Web image and video search [Wang et al. CVPR'09]

• Disease progression modeling [Zhou et al. KDD'11]

• Disease prediction [Zhang et al. NeuroImage'12]



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






                  How Tasks Are Related
  Assumption: All tasks are related

  Methods
  • Mean-regularized MTL
  • Joint feature learning
  • Trace-norm regularized MTL
  • Alternating structural optimization (ASO)
  • Shared-parameter Gaussian Process



                 How Tasks Are Related
  Assumption: There are outlier tasks

  • Assuming that all tasks are related may be too strong in practical
    applications.
  • Some tasks may be irrelevant (outliers).



                  How Tasks Are Related
  Assumption: Tasks have group structures
  Assumption: Tasks have tree structures
  Assumption: Tasks have graph/network structures

  Methods
  • Clustered MTL
  • Tree MTL
  • Network MTL



           Multi-Task Learning Methods
• Regularization-based MTL
     – All tasks are related
        • regularized MTL, joint feature learning, low rank MTL, ASO
     – Learning with outlier tasks: robust MTL
     – Tasks form groups/graphs/trees
        • clustered MTL, network MTL, tree MTL


• Other Methods
     – Shared Hidden Nodes in Neural Network
     – Shared Parameter Gaussian Process





Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







                        Notation
  [Figure: m tasks; task i has a feature matrix X_i with n_i samples in
  dimension d and a target vector Y_i; learning produces the d × m model
  matrix W.]

• We focus on linear models: Y_i = X_i W_i,
  X_i ∈ ℝ^(n_i × d),  Y_i ∈ ℝ^(n_i × 1),  W = [W_1, W_2, ..., W_m]
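  A minimal NumPy sketch of this notation (all sizes and data below are
  hypothetical placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 27, 139                                    # e.g., 27 attributes, 139 school tasks
    n = rng.integers(50, 200, size=m)                 # n_i: samples per task

    X = [rng.standard_normal((n_i, d)) for n_i in n]  # feature matrices X_i (n_i x d)
    W = rng.standard_normal((d, m))                   # model matrix W = [W_1, ..., W_m]
    Y = [X[i] @ W[:, i] for i in range(m)]            # linear models Y_i = X_i W_i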



      Mean-Regularized Multi-Task Learning
                                        Evgeniou & Pontil, 2004 KDD

 • Assumption: task parameter vectors of all tasks are
   close to each other.
        – Advantage: simple, intuitive, easy to implement
        – Disadvantage: may not hold in real applications.

  Regularization penalizes the deviation of each task from the task mean:

    min_W  (1/2) ‖XW − Y‖_F² + λ Σ_{i=1}^m ‖W_i − (1/m) Σ_{s=1}^m W_s‖₂²
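  A minimal sketch (not the authors' implementation) of evaluating this
  objective in NumPy, reusing the per-task X, Y, W from the notation slide;
  lam stands for λ:

    import numpy as np

    def mean_reg_objective(X, Y, W, lam):
        m = W.shape[1]
        # Squared loss summed over tasks: (1/2) ||X W - Y||_F^2
        loss = 0.5 * sum(np.sum((X[i] @ W[:, i] - Y[i]) ** 2) for i in range(m))
        # Penalize each task's deviation from the mean model (1/m) sum_s W_s
        w_mean = W.mean(axis=1, keepdims=True)
        return loss + lam * np.sum((W - w_mean) ** 2)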



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







     Multi-Task Learning with High Dimensional Data
• In practical applications, we may deal with high
  dimensional data.
     – Gene expression data, biomedical image data
• Curse of Dimensionality
• Dealing with high dimensional data in multi-task
  learning
     – Embedded feature selection: L1/Lq - Group Lasso
     – Low-rank subspace learning: low-rank assumption – ASO,
       Trace-norm regularization






Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







     Multi-Task Learning with Joint Feature Learning
• One way to capture the relatedness of multiple tasks is to constrain all
  models to share a common set of features.
• For example, in the School data, the scores from different schools may be
  determined by a similar set of features.

  [Figure: a weight matrix over 10 features × 5 tasks in which only a few
  feature rows are nonzero, and those rows are shared by all tasks.]



  Multi-Task Learning with Joint Feature Learning
            Obozinski et al. 2009 Stat Comput, Liu et al. 2010 Technical Report

• Use group sparsity: ℓ1/ℓq-norm regularization,

    ‖W‖_{1,q} = Σ_{i=1}^d ‖wⁱ‖_q    (wⁱ: the i-th row of W)

• When q > 1 we have group sparsity.

  [Figure: Y ≈ X × W with output Y (n × m, one column per task), input X
  (n × d), and model W (d × m); entire rows of W are zeroed, so all tasks
  share one feature set.]

    min_W  (1/2) ‖XW − Y‖_F² + λ ‖W‖_{1,q}
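  Solvers for this formulation with q = 2 build on the proximal operator of
  the ℓ1/ℓ2 norm, which shrinks each feature row of W as a group. A minimal
  NumPy sketch (not MALSAR's implementation):

    import numpy as np

    def prox_l12(W, t):
        # argmin_Z 0.5 * ||Z - W||_F^2 + t * ||Z||_{1,2}:
        # soft-threshold the l2 norm of each row (one feature across all tasks).
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
        return scale * W   # rows driven to zero drop that feature for every task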



     Writer-Specific Character Recognition
                          Obozinski, Taskar, and Jordan, 2006

• Each task is a classification between two letters for
  one writer.







      Dirty Model for Multi-Task Learning
                                 Jalali et al. 2010 NIPS

• In practical applications, it is too restrictive to constrain all tasks
  to share a single shared structure.
• Split the model into two parts, W = P + Q: a group-sparse component P and
  an element-wise sparse component Q.

    min_{P,Q}  ‖Y − X(P + Q)‖_F² + λ₁ ‖P‖_{1,q} + λ₂ ‖Q‖₁



                 Robust Multi-Task Learning
o Most existing MTL approaches               o Robust MTL approaches
  Assumption: All tasks are related            Assumption: There are outlier tasks

  [Figure: left, every task is treated as relevant; right, the relevant
  tasks are modeled jointly while irrelevant (outlier) tasks are set apart.]



     Robust Multi-Task Feature Learning
                                 Gong et al. 2012 Submitted

• Simultaneously capture a common set of features among relevant tasks and
  identify outlier tasks: W = P + Q, where P is row-wise group sparse
  (jointly selected features) and Q is column-wise group sparse (outlier
  tasks).

    min_{P,Q}  ‖Y − X(P + Q)‖_F² + λ₁ ‖P‖_{1,q} + λ₂ ‖Qᵀ‖_{1,q}



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







              Trace-Norm Regularized MTL
o Capture task relatedness via a shared low-rank structure

  [Figure: the training data of tasks 1..n are trained jointly; every
  trained model lies in a shared low-rank subspace, and each model then
  generalizes to its own task.]



            Low-Rank Structure for MTL
• Assume we have a rank-2 model matrix: each task's weight vector is a
  combination of the same two basis vectors b₁ and b₂:

    Task 1:  w₁ = α₁ b₁ + α₂ b₂
    Task 2:  w₂ = β₁ b₁ + β₂ b₂
    Task 3:  w₃ = γ₁ b₁ + γ₂ b₂



            Low-Rank Structure for MTL
• A low-rank structure: model matrix = basis vectors × coefficients

    [w₁ w₂ w₃] = [b₁ b₂] × [α₁ α₂; β₁ β₂; γ₁ γ₂]ᵀ

• In general, m task models share p basis vectors (m > p):

    W = B T,   B ∈ ℝ^(d × p) (p basis vectors),   T = (τ_{jk}) ∈ ℝ^(p × m)
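  A small numeric illustration of this factorization (sizes hypothetical):
  with p shared basis vectors, the stacked model matrix has rank p even
  though m > p.

    import numpy as np

    rng = np.random.default_rng(0)
    d, p, m = 10, 2, 5
    B = rng.standard_normal((d, p))      # p shared basis vectors (columns of B)
    T = rng.standard_normal((p, m))      # coefficients tau_{jk}, one column per task
    W = B @ T                            # d x m model matrix
    print(np.linalg.matrix_rank(W))      # prints 2: a shared rank-2 structure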



              Low-Rank Structure for MTL
                                 Ji et al. 2009 ICML

• Rank minimization formulation:

    min_W  Loss(W) + λ · Rank(W)

  – rank minimization is NP-hard for general loss functions

• Convex relaxation: trace-norm minimization

    min_W  Loss(W) + λ ‖W‖_*        (‖W‖_*: the sum of singular values of W)

  – the trace norm is theoretically shown to be a good approximation of the
    rank function (Fazel et al., 2001)
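  Proximal-gradient solvers for the trace-norm formulation rest on singular
  value soft-thresholding; a minimal NumPy sketch:

    import numpy as np

    def prox_trace_norm(W, t):
        # argmin_Z 0.5 * ||Z - W||_F^2 + t * ||Z||_* :
        # soft-threshold the singular values, which lowers the rank of W.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return (U * np.maximum(s - t, 0.0)) @ Vt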



                  Low-Rank Structure for MTL
o Evaluation on the School data:
  • Predict exam scores for 15,362 students from 139 schools
  • Describe each student by 27 attributes
  • Compare Ridge Regression, Lasso, and Trace Norm (for inducing a
    low-rank structure)

  Performance measure: N-MSE = mean squared error / variance(target)

  [Plot: N-MSE vs. index of training ratio; the low-rank structure
  (induced via Trace Norm) leads to the smallest N-MSE.]



   Alternating Structure Optimization (ASO)
                                 Ando and Zhang, 2005 JMLR

• ASO assumes that each task's model is the sum of two components: a
  task-specific one and one lying in a shared low-dimensional subspace.

  [Figure: for each task i = 1..m, the prediction is X(Θv_i + w_i) ≈ y,
  where Θ is the low-dimensional feature map shared by all tasks.]



   Alternating Structure Optimization (ASO)
                                 Ando and Zhang, 2005 JMLR

o Learning from the i-th task: u_i = Θv_i + w_i, giving the prediction
  X_i(Θv_i + w_i) ≈ y_i, with Θ the shared low-dimensional feature map.

o Empirical loss function for the i-th task:

    ℒ_i(X_i(Θv_i + w_i), y_i) = ‖X_i(Θv_i + w_i) − y_i‖²

o ASO simultaneously learns the models and the shared structure:

    min_{Θ,{v_i,w_i}}  Σ_{i=1}^m [ℒ_i(X_i(Θv_i + w_i), y_i) + α ‖w_i‖²]
    subject to  ΘᵀΘ = I
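  A rough sketch of the alternating scheme this formulation suggests (my
  reading of the slides, not the authors' code): with Θ fixed, each
  u_i = Θv_i + w_i solves a regularized least-squares problem; with the u_i
  fixed, Θ is refit from their top singular directions. Sizes and the
  iteration count below are hypothetical.

    import numpy as np

    def aso_sketch(X, Y, h, alpha, iters=20):
        d, m = X[0].shape[1], len(X)
        Theta = np.linalg.qr(np.random.default_rng(0).standard_normal((d, h)))[0]
        for _ in range(iters):
            # With Theta fixed, the optimal v_i is Theta^T u_i, so each task
            # solves: min_u ||X_i u - y_i||^2 + alpha * ||(I - Theta Theta^T) u||^2
            P = np.eye(d) - Theta @ Theta.T
            U = np.column_stack([np.linalg.solve(X[i].T @ X[i] + alpha * P,
                                                 X[i].T @ Y[i]) for i in range(m)])
            # With the u_i fixed, the best Theta spans their top-h singular directions:
            Theta = np.linalg.svd(U, full_matrices=False)[0][:, :h]
        return U, Theta, Theta.T @ U   # models u_i, shared map, and v_i = Theta^T u_i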



                    iASO Formulation
                                 Chen et al., 2009 ICML

o iASO formulation:

    min_{Θ,{v_i,w_i}}  Σ_{i=1}^m [ℒ_i(X_i(Θv_i + w_i), y_i) + α ‖Θv_i + w_i‖² + β ‖w_i‖²]
    subject to  ΘᵀΘ = I

  • controls both model complexity and task relatedness
  • subsumes ASO (Ando et al. '05) as a special case
  • naturally leads to a convex relaxation (Chen et al., 2009 ICML)
  • the convex relaxed ASO is equivalent to iASO under certain mild conditions


     Incoherent Low-Rank and Sparse Structures
                                 Chen et al. 2010 KDD

• ASO uses an L2-norm penalty on the task-specific component; we can
  instead use an L1-norm to learn task-specific features:
  W = P + Q, with an element-wise sparse component Q (task-specific
  features) and a low-rank component P = basis × coefficients (capturing
  task relatedness).

    min_{P,Q}  Σ_{i=1}^m ℒ_i(X_i(P_i + Q_i), y_i) + λ ‖Q‖₁
    subject to  ‖P‖_* ≤ η          (convex formulation)



               Robust Low-Rank in MTL
                                 Chen et al. 2011 KDD

• Simultaneously perform low-rank MTL and identify outlier tasks:
  W = P + Q, with a low-rank component P (capturing task relatedness,
  P = basis × coefficients) and a column-wise group-sparse component Q
  (identifying irrelevant tasks).

    min_{P,Q}  Σ_{i=1}^m ℒ_i(X_i(P_i + Q_i), y_i) + α ‖P‖_* + β ‖Qᵀ‖_{1,q}



                        Summary
• All multi-task learning formulations discussed above fit into the
  W = P + Q schema.
   – Component P: shared structure
   – Component Q: information not captured by the shared structure

  Embedded Feature Selection
    Method   Shared Structure P                Component Q
    L1/Lq    Feature Selection (L1/Lq norm)    0
    Dirty    Feature Selection (L1/Lq norm)    L1-norm
    rMTFL    Feature Selection (L1/Lq norm)    Outlier (column-wise L1/Lq norm)

  Low-Rank Subspace Learning
    Method       Shared Structure P            Component Q
    Trace Norm   Low-Rank (Trace Norm)         0
    ISLR         Low-Rank (Trace Norm)         L1-norm
    ASO          Low-Rank (Shared Subspace)    L2-norm on independent component
    RMTL         Low-Rank (Trace Norm)         Outlier (column-wise L1/Lq norm)



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







    Multi-Task Learning with Clustered Structures
• Most MTL techniques assume all tasks are related
• This is not true in many applications
• Clustered multi-task learning assumes
   – the tasks have a group structure
   – the models of tasks from the same group are closer to each other
     than those from a different group

  Assumption: Tasks have group structures
  e.g., tasks in one group predict heart-related diseases while tasks in
  another group predict brain-related diseases.



           Clustered Multi-Task Learning
                    Jacob et al. 2008 NIPS, Zhou et al. 2011 NIPS

• Use regularization to capture clustered structures.

  [Figure: the training data (X, Y pairs) of all tasks are fit jointly; the
  learned task models fall into clusters 1, 2, ..., k-1, k.]



            Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Capture clustered structures among the m task models (m > cluster
  number k) by minimizing the sum-of-squared-error (SSE) objective of
  K-means clustering:

    min_I  Σ_{j=1}^k Σ_{v∈I_j} ‖w_v − w̄_j‖₂²
    (I_j: index set of the j-th cluster; w̄_j: its mean model)

  which can be rewritten in trace form as

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F)

  where F is the m × k orthogonal cluster indicator matrix with
  F_{i,j} = 1/√(n_j) if i ∈ I_j and 0 otherwise.



            Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Directly minimizing the SSE is hard because of the nonlinear constraint
  on F:

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F)
    (F: m × k orthogonal cluster indicator matrix,
     F_{i,j} = 1/√(n_j) if i ∈ I_j and 0 otherwise)

• Relax F to any orthonormal matrix (Zha et al. 2001 NIPS):

    min_{F: FᵀF = I_k}  tr(WᵀW) − tr(Fᵀ WᵀW F)
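  A small numeric check of the relaxed problem (sizes hypothetical): with
  FᵀF = I_k, the minimizer stacks the top-k eigenvectors of WᵀW, and the
  objective equals the sum of the m − k smallest eigenvalues.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, k = 10, 30, 3
    W = rng.standard_normal((d, m))

    G = W.T @ W                                   # m x m Gram matrix of task models
    evals, evecs = np.linalg.eigh(G)              # eigenvalues in ascending order
    F = evecs[:, -k:]                             # relaxed indicator: F^T F = I_k
    relaxed = np.trace(G) - np.trace(F.T @ G @ F)
    print(np.isclose(relaxed, evals[:-k].sum()))  # True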



              Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Clustered multi-task learning (CMTL) formulation:

    min_{W, F: FᵀF = I_k}  Loss(W) + α [tr(WᵀW) − tr(Fᵀ WᵀW F)] + β tr(WᵀW)

  – the α term captures the cluster structures
  – the β term improves generalization performance

• CMTL has been shown to be equivalent to ASO
  – given that the dimension of the shared low-rank subspace in ASO and the
    cluster number in CMTL are the same.



     Convex Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

  [Figure: recovered model matrices for Ground Truth, Mean-Regularized MTL,
  Trace-Norm Regularized MTL, and Convex Relaxed CMTL. Low rank can also
  capture the cluster structure well; the relaxations introduce some noise.]



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







 Multi-Task Learning with Tree Structures
• In some applications, the tasks may be equipped with a tree structure:
   – the tasks belonging to the same node are similar to each other
   – the similarity between two tasks is related to the depth of their
     common ancestor node

  Assumption: Tasks have tree structures
  [Figure: models of tasks a, b, c at the leaves of a tree; task a is more
  similar to b than to c.]



 Multi-Task Learning with Tree Structures
                                 Kim and Xing 2010 ICML

• Tree-Guided Group Lasso:

    min_W  Loss(W) + λ Σ_{j=1}^d Σ_{v∈V} w_v ‖W^j_{G_v}‖₂

  where each tree node v defines a group G_v of tasks, w_v is its weight,
  and W^j_{G_v} is the j-th feature's weights restricted to the tasks
  in G_v.

  [Figure: three output tasks under a tree with groups G_{v1} = {W1},
  G_{v2} = {W2}, G_{v3} = {W3}, G_{v4} = {W1, W2}, G_{v5} = {W1, W2, W3}.]
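  A minimal sketch of evaluating this penalty for the three-task tree above;
  the node weights w_v below are hypothetical placeholders:

    import numpy as np

    groups  = [[0], [1], [2], [0, 1], [0, 1, 2]]  # G_v1, ..., G_v5 (0-indexed tasks)
    weights = [1.0, 1.0, 1.0, 0.5, 0.25]          # hypothetical node weights w_v

    def tree_penalty(W, lam):
        # lam * sum over features j and tree nodes v of w_v * ||W[j, G_v]||_2
        return lam * sum(w_v * np.linalg.norm(W[j, g])
                         for j in range(W.shape[0])
                         for g, w_v in zip(groups, weights))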



Multi-Task Learning with Graph Structures
• In real applications, the tasks involved in MTL may have a graph
  structure:
   – two tasks are related if they are connected in the graph, i.e.,
     connected tasks are similar
   – the similarity of two related tasks can be represented by the weight
     of the connecting edge

  Assumption: Tasks have graph/network structures



Multi-Task Learning with Graph Structures
• A simple way to encode the graph structure is to penalize the difference
  between every two tasks joined by an edge.
• Given the edge set E, we thus penalize:

    Σ_{i=1}^{|E|} ‖W_{e(i,1)} − W_{e(i,2)}‖₂² = ‖WRᵀ‖_F²,   R ∈ ℝ^(|E| × m)

• The graph regularization term can also be written as a Laplacian term:

    ‖WRᵀ‖_F² = tr((WRᵀ)ᵀ(WRᵀ)) = tr(W RᵀR Wᵀ) = tr(W ℒ Wᵀ)
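  A quick numeric check of this identity (graph and sizes hypothetical):
  the edge-difference penalty equals tr(W ℒ Wᵀ) with ℒ = RᵀR.

    import numpy as np

    edges = [(0, 1), (1, 2), (0, 3)]        # E: pairs of connected tasks
    d, m = 5, 4
    W = np.random.default_rng(0).standard_normal((d, m))

    R = np.zeros((len(edges), m))           # |E| x m: one +1/-1 row per edge
    for i, (a, b) in enumerate(edges):
        R[i, a], R[i, b] = 1.0, -1.0

    penalty = sum(np.sum((W[:, a] - W[:, b]) ** 2) for a, b in edges)
    L = R.T @ R                             # the graph Laplacian
    print(np.isclose(penalty, np.trace(W @ L @ W.T)))   # True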



Multi-Task Learning with Graph Structures
• How to obtain graph information
     – External domain knowledge
        • protein-protein interaction (PPI) for microarray
     – Discover task relatedness from data
        • Pairwise correlation coefficient
         • Sparse inverse covariance (Friedman et al. 2008 Biostatistics)







Multi-Task Learning with Graph Structures
                     Chen et al. 2011 UAI, Kim et al. 2009 Bioinformatics

• Graph-guided Fused Lasso

  [Figure: inputs are gene SNPs (e.g., ACGTTTTACTGTACAATTTAC) and outputs
  are phenotypes; Lasso treats each phenotype output separately, while
  Graph-Guided Fused Lasso couples related phenotypes along a graph.]

    min_W  Loss(W) + λ ‖W‖₁ + Ω(W)

  with the graph-guided fusion penalty

    Ω(W) = γ Σ_{e=(m,l)∈E} Σ_{j=1}^J |W_{jm} − sign(r_{ml}) W_{jl}|
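  A minimal sketch of evaluating the fusion penalty Ω(W); the edge list and
  correlations r_ml below are hypothetical:

    import numpy as np

    def fusion_penalty(W, edges, gamma):
        # gamma * sum over edges (m, l) and outputs j of
        # |W[j, m] - sign(r_ml) * W[j, l]|: positively correlated outputs are
        # fused toward equal weights, negatively correlated toward opposite ones.
        return gamma * sum(np.sum(np.abs(W[:, a] - np.sign(r) * W[:, b]))
                           for a, b, r in edges)

    W = np.random.default_rng(0).standard_normal((6, 3))  # 6 SNPs, 3 phenotypes
    edges = [(0, 1, 0.9), (1, 2, -0.4)]                   # (task m, task l, r_ml)
    print(fusion_penalty(W, edges, gamma=0.1))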



Multi-Task Learning with Graph Structures
                                 Kim et al. 2009 Bioinformatics

• In some applications, we know not only which pairs of tasks are related,
  but also how strongly they are related.
• Graph-Weighted Fused Lasso adds this weight information:

    min_W  Loss(W) + λ ‖W‖₁ + γ Σ_{e=(m,l)∈E} τ(r_{ml}) Σ_{j=1}^J |W_{jm} − sign(r_{ml}) W_{jl}|



                      Practical Guideline
• MTL versus STL
   – MTL is preferred when dealing with multiple related tasks, each with a
     small number of training samples

• Shared features versus shared subspace
   – Identifying shared features is preferred when the data dimensionality
     is large
   – Identifying a shared subspace is preferred when the number of tasks
     is large



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions





     Case Study I: Incomplete Multi-Source
                   Data Fusion
                              Yuan et al. 2012 NeuroImage

• In many applications, multiple data sources may contain a considerable
  amount of missing data.
• In ADNI, over half of the subjects lack CSF measurements, and an
  independent half of the subjects do not have FDG-PET.

  [Figure: the 245-subject ADNI data matrix with three feature blocks, PET
  (116 features), MRI (305 features), and CSF (5 features); entire source
  blocks are missing for groups of subjects, e.g., some subjects have all
  three sources while others lack PET, CSF, or both.]



                          Overview of iMSF
                                    Yuan et al. 2012 NeuroImage

  [Figure: subjects are grouped by data availability into four tasks, e.g.,
  Task I = subjects with PET, MRI, and CSF; Task IV = subjects with MRI
  only; each task is fitted with its own model (Model I-IV) over the
  sources it actually observes.]

  iMSF learns all task models jointly, with a group penalty that couples
  the coefficients each feature receives across every model containing it:

  $$\min_W \; \frac{1}{m} \sum_{i=1}^{m} \mathrm{Loss}(X_i, Y_i, W_i)
    + \lambda \sum_{k=1}^{d} \left\| W_{G_k} \right\|_2$$

  where W_{G_k} collects the coefficients of feature k across all tasks
  whose sources include that feature, so each feature is selected or
  discarded consistently across models.
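
A hedged sketch of the two mechanics this formulation relies on, splitting subjects into tasks by their missingness pattern and evaluating the cross-model group penalty; NaNs mark missing blocks, and all names are our own rather than the authors' code:

```python
import numpy as np

def split_by_missingness(X):
    """Group samples into iMSF-style tasks by their pattern of observed
    features. X is (n, d) with NaN wherever a source is missing."""
    observed = ~np.isnan(X)
    tasks = {}
    for i in range(X.shape[0]):
        tasks.setdefault(tuple(observed[i]), []).append(i)
    # each task: (sample indices, indices of the features it observes)
    return [(np.array(rows), np.flatnonzero(np.array(key)))
            for key, rows in tasks.items()]

def group_penalty(W_list, feat_list, d):
    """sum_k ||W_{G_k}||_2: for each feature k, the l2 norm of its
    coefficients across all task models whose feature set contains k."""
    total = 0.0
    for k in range(d):
        coefs = []
        for W, cols in zip(W_list, feat_list):
            pos = np.flatnonzero(cols == k)
            if pos.size:
                coefs.append(W[pos[0]])
        if coefs:
            total += np.linalg.norm(coefs)
    return total
```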







                                                  iMSF: Performance
                                    Yuan et al. 2012 NeuroImage

  [Figure: bar charts of accuracy, sensitivity, and specificity comparing
  iMSF against four imputation baselines (Zero, EM, KNN, SVD) at three
  missing-data rates (50.0%, 66.7%, 75.0%); accuracy lies roughly between
  0.79 and 0.84 across settings, with iMSF shown outperforming the
  baselines.]



 Case Study II: Drosophila Gene Expression
               Image Analysis
• Drosophila (fruit fly) is a favorite model system for geneticists and
  developmental biologists studying embryogenesis.
   – The small size and short generation time make it ideal for genetic studies.
• In situ hybridization allows us to generate images showing when and where
  individual genes are active.
   – The analysis of such images can potentially reveal gene functions and
       gene-gene interactions.







     Drosophila gene expression pattern images
  [Figure: expression-pattern image series for the genes bgm and en across
  BDGP stage ranges 1-3, 4-6, 7-8, 9-10, 11-12, and 13-16 (Berkeley
  Drosophila Genome Project, http://www.fruitfly.org/), and for the genes
  hb and runt across Fly-FISH stage ranges 1-3, 4-5, 6-7, 8-9, and 10
  onward (http://fly-fish.ccbr.utoronto.ca/).]
  [Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]



                       Comparative image analysis
  [Figure: expression images of the genes Twist, heartless, and stumps at
  stages 4-6 and 7-8, each labeled with controlled-vocabulary annotations
  such as "anterior endoderm AISN", "trunk mesoderm AISN", "dorsal ectoderm
  AISN", "procephalic ectoderm AISN", "trunk mesoderm PR", "head mesoderm
  PR", "anterior endoderm anlage", and "yolk nuclei".]

  We need the spatial and temporal annotations of expressions.
  [Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]



                     Challenges of manual annotation
  [Figure: cumulative number of in situ images by year, rising from near
  zero in 1992 to roughly 160,000 by 2010.]






                            Method outline
   Ji et al. 2008 Bioinformatics; Ji et al. 2009 BMC Bioinformatics; Ji et al. 2009 NIPS

  [Figure: two-stage pipeline. Feature extraction: images are encoded via
  sparse coding. Model construction: the resulting features feed either a
  low-rank multi-task model or a graph-based multi-task model.]
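
The feature-extraction stage can be sketched with scikit-learn's dictionary learning; the patch size, dictionary size, and max pooling below are illustrative placeholders, not the settings used in the cited papers.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical setup: one image yields many small flattened patches (rows).
rng = np.random.RandomState(0)
patches = rng.rand(500, 64)                     # 500 patches of 8x8 pixels

# Learn an overcomplete dictionary and encode each patch sparsely.
dico = DictionaryLearning(n_components=128, alpha=1.0,
                          transform_algorithm='lasso_lars', random_state=0)
codes = dico.fit(patches).transform(patches)    # (500, 128) sparse codes

# Pool the codes over the image (max pooling) into one feature vector,
# which then feeds the low-rank or graph-based multi-task models.
image_feature = codes.max(axis=0)               # (128,)
```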






Low rank multi-task learning model
                     Ji et al. 2009 BMC Bioinformatics

  [Figure: X (images by features) times W (features by tasks) approximates
  the label matrix Y, with the model matrix W constrained to be low rank.]

  $$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji})
    + \lambda \|W\|_*$$

  The first term is the empirical loss over k tasks and n samples; the
  regularizer is the trace norm (the sum of the singular values of W), a
  convex surrogate that induces a low-rank W.
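
Trace-norm problems of this form are commonly handled with proximal gradient methods whose core step is singular value thresholding, the proximal operator of the trace norm; the following is a minimal sketch with the squared loss (our own construction, not the authors' solver):

```python
import numpy as np

def svt(W, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_mtl(X, Y, lam, iters=500):
    """Proximal gradient for min_W 0.5*||XW - Y||_F^2 + lam*||W||_*.
    X: (n, d) shared inputs; Y: (n, k), one column of labels per task."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1/L, L = Lipschitz constant
    for _ in range(iters):
        grad = X.T @ (X @ W - Y)               # gradient of the squared loss
        W = svt(W - step * grad, step * lam)   # gradient step, then prox
    return W
```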


            Graph-based multi-task learning model
                                    Ji et al. 2009 SIGKDD

  [Figure: a fragment of the annotation-term graph, with nodes such as
  "sensory system head", "ventral sensory complex", "dorsal/lateral sensory
  complexes", "sensory nervous system", "embryonic antennal sense organ",
  and "embryonic maxillary sensory complex", joined by edges weighted
  between 0.31 and 0.79.]

  $$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji})
    + \lambda_1 \|W\|_F^2
    + \lambda_2 \sum_{(p,q)\in G} g(C_{pq}) \,
      \left\| w_p - \mathrm{sgn}(C_{pq})\, w_q \right\|^2$$

  The loss is combined with a 2-norm (ridge) term and a graph penalty that
  pulls together the models of tasks connected in G, with edge weights
  g(C_{pq}); with the squared loss, the problem admits a closed-form
  solution.
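
With the squared loss, setting the gradient to zero yields a Sylvester equation, which is one way to realize the closed-form solution mentioned above; the reduction below (folding the signed, weighted graph into a matrix M so the penalty equals tr(W M Wᵀ)) is our own derivation sketch, not the authors' code:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def graph_mtl_closed_form(X, Y, edges, C, g, lam1, lam2):
    """Closed-form minimizer of
        ||XW - Y||_F^2 + lam1*||W||_F^2
        + lam2 * sum_{(p,q)} g(C_pq) * ||w_p - sgn(C_pq) w_q||^2.
    X: (n, d) shared inputs; Y: (n, k) labels, one column per task;
    edges: (p, q) task pairs; C: (k, k) edge scores; g: weight function."""
    k = Y.shape[1]
    M = np.zeros((k, k))                     # signed graph matrix: the
    for p, q in edges:                       # penalty equals tr(W M W^T)
        w, s = g(C[p, q]), np.sign(C[p, q])
        M[p, p] += w
        M[q, q] += w
        M[p, q] -= w * s
        M[q, p] -= w * s
    # Stationarity: X^T X W + W (lam1*I + lam2*M) = X^T Y, a Sylvester eq.
    return solve_sylvester(X.T @ X, lam1 * np.eye(k) + lam2 * M, X.T @ Y)
```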




                 Spatial annotation performance
  [Figure: AUC (ranging roughly from 65 to 85) of four methods, SC+LR,
  SC+Graph, BoW+Graph, and Kernel+SVM, on spatial annotation for stage
  ranges 4-6, 7-8, 9-10, 11-12, and 13-16.]



•     50% of the data is used for training and 50% for testing, over 30 random splits
•     Multi-task approaches outperform single-task approaches




                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions







          MALSAR: Multi-tAsk Learning via StructurAl Regularization
• A multi-task learning package
• Encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/





            MTL Algorithms in MALSAR 1.0
     • Mean-Regularized Multi-Task Learning
     • MTL with Embedded Feature Selection
        – Joint Feature Learning
        – Dirty Multi-Task Learning
        – Robust Multi-Task Feature Learning
     • MTL with Low-Rank Subspace Learning
        –   Trace Norm Regularized Learning
        –   Alternating Structure Optimization
        –   Incoherent Sparse and Low Rank Learning
        –   Robust Low-Rank Multi-Task Learning
     • Clustered Multi-Task Learning
     • Graph Regularized Multi-Task Learning
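
MALSAR itself is a MATLAB package, so the snippet below is not its API; as a language-neutral illustration of the structural-regularization idea behind the first algorithm in the list above, here is a hedged Python sketch of mean-regularized MTL (Evgeniou & Pontil 2004), with all names our own:

```python
import numpy as np

def mean_regularized_mtl(Xs, Ys, lam, lr=1e-3, iters=2000):
    """Gradient descent on
        sum_i ||X_i w_i - y_i||^2 + lam * sum_i ||w_i - w_mean||^2,
    which penalizes each task model's deviation from the task mean.
    Xs, Ys: per-task data; Xs[i] is (n_i, d), Ys[i] is (n_i,)."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    for _ in range(iters):
        w_mean = W.mean(axis=1)
        G = np.zeros_like(W)
        for i in range(m):
            # the penalty's gradient w.r.t. w_i reduces to 2*(w_i - w_mean),
            # because the deviations from the mean sum to zero
            G[:, i] = 2 * Xs[i].T @ (Xs[i] @ W[:, i] - Ys[i]) \
                      + 2 * lam * (W[:, i] - w_mean)
        W -= lr * G                          # full-batch gradient step
    return W
```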



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






          Trends in Multi-Task Learning
• Develop efficient algorithms for large-scale multi-
  task learning.

• Semi-supervised and unsupervised MTL

• Learn task structures automatically in MTL

• Asymmetric MTL
     – The task relationship is not mutual

• Cross-Domain MTL
     – The features may be different across domains





                 Acknowledgement

 • National Science Foundation
 • National Institutes of Health




