1. Multi-Task Learning: Theory, Algorithms, and Applications
Jiayu Zhou 1,2, Jianhui Chen 3, Jieping Ye 1,2
1 Computer Science and Engineering, Arizona State University, AZ
2 The Biodesign Institute, Arizona State University, AZ
3 GE Global Research, NY
Center for Evolutionary Medicine and Informatics
The Twelfth SIAM International Conference on Data Mining, April 27, 2012
2. Tutorial Goals
• Understand the basic concepts in multi-task learning
• Understand different approaches to model task relatedness
• Get familiar with different types of multi-task learning techniques
• Introduce multi-task learning applications
• Introduce the multi-task learning package: MALSAR
3. Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
  – Incomplete Multi-Source Fusion
  – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
5. Multiple Tasks
• Examination score prediction (Argyriou et al., 2008); data from the Inner London Education Authority (ILEA)
  School 1 – Alverno High School:
    Student id 72981 | Birth year 1985 | Previous score 95 | … | School ranking 83% | … | Exam score ?
  School 138 – Jefferson Intermediate School:
    Student id 31256 | Birth year 1986 | Previous score 87 | … | School ranking 72% | … | Exam score ?
  School 139 – Rosemead High School:
    Student id 12381 | Birth year 1986 | Previous score 83 | … | School ranking 77% | … | Exam score ?
• Birth year and previous score are student-dependent features; school ranking is a school-dependent feature.
6. Learning Multiple Tasks
• Learning each task independently: each school is a separate task (task 1 of 139, …, task 138, task 139), and a model is trained for each school using only that school's own records.
7. Learning Multiple Tasks
• Learning multiple tasks simultaneously: the models for all 139 schools are trained jointly.
  – Learn the tasks simultaneously
  – Model the relationships among the tasks
8. Performance of MTL
• Evaluation on the School data:
  – Predict exam scores for 15,362 students from 139 schools
  – Each student is described by 27 attributes
  – Compare single-task learning approaches (Ridge Regression, Lasso) and one multi-task learning approach (trace-norm regularized learning)
• Performance measure: N-MSE = mean squared error / variance(target)
[Figure: N-MSE (0.7–1.05) versus index of training ratio (1–8) for Ridge Regression, Lasso, and Trace Norm.]
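The N-MSE measure above is straightforward to compute; a minimal numpy sketch (the function and variable names are ours, not from the tutorial):

```python
import numpy as np

def normalized_mse(y_true, y_pred):
    """N-MSE: mean squared error divided by the variance of the target."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse / np.var(y_true)

# A predictor that always outputs the target mean scores exactly 1.0,
# so N-MSE values below 1 mean the model beats this trivial baseline.
y = np.array([50.0, 60.0, 70.0, 80.0])
baseline_score = normalized_mse(y, np.full_like(y, y.mean()))
```

Dividing by the target variance is what makes curves for different training ratios comparable on a single axis.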
9. Multi-Task Learning
• Multi-task learning differs from single-task learning in the training (induction) process.
  – Single-task learning: for each task t, a model is trained on that task's data alone and generalizes independently.
  – Multi-task learning: the inductions of multiple tasks are performed simultaneously to capture intrinsic relatedness.
10. Learning Methods
11. Web Page Categorization (Chen et al., 2009 ICML)
• Classify documents into categories (Health, Travel, …, World, Politics, US)
• The classification of each category is a task
• The tasks of predicting different categories may be latently related: the classifiers' parameters for different categories are latently related
12. MTL for HIV Therapy Screening (Bickel et al., 2008 ICML)
• Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms
• The samples available for each combination are limited
• For a patient, the outcome prediction for one drug combination is a task
• Multi-task learning exploits the similarity information across combinations
[Figure: drugs (ABC, ETV, AZT, FTC, D4T, DDI) and patient treatment records.]
13. Other Applications
• Portfolio selection [Ghosn and Bengio, NIPS'97]
• Collaborative ordinal regression [Yu et al., NIPS'06]
• Web image and video search [Wang et al., CVPR'09]
• Disease progression modeling [Zhou et al., KDD'11]
• Disease prediction [Zhang et al., NeuroImage'12]
14. Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
  – Incomplete Multi-Source Fusion
  – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
15. How Tasks Are Related
• Assumption: all tasks are related
• Methods:
  – Mean-regularized MTL
  – Joint feature learning
  – Trace-norm regularized MTL
  – Alternating structure optimization (ASO)
  – Shared-parameter Gaussian process
16. How Tasks Are Related
• Assuming that all tasks are related may be too strong for practical applications.
• There may be some irrelevant (outlier) tasks.
• Assumption: there are outlier tasks.
17. How Tasks Are Related
• Methods: clustered MTL, tree MTL, network MTL
• Assumptions:
  – Tasks have group structures
  – Tasks have tree structures
  – Tasks have graph/network structures
18. Multi-Task Learning Methods
• Regularization-based MTL
  – All tasks are related: regularized MTL, joint feature learning, low-rank MTL, ASO
  – Learning with outlier tasks: robust MTL
  – Tasks form groups/graphs/trees: clustered MTL, network MTL, tree MTL
• Other methods
  – Shared hidden nodes in neural networks
  – Shared-parameter Gaussian process
19. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
20. Notation
[Figure: m tasks; task i has a feature matrix Xi with ni samples over d dimensions, a target vector Yi, and a model (weight vector) Wi.]
• We focus on linear models: Yi = Xi Wi
• Xi ∈ R^{ni×d}, Yi ∈ R^{ni×1}, W = [W1, W2, …, Wm] ∈ R^{d×m}
21. Mean-Regularized Multi-Task Learning (Evgeniou & Pontil, 2004 KDD)
• Assumption: the task parameter vectors of all tasks are close to each other.
  – Advantage: simple, intuitive, easy to implement
  – Disadvantage: the assumption may not hold in real applications
• The regularization penalizes the deviation of each task from the task mean:

  \min_W \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{m} \Big\| W_i - \frac{1}{m}\sum_{s=1}^{m} W_s \Big\|_2^2
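Because this objective is smooth, it can be minimized with plain gradient descent. A minimal numpy sketch on synthetic data (a shared design matrix X and all constants are illustrative assumptions): subtracting the across-task mean from W is right-multiplication by the centering matrix C = I − (1/m)11ᵀ, which is symmetric and idempotent, so the penalty is λ‖WC‖²_F with gradient 2λWC.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 10, 5                      # samples, features, tasks
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, m))
lam = 0.1

# Centering matrix: W @ C subtracts the across-task mean from each task's
# weight vector, so the penalty term is lam * ||W @ C||_F^2.
C = np.eye(m) - np.ones((m, m)) / m

def objective(W):
    return (0.5 * np.linalg.norm(X @ W - Y, "fro") ** 2
            + lam * np.linalg.norm(W @ C, "fro") ** 2)

def gradient(W):
    # C @ C == C (idempotent), which gives this simple gradient form.
    return X.T @ (X @ W - Y) + 2.0 * lam * W @ C

W = np.zeros((d, m))
step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 2.0 * lam)  # 1 / Lipschitz constant
for _ in range(500):
    W -= step * gradient(W)
```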
22. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
23. Multi-Task Learning with High-Dimensional Data
• In practical applications, we may deal with high-dimensional data
  – Gene expression data, biomedical image data
• Curse of dimensionality
• Dealing with high-dimensional data in multi-task learning:
  – Embedded feature selection: ℓ1/ℓq group Lasso
  – Low-rank subspace learning: low-rank assumption – ASO, trace-norm regularization
24. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
25. Multi-Task Learning with Joint Feature Learning
• One way to capture the task relatedness among multiple related tasks is to constrain all models to share a common set of features.
• For example, in the School data, the scores from different schools may be determined by a similar set of features.
[Figure: a 10-feature × 5-task weight matrix in which all five tasks select the same subset of features.]
26. Multi-Task Learning with Joint Feature Learning (Obozinski et al., 2009 Stat Comput; Liu et al., 2010 Technical Report)
• Use group sparsity: ℓ1/ℓq-norm regularization, \|W\|_{1,q} = \sum_{i=1}^{d} \|w^i\|_q, where w^i is the i-th row of W
• When q > 1 we have group sparsity
[Figure: Y (n×m output) ≈ X (n×d input) × W (d×m model); the columns of Y and W correspond to tasks 1, …, m.]

  \min_W \frac{1}{2}\|XW - Y\|_F^2 + \lambda \|W\|_{1,q}
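For q = 2 the regularizer has a closed-form proximal operator (row-wise group soft-thresholding), so the formulation can be solved by proximal gradient descent. A sketch on synthetic data in which only the first three features are truly shared (all names and constants are illustrative):

```python
import numpy as np

def prox_l21(W, tau):
    """Prox of tau * ||W||_{1,2}: soft-threshold each row (feature) of W."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * W

rng = np.random.default_rng(0)
n, d, m = 100, 8, 4
X = rng.standard_normal((n, d))
W_true = np.zeros((d, m))
W_true[:3] = 1.0                         # only features 0-2 matter, for all tasks
Y = X @ W_true
lam = 0.5
step = 1.0 / np.linalg.norm(X, 2) ** 2

W = np.zeros((d, m))
for _ in range(300):
    W = prox_l21(W - step * X.T @ (X @ W - Y), step * lam)
selected = np.linalg.norm(W, axis=1) > 1e-6   # which features survive
```

Because the threshold acts on whole rows, a feature is either kept for every task or discarded for every task, which is exactly the joint-selection behavior illustrated above.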
27. Writer-Specific Character Recognition (Obozinski, Taskar, and Jordan, 2006)
• Each task is a classification between two letters for one writer.
28. Dirty Model for Multi-Task Learning (Jalali et al., 2010 NIPS)
• In practical applications, it is too restrictive to constrain all tasks to share a single shared structure.
• Decompose the model as W = P + Q: a group-sparse component P and an element-wise sparse component Q:

  \min_{P,Q} \|Y - X(P + Q)\|_F^2 + \lambda_1 \|P\|_{1,q} + \lambda_2 \|Q\|_1
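The dirty model can be solved with proximal gradient steps on (P, Q) jointly: the loss is smooth in both blocks, and the two penalties have independent proximal operators (row-wise group soft-thresholding for P, element-wise soft-thresholding for Q). A sketch under illustrative synthetic data and constants of our choosing:

```python
import numpy as np

def prox_l21(M, tau):
    """Prox of tau * ||M||_{1,2}: row-wise group soft-threshold."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * M

def prox_l1(M, tau):
    """Prox of tau * ||M||_1: element-wise soft-threshold."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

rng = np.random.default_rng(1)
n, d, m = 100, 8, 4
X = rng.standard_normal((n, d))
W_star = np.zeros((d, m))
W_star[:2] = 1.0                  # rows shared by all tasks
W_star[5, 0] = 2.0                # one task-private coefficient
Y = X @ W_star
lam1, lam2 = 1.0, 0.6
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # loss Lipschitz in (P, Q)

P = np.zeros((d, m))
Q = np.zeros((d, m))
for _ in range(500):
    G = X.T @ (X @ (P + Q) - Y)   # same loss gradient for both blocks
    P = prox_l21(P - step * G, step * lam1)
    Q = prox_l1(Q - step * G, step * lam2)
W = P + Q
```

With these penalty weights the shared rows are cheaper to represent in P and the single private coefficient is cheaper in Q, so the decomposition separates the two structures.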
29. Robust Multi-Task Learning
• Most existing MTL approaches assume that all tasks are relevant (all tasks are related).
• Robust MTL approaches allow irrelevant (outlier) tasks among the relevant ones: there are outlier tasks.
30. Robust Multi-Task Feature Learning (Gong et al., 2012, submitted)
• Simultaneously captures a common set of features among relevant tasks (jointly selected features) and identifies outlier tasks.
• W = P + Q: P is row-wise group-sparse (shared features) and Q is column-wise group-sparse (outlier tasks):

  \min_{P,Q} \|Y - X(P + Q)\|_F^2 + \lambda_1 \|P\|_{1,q} + \lambda_2 \|Q^T\|_{1,q}
31. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
32. Trace-Norm Regularized MTL
• Capture task relatedness via a shared low-rank structure: the models of all tasks are trained jointly through a shared low-rank subspace, and each still generalizes to its own task.
33. Low-Rank Structure for MTL
• Assume we have a rank-2 model matrix: every task's weight vector is a linear combination of the same two basis vectors b_1, b_2:
  – Task 1: w_1 = α_1 b_1 + α_2 b_2
  – Task 2: w_2 = β_1 b_1 + β_2 b_2
  – Task 3: w_3 = γ_1 b_1 + γ_2 b_2
34. Low-Rank Structure for MTL
• The three weight vectors stack into a factorization:

  W = [b_1, b_2] \begin{bmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{bmatrix}^T

• In general, a low-rank model matrix factors as W = B T, where B holds p basis vectors and T = (\tau_{ij}) \in \mathbb{R}^{p \times m} holds the coefficients of the m tasks, with m > p.
35. Low-Rank Structure for MTL (Ji et al., 2009 ICML)
• Rank minimization formulation: \min_W \mathrm{Loss}(W) + \lambda\,\mathrm{Rank}(W)
  – Rank minimization is NP-hard for general loss functions
• Convex relaxation: trace-norm minimization: \min_W \mathrm{Loss}(W) + \lambda \|W\|_*, where \|W\|_* is the sum of the singular values of W
  – The trace norm is theoretically shown to be a good approximation of the rank function (Fazel et al., 2001)
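Trace-norm regularized problems are commonly solved by proximal gradient methods, since the proximal operator of λ‖W‖_* is singular value soft-thresholding. A sketch recovering a rank-2 model from synthetic data (the setup and constants are illustrative):

```python
import numpy as np

def prox_trace(W, tau):
    """Prox of tau * ||W||_*: soft-threshold the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(2)
n, d, m, r = 100, 20, 10, 2
W_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, m))  # rank 2
X = rng.standard_normal((n, d))
Y = X @ W_star
lam = 5.0
step = 1.0 / np.linalg.norm(X, 2) ** 2

W = np.zeros((d, m))
for _ in range(300):
    W = prox_trace(W - step * X.T @ (X @ W - Y), step * lam)
rank_W = np.linalg.matrix_rank(W, tol=1e-6)
```

The thresholding zeroes out small singular values exactly, which is how the trace norm induces a genuinely low-rank W rather than merely a small one.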
36. Low-Rank Structure for MTL
• Evaluation on the School data:
  – Predict exam scores for 15,362 students from 139 schools
  – Each student is described by 27 attributes
  – Compare Ridge Regression, Lasso, and Trace Norm (for inducing a low-rank structure)
• Performance measure: N-MSE = mean squared error / variance(target)
[Figure: N-MSE (0.7–1.05) versus index of training ratio (1–8); the low-rank structure (induced via the trace norm) leads to the smallest N-MSE.]
37. Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)
• ASO assumes that each task's model is the sum of two components: a task-specific one and one lying in a shared low-dimensional subspace (a low-dimensional feature map Θ):
  – Task i: y_i ≈ X(w_i + Θ v_i), for i = 1, …, m
38. Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)
• Learning for the i-th task: u_i = Θ v_i + w_i, with input X_i, target y_i, and low-dimensional feature map Θ
• Empirical loss function for the i-th task:

  \mathcal{L}_i(X_i(\Theta v_i + w_i), y_i) = \|X_i(\Theta v_i + w_i) - y_i\|^2

• ASO simultaneously learns the task models and the shared structure:

  \min_{\Theta, \{v_i, w_i\}} \sum_{i=1}^{m} \big[ \mathcal{L}_i(X_i(\Theta v_i + w_i), y_i) + \alpha \|w_i\|^2 \big] \quad \text{subject to } \Theta^T \Theta = I
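A common way to optimize this non-convex objective is alternating minimization: with Θ fixed, eliminating v_i (optimal v_i = Θᵀu_i) turns each task into a generalized ridge problem that penalizes only the component of u_i outside span(Θ); with the task models fixed, the best Θ is given by the top left singular vectors of the stacked models. The sketch below is our simplified rendering of this scheme on synthetic tasks that genuinely share a subspace, not the authors' exact algorithm; the objective is tracked to confirm the monotone decrease that alternating minimization guarantees.

```python
import numpy as np

rng = np.random.default_rng(3)
d, h, m, n = 15, 2, 8, 40        # features, subspace dim, tasks, samples/task
Theta_true = np.linalg.qr(rng.standard_normal((d, h)))[0]
Xs = [rng.standard_normal((n, d)) for _ in range(m)]
ys = [X @ (Theta_true @ rng.standard_normal(h)) for X in Xs]  # shared subspace
alpha = 10.0

def objective(U, Theta):
    """Squared losses plus alpha * ||off-subspace part of each model||^2."""
    fit = sum(np.linalg.norm(Xs[i] @ U[:, i] - ys[i]) ** 2 for i in range(m))
    off = U - Theta @ (Theta.T @ U)
    return fit + alpha * np.linalg.norm(off, "fro") ** 2

Theta = np.linalg.qr(rng.standard_normal((d, h)))[0]  # random orthonormal init
objs = []
for _ in range(20):
    # Block 1: with Theta fixed, each task model solves a generalized ridge
    # problem penalizing only its component outside span(Theta).
    Pperp = np.eye(d) - Theta @ Theta.T
    U = np.column_stack([
        np.linalg.solve(X.T @ X + alpha * Pperp, X.T @ y)
        for X, y in zip(Xs, ys)
    ])
    # Block 2: with the models fixed, the best subspace is spanned by the
    # top-h left singular vectors of the stacked model matrix U.
    Theta = np.linalg.svd(U, full_matrices=False)[0][:, :h]
    objs.append(objective(U, Theta))
```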
39. iASO Formulation (Chen et al., 2009 ICML)
• iASO formulation:

  \min_{\Theta, \{v_i, w_i\}} \sum_{i=1}^{m} \big[ \mathcal{L}_i(X_i(\Theta v_i + w_i), y_i) + \alpha \|\Theta v_i + w_i\|^2 + \beta \|w_i\|^2 \big] \quad \text{subject to } \Theta^T \Theta = I

• Controls both model complexity and task relatedness
• Subsumes ASO (Ando et al., 2005) as a special case
• Naturally leads to a convex relaxation (Chen et al., 2009 ICML)
• The convex relaxed ASO is equivalent to iASO under certain mild conditions
40. Incoherent Low-Rank and Sparse Structures (Chen et al., 2010 KDD)
• ASO uses an ℓ2-norm penalty on the task-specific component; an ℓ1-norm penalty can be used instead to learn task-specific features.
• W = Q + P: an element-wise sparse component Q (task-specific features) and a low-rank component P (task relatedness, P = basis × coefficients):

  \min_{P,Q} \sum_{i=1}^{m} \mathcal{L}_i(X_i(P_i + Q_i), y_i) + \lambda \|Q\|_1 \quad \text{subject to } \|P\|_* \le \eta

• This is a convex formulation.
41. Robust Low-Rank MTL (Chen et al., 2011 KDD)
• Simultaneously perform low-rank MTL and identify outlier tasks.
• W = Q + P: a column-wise group-sparse component Q (outlier tasks) and a low-rank component P (task relatedness, P = basis × coefficients):

  \min_{P,Q} \sum_{i=1}^{m} \mathcal{L}_i(X_i(P_i + Q_i), y_i) + \alpha \|P\|_* + \beta \|Q^T\|_{1,q}
42. Summary
• All the multi-task learning formulations discussed above fit into the W = P + Q schema:
  – Component P: shared structure
  – Component Q: information not captured by the shared structure

  Embedded feature selection:
    L1/Lq      | P: feature selection (ℓ1/ℓq norm) | Q: 0
    Dirty      | P: feature selection (ℓ1/ℓq norm) | Q: ℓ1-norm
    rMTFL      | P: feature selection (ℓ1/ℓq norm) | Q: outlier (column-wise ℓ1/ℓq norm)
  Low-rank subspace learning:
    Trace Norm | P: low-rank (trace norm)          | Q: 0
    ISLR       | P: low-rank (trace norm)          | Q: ℓ1-norm
    ASO        | P: low-rank (shared subspace)     | Q: ℓ2-norm on independent component
    RMTL       | P: low-rank (trace norm)          | Q: outlier (column-wise ℓ1/ℓq norm)
43. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
44. Multi-Task Learning with Clustered Structures
• Most MTL techniques assume all tasks are related, which is not true in many applications.
• Clustered multi-task learning assumes:
  – the tasks have a group structure;
  – the models of tasks from the same group are closer to each other than those from a different group.
• Example: the tasks in one group are predictions of heart-related diseases, and those in another group are predictions of brain-related diseases.
45. Clustered Multi-Task Learning (Jacob et al., 2008 NIPS; Zhou et al., 2011 NIPS)
• Use regularization to capture clustered structures.
[Figure: the training data X is fit by models that fall into clusters 1, 2, …, k−1, k.]
46. Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)
• Capture cluster structures over the m task models by minimizing the sum-of-squared errors (SSE) of K-means clustering:

  \min \sum_{j=1}^{k} \sum_{v \in I_j} \|w_v - \bar{w}_j\|_2^2

  where I_j is the index set of the j-th cluster and \bar{w}_j is its mean.
• For a fixed assignment this equals

  \min_F \; \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F)

  where F is the m×k orthogonal cluster indicator matrix with F_{i,j} = 1/\sqrt{n_j} if i \in I_j and 0 otherwise, and the task number m > the cluster number k.
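The equality between the K-means SSE and the trace form can be verified numerically for any fixed assignment; a small self-check on synthetic numbers (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, k = 6, 9, 3
W = rng.standard_normal((d, m))            # one column per task model
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Direct K-means sum-of-squared-errors for this cluster assignment.
sse = sum(
    np.sum((W[:, labels == j] - W[:, labels == j].mean(axis=1, keepdims=True)) ** 2)
    for j in range(k)
)

# Orthogonal cluster indicator: F[i, j] = 1/sqrt(n_j) if task i is in cluster j.
F = np.zeros((m, k))
for j in range(k):
    idx = labels == j
    F[idx, j] = 1.0 / np.sqrt(idx.sum())

trace_form = np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F)
```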
47. Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)
• Directly minimizing the SSE is hard because of the non-linear constraint on F (F must be an m×k orthogonal cluster indicator matrix with F_{i,j} = 1/\sqrt{n_j} if i \in I_j and 0 otherwise).
• Relax the constraint to an orthogonality constraint (Zha et al., 2001 NIPS):

  \min_{F : F^T F = I_k} \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F)
48. Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)
• Clustered multi-task learning (CMTL) formulation:

  \min_{W,\, F : F^T F = I_k} \mathrm{Loss}(W) + \alpha \big( \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F) \big) + \beta\, \mathrm{tr}(W^T W)

  – The α term captures the cluster structures; the β term improves generalization performance.
• CMTL has been shown to be equivalent to ASO, given that the dimension of the shared low-rank subspace in ASO and the cluster number in CMTL are the same.
49. Convex Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)
[Figure: model correlation matrices for the ground truth, mean-regularized MTL, trace-norm regularized MTL, and convex relaxed CMTL. The relaxations introduce some noise; a low-rank (trace-norm) structure can also capture the cluster structure well.]
50. Regularization-based Multi-Task Learning
• All tasks are related
  – Mean-Regularized MTL
  – MTL in high dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
51. Multi-Task Learning with Tree Structures
• In some applications, the tasks may be equipped with a tree structure:
  – The tasks belonging to the same node are similar to each other
  – The similarity between two nodes is related to the depth of their common ancestor node; e.g., task a is more similar to task b than to task c.
52. Multi-Task Learning with Tree Structures (Kim and Xing, 2010 ICML)
• Tree-Guided Group Lasso:

  \min_W \mathrm{Loss}(W) + \lambda \sum_{j=1}^{d} \sum_{v \in V} \omega_v \|W^j_{G_v}\|_2

  where each tree node v defines a group G_v of tasks with weight \omega_v; e.g., for three tasks, G_{v1} = \{W_1\}, G_{v2} = \{W_2\}, G_{v3} = \{W_3\}, G_{v4} = \{W_1, W_2\}, G_{v5} = \{W_1, W_2, W_3\}.
53. Multi-Task Learning with Graph Structures
• In real applications, the tasks involved in MTL may have graph structures:
  – Two tasks are related if they are connected in the graph, i.e., connected tasks are similar
  – The similarity of two related tasks can be represented by the weight of the connecting edge
54. Multi-Task Learning with Graph Structures
• A simple way to encode a graph structure is to penalize the difference between the two tasks joined by each edge. Given an edge set E, we penalize

  \sum_{i=1}^{|E|} \|W_{e_i(1)} - W_{e_i(2)}\|_2^2 = \|W R^T\|_F^2, \qquad R \in \mathbb{R}^{|E| \times m}

  where row i of R has +1 and −1 at the two endpoints of edge e_i.
• The graph regularization term can also be written as a Laplacian term:

  \|W R^T\|_F^2 = \mathrm{tr}\big(W R^T (W R^T)^T\big) = \mathrm{tr}(W R^T R W^T) = \mathrm{tr}(W \mathcal{L} W^T), \qquad \mathcal{L} = R^T R
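The identity above is easy to confirm by building the edge-incidence matrix R explicitly; a small numpy check (the edge list is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 4, 5
W = rng.standard_normal((d, m))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

# Incidence matrix R: one row per edge with +1/-1 at its two endpoints,
# so that (W @ R.T)[:, i] = W[:, e_i(1)] - W[:, e_i(2)].
R = np.zeros((len(edges), m))
for i, (a, b) in enumerate(edges):
    R[i, a], R[i, b] = 1.0, -1.0

penalty_pairwise = sum(np.sum((W[:, a] - W[:, b]) ** 2) for a, b in edges)
penalty_frobenius = np.linalg.norm(W @ R.T, "fro") ** 2
L = R.T @ R                                   # graph Laplacian
penalty_laplacian = np.trace(W @ L @ W.T)
```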
55. Multi-Task Learning with Graph Structures
• How to obtain the graph information:
  – External domain knowledge, e.g., protein-protein interaction (PPI) networks for microarray data
  – Discover task relatedness from data: pairwise correlation coefficients, sparse inverse covariance (Friedman et al., 2008 Biostatistics)
56. Multi-Task Learning with Graph Structures (Chen et al., 2011 UAI; Kim et al., 2009 Bioinformatics)
• Graph-Guided Fused Lasso (input: SNPs; output: phenotypes; the Lasso treats the outputs independently, while the graph-guided fusion penalty couples correlated outputs):

  \min_W \mathrm{Loss}(W) + \lambda \|W\|_1 + \Omega(W), \qquad \Omega(W) = \gamma \sum_{e=(m,l) \in E} \sum_{j=1}^{J} \big| W_{jm} - \mathrm{sign}(r_{ml})\, W_{jl} \big|
57. Multi-Task Learning with Graph Structures (Kim et al., 2009 Bioinformatics)
• In some applications, we know not only which pairs of tasks are related but also how strongly they are related.
• Graph-Weighted Fused Lasso adds the weight information via τ(r_{ml}):

  \min_W \mathrm{Loss}(W) + \lambda \|W\|_1 + \gamma \sum_{e=(m,l) \in E} \tau(r_{ml}) \sum_{j=1}^{J} \big| W_{jm} - \mathrm{sign}(r_{ml})\, W_{jl} \big|
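Both fusion penalties are simple to evaluate; the sketch below implements the weighted version with τ(r) = |r| (one common choice; the function and data are illustrative). A positively correlated pair with equal weights, or a negatively correlated pair with opposite-sign weights, incurs zero penalty:

```python
import numpy as np

def fusion_penalty(W, edges, gamma=1.0, weighted=True):
    """Graph-guided fusion penalty. `edges` maps (m, l) -> correlation r_ml.
    With weighted=False this is the graph-guided fused Lasso penalty; with
    weighted=True each edge is scaled by tau(r_ml) = |r_ml|."""
    total = 0.0
    for (a, b), r in edges.items():
        tau = abs(r) if weighted else 1.0
        total += tau * np.sum(np.abs(W[:, a] - np.sign(r) * W[:, b]))
    return gamma * total

# Tasks 0 and 1 are positively correlated with equal weights; tasks 1 and 2
# are negatively correlated with opposite-sign weights -> zero penalty.
W = np.array([[1.0, 1.0, -1.0],
              [2.0, 2.0, -2.0]])
edges = {(0, 1): 0.8, (1, 2): -0.5}
```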
58. Practical Guideline
• MTL versus STL:
  – MTL is preferred when dealing with multiple related tasks, each with a small number of training samples
• Shared features versus shared subspace:
  – Identifying shared features is preferred when the data dimensionality is large
  – Identifying a shared subspace is preferred when the number of tasks is large
59. Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
  – Incomplete Multi-Source Fusion
  – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
60. Case Study I: Incomplete Multi-Source Data Fusion (Yuan et al., 2012 NeuroImage)
• In many applications, multiple data sources may contain a considerable amount of missing data.
• In ADNI, over half of the subjects lack CSF measurements; an independent half of the subjects do not have FDG-PET.
[Figure: block-wise missing pattern over 245 subjects across three sources — PET (features P1–P116), MRI (features M1–M305), and CSF (features C1–C5).]
61. Overview of iMSF (Yuan et al., 2012 NeuroImage)
• Partition the subjects by their patterns of available sources among PET, MRI, and CSF (Tasks I–IV, each with its own model I–IV), so no imputation is needed.
• Learn all task models jointly with a group-sparse penalty that couples, across tasks, the weights each feature receives:

  \min_W \frac{1}{m} \sum_{i=1}^{m} \mathrm{Loss}(X_i, Y_i, W_i) + \lambda \sum_{k=1}^{d} \|W_{G_k}\|_2
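The task-construction step (grouping subjects by which sources they have, so that no value is ever imputed) can be sketched as follows; the availability mask is synthetic and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(6)
sources = ["PET", "MRI", "CSF"]
n_subjects = 12

# Availability mask: True where a subject has data from a source.
# MRI is always present; PET and CSF may be missing (as in ADNI).
avail = np.ones((n_subjects, 3), dtype=bool)
avail[:, 0] = rng.random(n_subjects) > 0.5    # PET missing for ~half
avail[:, 2] = rng.random(n_subjects) > 0.5    # CSF missing for ~half

# Each distinct availability pattern defines one task; every subject is
# used exactly once, in the task matching its observed sources.
tasks = {}
for subj, pattern in enumerate(map(tuple, avail)):
    tasks.setdefault(pattern, []).append(subj)
```

Each task then gets a model over its observed features only, and the ℓ2 group penalty over the groups G_k ties together the weights that the different tasks assign to the same feature.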
62. iMSF: Performance (Yuan et al., 2012 NeuroImage)
[Figure: accuracy (≈0.79–0.84), sensitivity, and specificity at training ratios of 50.0%, 66.7%, and 75.0%, comparing iMSF against imputation-based baselines (Zero, EM, KNN, SVD).]
63. Case Study II: Drosophila Gene Expression Image Analysis
• Drosophila (fruit fly) is a favorite model system for geneticists and developmental biologists studying embryogenesis.
  – Its small size and short generation time make it ideal for genetic studies.
• In situ hybridization allows us to generate images showing when and where individual genes are active.
  – The analysis of such images can potentially reveal gene functions and gene-gene interactions.
64. Drosophila Gene Expression Pattern Images
[Figure: expression images of the genes bgm and en across stages 1-3, 4-6, 7-8, 9-10, 11-12, and 13- from the Berkeley Drosophila Genome Project (BDGP, http://www.fruitfly.org/), and of the genes hb and runt across stages 1-3, 4-5, 6-7, 8-9, and 10- from Fly-FISH (http://fly-fish.ccbr.utoronto.ca/).]
[Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]
65. Comparative Image Analysis
[Figure: anatomical annotation terms for the genes Twist, heartless, and stumps at stages 4-6 (e.g., anterior endoderm AISN, trunk mesoderm AISN, dorsal ectoderm AISN, procephalic ectoderm AISN, head mesoderm AISN, cellular blastoderm, mesoderm AISN) and stages 7-8 (e.g., trunk mesoderm PR, head mesoderm PR, anterior endoderm anlage, yolk nuclei).]
• We need the spatial and temporal annotations of expressions.
[Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]
66. Challenges of Manual Annotation
[Figure: cumulative number of in situ images by year, 1992–2010; the count grows from near zero to well over 100,000 images.]
67. Method Outline (Ji et al., 2008 Bioinformatics; Ji et al., 2009 BMC Bioinformatics; Ji et al., 2009 NIPS)
• Feature extraction: images → sparse coding
• Model construction: low-rank multi-task learning and graph-based multi-task learning
68. Low-Rank Multi-Task Learning Model (Ji et al., 2009 BMC Bioinformatics)
• Trace-norm regularized loss over k tasks and n samples (the trace norm \|W\|_* is the sum of the singular values of W):

  \min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji}) + \lambda \|W\|_*
69. Graph-Based Multi-Task Learning Model (Ji et al., 2009 SIGKDD)
[Figure: a graph over anatomical terms (sensory system head, ventral sensory complex, embryonic antennal sense organ, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, sensory nervous system) with edge weights such as 0.56, 0.31, 0.57, 0.79, 0.50, 0.35, 0.60, 0.36.]
• Loss with an ℓ2-norm penalty and a graph-based fusion penalty over edges (p, q) ∈ G with correlations C_{pq}:

  \min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji}) + \lambda_1 \|W\|_F^2 + \lambda_2 \sum_{(p,q) \in G} g(C_{pq}) \|w_p - \mathrm{sgn}(C_{pq})\, w_q\|^2

• This formulation admits a closed-form solution.
70. Spatial Annotation Performance
[Figure: AUC (65–85) for stage ranges 4-6, 7-8, 9-10, 11-12, and 13-16, comparing SC+LR, SC+Graph, BoW+Graph, and Kernel+SVM.]
• 50% of the data is used for training and 50% for testing; 30 random trials are generated
• Multi-task approaches outperform single-task approaches
71. Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
  – Incomplete Multi-Source Fusion
  – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
72. MALSAR
• A multi-task learning package
• Encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/
73. MTL Algorithms in MALSAR 1.0
• Mean-Regularized Multi-Task Learning
• MTL with Embedded Feature Selection
  – Joint Feature Learning
  – Dirty Multi-Task Learning
  – Robust Multi-Task Feature Learning
• MTL with Low-Rank Subspace Learning
  – Trace Norm Regularized Learning
  – Alternating Structure Optimization
  – Incoherent Sparse and Low Rank Learning
  – Robust Low-Rank Multi-Task Learning
• Clustered Multi-Task Learning
• Graph Regularized Multi-Task Learning
74. Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
  – Incomplete Multi-Source Fusion
  – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
75. Trends in Multi-Task Learning
• Develop efficient algorithms for large-scale multi-task learning
• Semi-supervised and unsupervised MTL
• Learn task structures automatically in MTL
• Asymmetric MTL – the relationship between tasks is not mutual
• Cross-Domain MTL – the features may be different across tasks
76. Acknowledgement
• National Science Foundation
• National Institutes of Health