Center for Evolutionary Medicine and Informatics




        Multi-Task Learning:
Theory, Algorithms, and Applications

Jiayu Zhou (1,2), Jianhui Chen (3), Jieping Ye (1,2)
(1) Computer Science and Engineering, Arizona State University, AZ
(2) The Biodesign Institute, Arizona State University, AZ
(3) GE Global Research, NY




              The Twelfth SIAM International Conference on Data Mining, April 27, 2012



                         Tutorial Goals
• Understand the basic concepts in multi-task learning
• Understand different approaches to model task
  relatedness
• Get familiar with different types of multi-task
  learning techniques
• Introduce multi-task learning applications
• Introduce the multi-task learning package: MALSAR







                    Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
    – Incomplete Multi-Source Fusion
    – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






                     Multiple Tasks
o Examination Scores Prediction¹ (Argyriou et al. '08): predicting one
  school's exam scores is one task, e.g.:

  School 1 - Alverno High School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    72981      | 1985       | 95             | ... | 83%            | ... | ?

  School 138 - Jefferson Intermediate School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    31256      | 1986       | 87             | ... | 72%            | ... | ?

  School 139 - Rosemead High School
    Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
    12381      | 1986       | 83             | ... | 77%            | ... | ?

  The attributes are student-dependent (birth year, previous score) or
  school-dependent (school ranking).

  ¹The Inner London Education Authority (ILEA)



               Learning Multiple Tasks
o Learning each task independently

  [Figure: the school tasks (task 1, task 138, task 139, ...) are each
  trained separately on their own school's records, yielding one
  independent model per school.]



               Learning Multiple Tasks
o Learning multiple tasks simultaneously

  [Figure: the same school tasks are now trained jointly on all records.]

  ✓ Learn tasks simultaneously
  ✓ Model the relationships among tasks



                     Performance of MTL
o Evaluation on the School data:
  • Predict exam scores for 15,362 students from 139 schools
  • Describe each student by 27 attributes
  • Compare single-task learning approaches (Ridge Regression, Lasso) with
    one multi-task learning approach (trace-norm regularized learning)

  Performance measure: N-MSE = mean squared error / variance(target)

  [Plot: N-MSE vs. index of training ratio for Ridge Regression, Lasso,
  and Trace Norm; the multi-task approach attains the lowest N-MSE.]
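  For concreteness, a minimal Python/NumPy sketch of the N-MSE measure used
  above (the sample targets and predictions are made up):

    import numpy as np

    def n_mse(y_true, y_pred):
        # Normalized MSE: mean squared error divided by the target variance,
        # so a value below 1 beats the trivial predict-the-mean baseline.
        return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

    # Hypothetical exam scores and predictions:
    y_true = np.array([95.0, 87.0, 83.0, 90.0])
    y_pred = np.array([92.0, 85.0, 80.0, 93.0])
    print(n_mse(y_true, y_pred))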



                 Multi-Task Learning
• Multi-task learning differs from single-task learning in the training
  (induction) process.
• Inductions of multiple tasks are performed simultaneously to capture
  intrinsic relatedness.

  [Figure: in single-task learning, each of tasks 1..t has its own training
  data, training run, trained model, and generalization; in multi-task
  learning, the tasks' training data enter one joint training process that
  yields a trained model per task.]






           Web Pages Categorization
                                 Chen et al. 2009 ICML

• Classify documents into categories
• The classification of each category is a task
• The tasks of predicting different categories may be latently related

  [Figure: one classifier per category (Health, Travel, ..., World,
  Politics, US); the classifiers' parameters of different categories are
  latently related.]



        MTL for HIV Therapy Screening
                                 Bickel et al. 2008 ICML
• Hundreds of possible combinations of drugs, some of which use similar
  biochemical mechanisms
• The samples available for each combination are limited
• For a patient, predicting the outcome of one drug combination is a task
• Exploit the similarity information across combinations via multi-task
  learning

  [Figure: patient treatment records for drugs/combinations (ETV, ABC, AZT,
  ..., D4T, DDI, FTC); the response to an untried combination is predicted.]



                   Other Applications
• Portfolio selection [Ghosn and Bengio, NIPS'97]

• Collaborative ordinal regression [Yu et al. NIPS'06]

• Web image and video search [Wang et al. CVPR'09]

• Disease progression modeling [Zhou et al. KDD'11]

• Disease prediction [Zhang et al. NeuroImage'12]



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






                  How Tasks Are Related
  Assumption: All tasks are related

  Methods
  • Mean-regularized MTL
  • Joint feature learning
  • Trace-norm regularized MTL
  • Alternating structural optimization (ASO)
  • Shared-parameter Gaussian Process



                 How Tasks Are Related
  Assumption: There are outlier tasks

  • Assuming that all tasks are related may be too strong in practical
    applications.
  • Some tasks may be irrelevant (outliers).



                  How Tasks Are Related
  Assumption: Tasks have group structures
  Assumption: Tasks have tree structures
  Assumption: Tasks have graph/network structures

  Methods
  • Clustered MTL
  • Tree MTL
  • Network MTL



           Multi-Task Learning Methods
• Regularization-based MTL
     – All tasks are related
        • regularized MTL, joint feature learning, low rank MTL, ASO
     – Learning with outlier tasks: robust MTL
     – Tasks form groups/graphs/trees
        • clustered MTL, network MTL, tree MTL


• Other Methods
     – Shared Hidden Nodes in Neural Network
     – Shared Parameter Gaussian Process





Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







                        Notation
  [Figure: m tasks; task i has a feature matrix X_i with n_i samples in
  dimension d and a target vector Y_i; learning produces the d × m model
  matrix W.]

• We focus on linear models: Y_i = X_i W_i,
  X_i ∈ ℝ^(n_i × d),  Y_i ∈ ℝ^(n_i × 1),  W = [W_1, W_2, ..., W_m]
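  A minimal NumPy sketch of this notation (all sizes and data below are
  hypothetical placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 27, 139                                    # e.g., 27 attributes, 139 school tasks
    n = rng.integers(50, 200, size=m)                 # n_i: samples per task

    X = [rng.standard_normal((n_i, d)) for n_i in n]  # feature matrices X_i (n_i x d)
    W = rng.standard_normal((d, m))                   # model matrix W = [W_1, ..., W_m]
    Y = [X[i] @ W[:, i] for i in range(m)]            # linear models Y_i = X_i W_i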



      Mean-Regularized Multi-Task Learning
                                        Evgeniou & Pontil, 2004 KDD

 • Assumption: task parameter vectors of all tasks are
   close to each other.
        – Advantage: simple, intuitive, easy to implement
        – Disadvantage: may not hold in real applications.

  Regularization penalizes the deviation of each task from the task mean:

    min_W  (1/2) ‖XW − Y‖_F² + λ Σ_{i=1}^m ‖W_i − (1/m) Σ_{s=1}^m W_s‖₂²
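  A minimal sketch (not the authors' implementation) of evaluating this
  objective in NumPy, reusing the per-task X, Y, W from the notation slide;
  lam stands for λ:

    import numpy as np

    def mean_reg_objective(X, Y, W, lam):
        m = W.shape[1]
        # Squared loss summed over tasks: (1/2) ||X W - Y||_F^2
        loss = 0.5 * sum(np.sum((X[i] @ W[:, i] - Y[i]) ** 2) for i in range(m))
        # Penalize each task's deviation from the mean model (1/m) sum_s W_s
        w_mean = W.mean(axis=1, keepdims=True)
        return loss + lam * np.sum((W - w_mean) ** 2)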



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







     Multi-Task Learning with High Dimensional Data
• In practical applications, we may deal with high
  dimensional data.
     – Gene expression data, biomedical image data
• Curse of Dimensionality
• Dealing with high dimensional data in multi-task
  learning
     – Embedded feature selection: L1/Lq - Group Lasso
     – Low-rank subspace learning: low-rank assumption – ASO,
       Trace-norm regularization






Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







     Multi-Task Learning with Joint Feature Learning
• One way to capture the relatedness of multiple tasks is to constrain all
  models to share a common set of features.
• For example, in the School data, the scores from different schools may be
  determined by a similar set of features.

  [Figure: a weight matrix over 10 features × 5 tasks in which only a few
  feature rows are nonzero, and those rows are shared by all tasks.]



  Multi-Task Learning with Joint Feature Learning
            Obozinski et al. 2009 Stat Comput, Liu et al. 2010 Technical Report

• Use group sparsity: ℓ1/ℓq-norm regularization,

    ‖W‖_{1,q} = Σ_{i=1}^d ‖wⁱ‖_q    (wⁱ: the i-th row of W)

• When q > 1 we have group sparsity.

  [Figure: Y ≈ X × W with output Y (n × m, one column per task), input X
  (n × d), and model W (d × m); entire rows of W are zeroed, so all tasks
  share one feature set.]

    min_W  (1/2) ‖XW − Y‖_F² + λ ‖W‖_{1,q}
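  Solvers for this formulation with q = 2 build on the proximal operator of
  the ℓ1/ℓ2 norm, which shrinks each feature row of W as a group. A minimal
  NumPy sketch (not MALSAR's implementation):

    import numpy as np

    def prox_l12(W, t):
        # argmin_Z 0.5 * ||Z - W||_F^2 + t * ||Z||_{1,2}:
        # soft-threshold the l2 norm of each row (one feature across all tasks).
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
        return scale * W   # rows driven to zero drop that feature for every task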



     Writer-Specific Character Recognition
                          Obozinski, Taskar, and Jordan, 2006

• Each task is a classification between two letters for
  one writer.







      Dirty Model for Multi-Task Learning
                                 Jalali et al. 2010 NIPS

• In practical applications, it is too restrictive to constrain all tasks
  to share a single shared structure.
• Split the model into two parts, W = P + Q: a group-sparse component P and
  an element-wise sparse component Q.

    min_{P,Q}  ‖Y − X(P + Q)‖_F² + λ₁ ‖P‖_{1,q} + λ₂ ‖Q‖₁



                 Robust Multi-Task Learning
o Most existing MTL approaches               o Robust MTL approaches
  Assumption: All tasks are related            Assumption: There are outlier tasks

  [Figure: left, every task is treated as relevant; right, the relevant
  tasks are modeled jointly while irrelevant (outlier) tasks are set apart.]



     Robust Multi-Task Feature Learning
                                 Gong et al. 2012 Submitted

• Simultaneously capture a common set of features among relevant tasks and
  identify outlier tasks: W = P + Q, where P is row-wise group sparse
  (jointly selected features) and Q is column-wise group sparse (outlier
  tasks).

    min_{P,Q}  ‖Y − X(P + Q)‖_F² + λ₁ ‖P‖_{1,q} + λ₂ ‖Qᵀ‖_{1,q}



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







              Trace-Norm Regularized MTL
o Capture task relatedness via a shared low-rank structure

  [Figure: the training data of tasks 1..n are trained jointly; every
  trained model lies in a shared low-rank subspace, and each model then
  generalizes to its own task.]



            Low-Rank Structure for MTL
• Assume we have a rank-2 model matrix: each task's weight vector is a
  combination of the same two basis vectors b₁ and b₂:

    Task 1:  w₁ = α₁ b₁ + α₂ b₂
    Task 2:  w₂ = β₁ b₁ + β₂ b₂
    Task 3:  w₃ = γ₁ b₁ + γ₂ b₂



            Low-Rank Structure for MTL
• A low-rank structure: model matrix = basis vectors × coefficients

    [w₁ w₂ w₃] = [b₁ b₂] × [α₁ α₂; β₁ β₂; γ₁ γ₂]ᵀ

• In general, m task models share p basis vectors (m > p):

    W = B T,   B ∈ ℝ^(d × p) (p basis vectors),   T = (τ_{jk}) ∈ ℝ^(p × m)
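  A small numeric illustration of this factorization (sizes hypothetical):
  with p shared basis vectors, the stacked model matrix has rank p even
  though m > p.

    import numpy as np

    rng = np.random.default_rng(0)
    d, p, m = 10, 2, 5
    B = rng.standard_normal((d, p))      # p shared basis vectors (columns of B)
    T = rng.standard_normal((p, m))      # coefficients tau_{jk}, one column per task
    W = B @ T                            # d x m model matrix
    print(np.linalg.matrix_rank(W))      # prints 2: a shared rank-2 structure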



              Low-Rank Structure for MTL
                                 Ji et al. 2009 ICML

• Rank minimization formulation:

    min_W  Loss(W) + λ · Rank(W)

  – rank minimization is NP-hard for general loss functions

• Convex relaxation: trace-norm minimization

    min_W  Loss(W) + λ ‖W‖_*        (‖W‖_*: the sum of singular values of W)

  – the trace norm is theoretically shown to be a good approximation of the
    rank function (Fazel et al., 2001)
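  Proximal-gradient solvers for the trace-norm formulation rest on singular
  value soft-thresholding; a minimal NumPy sketch:

    import numpy as np

    def prox_trace_norm(W, t):
        # argmin_Z 0.5 * ||Z - W||_F^2 + t * ||Z||_* :
        # soft-threshold the singular values, which lowers the rank of W.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return (U * np.maximum(s - t, 0.0)) @ Vt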



                  Low-Rank Structure for MTL
o Evaluation on the School data:
  • Predict exam scores for 15,362 students from 139 schools
  • Describe each student by 27 attributes
  • Compare Ridge Regression, Lasso, and Trace Norm (for inducing a
    low-rank structure)

  Performance measure: N-MSE = mean squared error / variance(target)

  [Plot: N-MSE vs. index of training ratio; the low-rank structure
  (induced via Trace Norm) leads to the smallest N-MSE.]



   Alternating Structure Optimization (ASO)
                                 Ando and Zhang, 2005 JMLR

• ASO assumes that each task's model is the sum of two components: a
  task-specific one and one lying in a shared low-dimensional subspace.

  [Figure: for each task i = 1..m, the prediction is X(Θv_i + w_i) ≈ y,
  where Θ is the low-dimensional feature map shared by all tasks.]



   Alternating Structure Optimization (ASO)
                                 Ando and Zhang, 2005 JMLR

o Learning from the i-th task: u_i = Θv_i + w_i, giving the prediction
  X_i(Θv_i + w_i) ≈ y_i, with Θ the shared low-dimensional feature map.

o Empirical loss function for the i-th task:

    ℒ_i(X_i(Θv_i + w_i), y_i) = ‖X_i(Θv_i + w_i) − y_i‖²

o ASO simultaneously learns the models and the shared structure:

    min_{Θ,{v_i,w_i}}  Σ_{i=1}^m [ℒ_i(X_i(Θv_i + w_i), y_i) + α ‖w_i‖²]
    subject to  ΘᵀΘ = I
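  A rough sketch of the alternating scheme this formulation suggests (my
  reading of the slides, not the authors' code): with Θ fixed, each
  u_i = Θv_i + w_i solves a regularized least-squares problem; with the u_i
  fixed, Θ is refit from their top singular directions. Sizes and the
  iteration count below are hypothetical.

    import numpy as np

    def aso_sketch(X, Y, h, alpha, iters=20):
        d, m = X[0].shape[1], len(X)
        Theta = np.linalg.qr(np.random.default_rng(0).standard_normal((d, h)))[0]
        for _ in range(iters):
            # With Theta fixed, the optimal v_i is Theta^T u_i, so each task
            # solves: min_u ||X_i u - y_i||^2 + alpha * ||(I - Theta Theta^T) u||^2
            P = np.eye(d) - Theta @ Theta.T
            U = np.column_stack([np.linalg.solve(X[i].T @ X[i] + alpha * P,
                                                 X[i].T @ Y[i]) for i in range(m)])
            # With the u_i fixed, the best Theta spans their top-h singular directions:
            Theta = np.linalg.svd(U, full_matrices=False)[0][:, :h]
        return U, Theta, Theta.T @ U   # models u_i, shared map, and v_i = Theta^T u_i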



                    iASO Formulation
                                 Chen et al., 2009 ICML

o iASO formulation:

    min_{Θ,{v_i,w_i}}  Σ_{i=1}^m [ℒ_i(X_i(Θv_i + w_i), y_i) + α ‖Θv_i + w_i‖² + β ‖w_i‖²]
    subject to  ΘᵀΘ = I

  • controls both model complexity and task relatedness
  • subsumes ASO (Ando et al. '05) as a special case
  • naturally leads to a convex relaxation (Chen et al., 2009 ICML)
  • the convex relaxed ASO is equivalent to iASO under certain mild conditions


     Incoherent Low-Rank and Sparse Structures
                                 Chen et al. 2010 KDD

• ASO uses an L2-norm penalty on the task-specific component; we can
  instead use an L1-norm to learn task-specific features:
  W = P + Q, with an element-wise sparse component Q (task-specific
  features) and a low-rank component P = basis × coefficients (capturing
  task relatedness).

    min_{P,Q}  Σ_{i=1}^m ℒ_i(X_i(P_i + Q_i), y_i) + λ ‖Q‖₁
    subject to  ‖P‖_* ≤ η          (convex formulation)



               Robust Low-Rank in MTL
                                 Chen et al. 2011 KDD

• Simultaneously perform low-rank MTL and identify outlier tasks:
  W = P + Q, with a low-rank component P (capturing task relatedness,
  P = basis × coefficients) and a column-wise group-sparse component Q
  (identifying irrelevant tasks).

    min_{P,Q}  Σ_{i=1}^m ℒ_i(X_i(P_i + Q_i), y_i) + α ‖P‖_* + β ‖Qᵀ‖_{1,q}



                        Summary
• All multi-task learning formulations discussed above fit into the
  W = P + Q schema.
   – Component P: shared structure
   – Component Q: information not captured by the shared structure

  Embedded Feature Selection
    Method   Shared Structure P                Component Q
    L1/Lq    Feature Selection (L1/Lq norm)    0
    Dirty    Feature Selection (L1/Lq norm)    L1-norm
    rMTFL    Feature Selection (L1/Lq norm)    Outlier (column-wise L1/Lq norm)

  Low-Rank Subspace Learning
    Method       Shared Structure P            Component Q
    Trace Norm   Low-Rank (Trace Norm)         0
    ISLR         Low-Rank (Trace Norm)         L1-norm
    ASO          Low-Rank (Shared Subspace)    L2-norm on independent component
    RMTL         Low-Rank (Trace Norm)         Outlier (column-wise L1/Lq norm)



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







    Multi-Task Learning with Clustered Structures
• Most MTL techniques assume all tasks are related
• This is not true in many applications
• Clustered multi-task learning assumes
   – the tasks have a group structure
   – the models of tasks from the same group are closer to each other
     than those from a different group

  Assumption: Tasks have group structures
  e.g., tasks in one group predict heart-related diseases while tasks in
  another group predict brain-related diseases.



           Clustered Multi-Task Learning
                    Jacob et al. 2008 NIPS, Zhou et al. 2011 NIPS

• Use regularization to capture clustered structures.

  [Figure: the training data (X, Y pairs) of all tasks are fit jointly; the
  learned task models fall into clusters 1, 2, ..., k-1, k.]



            Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Capture clustered structures among the m task models (m > cluster
  number k) by minimizing the sum-of-squared-error (SSE) objective of
  K-means clustering:

    min_I  Σ_{j=1}^k Σ_{v∈I_j} ‖w_v − w̄_j‖₂²
    (I_j: index set of the j-th cluster; w̄_j: its mean model)

  which can be rewritten in trace form as

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F)

  where F is the m × k orthogonal cluster indicator matrix with
  F_{i,j} = 1/√(n_j) if i ∈ I_j and 0 otherwise.



            Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Directly minimizing the SSE is hard because of the nonlinear constraint
  on F:

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F)
    (F: m × k orthogonal cluster indicator matrix,
     F_{i,j} = 1/√(n_j) if i ∈ I_j and 0 otherwise)

• Relax F to any orthonormal matrix (Zha et al. 2001 NIPS):

    min_{F: FᵀF = I_k}  tr(WᵀW) − tr(Fᵀ WᵀW F)
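  A small numeric check of the relaxed problem (sizes hypothetical): with
  FᵀF = I_k, the minimizer stacks the top-k eigenvectors of WᵀW, and the
  objective equals the sum of the m − k smallest eigenvalues.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, k = 10, 30, 3
    W = rng.standard_normal((d, m))

    G = W.T @ W                                   # m x m Gram matrix of task models
    evals, evecs = np.linalg.eigh(G)              # eigenvalues in ascending order
    F = evecs[:, -k:]                             # relaxed indicator: F^T F = I_k
    relaxed = np.trace(G) - np.trace(F.T @ G @ F)
    print(np.isclose(relaxed, evals[:-k].sum()))  # True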



              Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

• Clustered multi-task learning (CMTL) formulation:

    min_{W, F: FᵀF = I_k}  Loss(W) + α [tr(WᵀW) − tr(Fᵀ WᵀW F)] + β tr(WᵀW)

  – the α term captures the cluster structures
  – the β term improves generalization performance

• CMTL has been shown to be equivalent to ASO
  – given that the dimension of the shared low-rank subspace in ASO and the
    cluster number in CMTL are the same.



     Convex Clustered Multi-Task Learning
                                 Zhou et al. 2011 NIPS

  [Figure: recovered model matrices for Ground Truth, Mean-Regularized MTL,
  Trace-Norm Regularized MTL, and Convex Relaxed CMTL. Low rank can also
  capture the cluster structure well; the relaxations introduce some noise.]



Regularization-based Multi-Task Learning
• All tasks are related
     – Mean-Regularized MTL
     – MTL in high dimensional feature space
        • Embedded Feature Selection
        • Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure







 Multi-Task Learning with Tree Structures
• In some applications, the tasks may be equipped with a tree structure:
   – the tasks belonging to the same node are similar to each other
   – the similarity between two tasks is related to the depth of their
     common ancestor node

  Assumption: Tasks have tree structures
  [Figure: models of tasks a, b, c at the leaves of a tree; task a is more
  similar to b than to c.]



 Multi-Task Learning with Tree Structures
                                 Kim and Xing 2010 ICML

• Tree-Guided Group Lasso:

    min_W  Loss(W) + λ Σ_{j=1}^d Σ_{v∈V} w_v ‖W^j_{G_v}‖₂

  where each tree node v defines a group G_v of tasks, w_v is its weight,
  and W^j_{G_v} is the j-th feature's weights restricted to the tasks
  in G_v.

  [Figure: three output tasks under a tree with groups G_{v1} = {W1},
  G_{v2} = {W2}, G_{v3} = {W3}, G_{v4} = {W1, W2}, G_{v5} = {W1, W2, W3}.]
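  A minimal sketch of evaluating this penalty for the three-task tree above;
  the node weights w_v below are hypothetical placeholders:

    import numpy as np

    groups  = [[0], [1], [2], [0, 1], [0, 1, 2]]  # G_v1, ..., G_v5 (0-indexed tasks)
    weights = [1.0, 1.0, 1.0, 0.5, 0.25]          # hypothetical node weights w_v

    def tree_penalty(W, lam):
        # lam * sum over features j and tree nodes v of w_v * ||W[j, G_v]||_2
        return lam * sum(w_v * np.linalg.norm(W[j, g])
                         for j in range(W.shape[0])
                         for g, w_v in zip(groups, weights))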



Multi-Task Learning with Graph Structures
• In real applications, the tasks involved in MTL may have a graph
  structure:
   – two tasks are related if they are connected in the graph, i.e.,
     connected tasks are similar
   – the similarity of two related tasks can be represented by the weight
     of the connecting edge

  Assumption: Tasks have graph/network structures



Multi-Task Learning with Graph Structures
• A simple way to encode the graph structure is to penalize the difference
  between every two tasks joined by an edge.
• Given the edge set E, we thus penalize:

    Σ_{i=1}^{|E|} ‖W_{e(i,1)} − W_{e(i,2)}‖₂² = ‖WRᵀ‖_F²,   R ∈ ℝ^(|E| × m)

• The graph regularization term can also be written as a Laplacian term:

    ‖WRᵀ‖_F² = tr((WRᵀ)ᵀ(WRᵀ)) = tr(W RᵀR Wᵀ) = tr(W ℒ Wᵀ)
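  A quick numeric check of this identity (graph and sizes hypothetical):
  the edge-difference penalty equals tr(W ℒ Wᵀ) with ℒ = RᵀR.

    import numpy as np

    edges = [(0, 1), (1, 2), (0, 3)]        # E: pairs of connected tasks
    d, m = 5, 4
    W = np.random.default_rng(0).standard_normal((d, m))

    R = np.zeros((len(edges), m))           # |E| x m: one +1/-1 row per edge
    for i, (a, b) in enumerate(edges):
        R[i, a], R[i, b] = 1.0, -1.0

    penalty = sum(np.sum((W[:, a] - W[:, b]) ** 2) for a, b in edges)
    L = R.T @ R                             # the graph Laplacian
    print(np.isclose(penalty, np.trace(W @ L @ W.T)))   # True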



Multi-Task Learning with Graph Structures
• How to obtain graph information
     – External domain knowledge
        • protein-protein interaction (PPI) for microarray
     – Discover task relatedness from data
        • Pairwise correlation coefficient
         • Sparse inverse covariance (Friedman et al. 2008 Biostatistics)







Multi-Task Learning with Graph Structures
                     Chen et al. 2011 UAI, Kim et al. 2009 Bioinformatics

• Graph-guided Fused Lasso

  [Figure: inputs are gene SNPs (e.g., ACGTTTTACTGTACAATTTAC) and outputs
  are phenotypes; Lasso treats each phenotype output separately, while
  Graph-Guided Fused Lasso couples related phenotypes along a graph.]

    min_W  Loss(W) + λ ‖W‖₁ + Ω(W)

  with the graph-guided fusion penalty

    Ω(W) = γ Σ_{e=(m,l)∈E} Σ_{j=1}^J |W_{jm} − sign(r_{ml}) W_{jl}|
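  A minimal sketch of evaluating the fusion penalty Ω(W); the edge list and
  correlations r_ml below are hypothetical:

    import numpy as np

    def fusion_penalty(W, edges, gamma):
        # gamma * sum over edges (m, l) and outputs j of
        # |W[j, m] - sign(r_ml) * W[j, l]|: positively correlated outputs are
        # fused toward equal weights, negatively correlated toward opposite ones.
        return gamma * sum(np.sum(np.abs(W[:, a] - np.sign(r) * W[:, b]))
                           for a, b, r in edges)

    W = np.random.default_rng(0).standard_normal((6, 3))  # 6 SNPs, 3 phenotypes
    edges = [(0, 1, 0.9), (1, 2, -0.4)]                   # (task m, task l, r_ml)
    print(fusion_penalty(W, edges, gamma=0.1))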



Multi-Task Learning with Graph Structures
                                 Kim et al. 2009 Bioinformatics

• In some applications, we know not only which pairs of tasks are related,
  but also how strongly they are related.
• Graph-Weighted Fused Lasso adds this weight information:

    min_W  Loss(W) + λ ‖W‖₁ + γ Σ_{e=(m,l)∈E} τ(r_{ml}) Σ_{j=1}^J |W_{jm} − sign(r_{ml}) W_{jl}|



                      Practical Guideline
• MTL versus STL
   – MTL is preferred when dealing with multiple related tasks, each with a
     small number of training samples

• Shared features versus shared subspace
   – Identifying shared features is preferred when the data dimensionality
     is large
   – Identifying a shared subspace is preferred when the number of tasks
     is large



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions





     Case Study I: Incomplete Multi-Source
                   Data Fusion
                              Yuan et al. 2012 NeuroImage

• In many applications, multiple data sources may contain a considerable
  amount of missing data.
• In ADNI, over half of the subjects lack CSF measurements, and an
  independent half of the subjects do not have FDG-PET.

  [Figure: the 245-subject ADNI data matrix with three feature blocks, PET
  (116 features), MRI (305 features), and CSF (5 features); entire source
  blocks are missing for groups of subjects, e.g., some subjects have all
  three sources while others lack PET, CSF, or both.]



                          Overview of iMSF
                                    Yuan et al. 2012 NeuroImage

  [Figure: subjects are grouped by data availability into four tasks, e.g.,
  Task I = subjects with PET, MRI, and CSF; Task IV = subjects with MRI
  only; each task is fitted with its own model (Model I-IV) over the
  sources it actually observes.]

  iMSF learns all task models jointly, with a group penalty that couples
  the coefficients each feature receives across every model containing it:

  $$\min_W \; \frac{1}{m} \sum_{i=1}^{m} \mathrm{Loss}(X_i, Y_i, W_i)
    + \lambda \sum_{k=1}^{d} \left\| W_{G_k} \right\|_2$$

  where W_{G_k} collects the coefficients of feature k across all tasks
  whose sources include that feature, so each feature is selected or
  discarded consistently across models.
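
A hedged sketch of the two mechanics this formulation relies on, splitting subjects into tasks by their missingness pattern and evaluating the cross-model group penalty; NaNs mark missing blocks, and all names are our own rather than the authors' code:

```python
import numpy as np

def split_by_missingness(X):
    """Group samples into iMSF-style tasks by their pattern of observed
    features. X is (n, d) with NaN wherever a source is missing."""
    observed = ~np.isnan(X)
    tasks = {}
    for i in range(X.shape[0]):
        tasks.setdefault(tuple(observed[i]), []).append(i)
    # each task: (sample indices, indices of the features it observes)
    return [(np.array(rows), np.flatnonzero(np.array(key)))
            for key, rows in tasks.items()]

def group_penalty(W_list, feat_list, d):
    """sum_k ||W_{G_k}||_2: for each feature k, the l2 norm of its
    coefficients across all task models whose feature set contains k."""
    total = 0.0
    for k in range(d):
        coefs = []
        for W, cols in zip(W_list, feat_list):
            pos = np.flatnonzero(cols == k)
            if pos.size:
                coefs.append(W[pos[0]])
        if coefs:
            total += np.linalg.norm(coefs)
    return total
```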







                                                  iMSF: Performance
                                    Yuan et al. 2012 NeuroImage

  [Figure: bar charts of accuracy, sensitivity, and specificity comparing
  iMSF against four imputation baselines (Zero, EM, KNN, SVD) at three
  missing-data rates (50.0%, 66.7%, 75.0%); accuracy lies roughly between
  0.79 and 0.84 across settings, with iMSF shown outperforming the
  baselines.]



 Case Study II: Drosophila Gene Expression
               Image Analysis
• Drosophila (fruit fly) is a favorite model system for geneticists and
  developmental biologists studying embryogenesis.
   – The small size and short generation time make it ideal for genetic studies.
• In situ hybridization allows us to generate images showing when and where
  individual genes are active.
   – The analysis of such images can potentially reveal gene functions and
       gene-gene interactions.







     Drosophila gene expression pattern images
  [Figure: expression-pattern image series for the genes bgm and en across
  BDGP stage ranges 1-3, 4-6, 7-8, 9-10, 11-12, and 13-16 (Berkeley
  Drosophila Genome Project, http://www.fruitfly.org/), and for the genes
  hb and runt across Fly-FISH stage ranges 1-3, 4-5, 6-7, 8-9, and 10
  onward (http://fly-fish.ccbr.utoronto.ca/).]
  [Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]



                       Comparative image analysis
  [Figure: expression images of the genes Twist, heartless, and stumps at
  stages 4-6 and 7-8, each labeled with controlled-vocabulary annotations
  such as "anterior endoderm AISN", "trunk mesoderm AISN", "dorsal ectoderm
  AISN", "procephalic ectoderm AISN", "trunk mesoderm PR", "head mesoderm
  PR", "anterior endoderm anlage", and "yolk nuclei".]

  We need the spatial and temporal annotations of expressions.
  [Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]



                     Challenges of manual annotation
  [Figure: cumulative number of in situ images by year, rising from near
  zero in 1992 to roughly 160,000 by 2010.]






                            Method outline
   Ji et al. 2008 Bioinformatics; Ji et al. 2009 BMC Bioinformatics; Ji et al. 2009 NIPS

  [Figure: two-stage pipeline. Feature extraction: images are encoded via
  sparse coding. Model construction: the resulting features feed either a
  low-rank multi-task model or a graph-based multi-task model.]
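
The feature-extraction stage can be sketched with scikit-learn's dictionary learning; the patch size, dictionary size, and max pooling below are illustrative placeholders, not the settings used in the cited papers.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical setup: one image yields many small flattened patches (rows).
rng = np.random.RandomState(0)
patches = rng.rand(500, 64)                     # 500 patches of 8x8 pixels

# Learn an overcomplete dictionary and encode each patch sparsely.
dico = DictionaryLearning(n_components=128, alpha=1.0,
                          transform_algorithm='lasso_lars', random_state=0)
codes = dico.fit(patches).transform(patches)    # (500, 128) sparse codes

# Pool the codes over the image (max pooling) into one feature vector,
# which then feeds the low-rank or graph-based multi-task models.
image_feature = codes.max(axis=0)               # (128,)
```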






Low rank multi-task learning model
                     Ji et al. 2009 BMC Bioinformatics

  [Figure: X (images by features) times W (features by tasks) approximates
  the label matrix Y, with the model matrix W constrained to be low rank.]

  $$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji})
    + \lambda \|W\|_*$$

  The first term is the empirical loss over k tasks and n samples; the
  regularizer is the trace norm (the sum of the singular values of W), a
  convex surrogate that induces a low-rank W.
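
Trace-norm problems of this form are commonly handled with proximal gradient methods whose core step is singular value thresholding, the proximal operator of the trace norm; the following is a minimal sketch with the squared loss (our own construction, not the authors' solver):

```python
import numpy as np

def svt(W, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_mtl(X, Y, lam, iters=500):
    """Proximal gradient for min_W 0.5*||XW - Y||_F^2 + lam*||W||_*.
    X: (n, d) shared inputs; Y: (n, k), one column of labels per task."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1/L, L = Lipschitz constant
    for _ in range(iters):
        grad = X.T @ (X @ W - Y)               # gradient of the squared loss
        W = svt(W - step * grad, step * lam)   # gradient step, then prox
    return W
```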


            Graph-based multi-task learning model
                                    Ji et al. 2009 SIGKDD

  [Figure: a fragment of the annotation-term graph, with nodes such as
  "sensory system head", "ventral sensory complex", "dorsal/lateral sensory
  complexes", "sensory nervous system", "embryonic antennal sense organ",
  and "embryonic maxillary sensory complex", joined by edges weighted
  between 0.31 and 0.79.]

  $$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L(w_i^T x_j, Y_{ji})
    + \lambda_1 \|W\|_F^2
    + \lambda_2 \sum_{(p,q)\in G} g(C_{pq}) \,
      \left\| w_p - \mathrm{sgn}(C_{pq})\, w_q \right\|^2$$

  The loss is combined with a 2-norm (ridge) term and a graph penalty that
  pulls together the models of tasks connected in G, with edge weights
  g(C_{pq}); with the squared loss, the problem admits a closed-form
  solution.
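
With the squared loss, setting the gradient to zero yields a Sylvester equation, which is one way to realize the closed-form solution mentioned above; the reduction below (folding the signed, weighted graph into a matrix M so the penalty equals tr(W M Wᵀ)) is our own derivation sketch, not the authors' code:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def graph_mtl_closed_form(X, Y, edges, C, g, lam1, lam2):
    """Closed-form minimizer of
        ||XW - Y||_F^2 + lam1*||W||_F^2
        + lam2 * sum_{(p,q)} g(C_pq) * ||w_p - sgn(C_pq) w_q||^2.
    X: (n, d) shared inputs; Y: (n, k) labels, one column per task;
    edges: (p, q) task pairs; C: (k, k) edge scores; g: weight function."""
    k = Y.shape[1]
    M = np.zeros((k, k))                     # signed graph matrix: the
    for p, q in edges:                       # penalty equals tr(W M W^T)
        w, s = g(C[p, q]), np.sign(C[p, q])
        M[p, p] += w
        M[q, q] += w
        M[p, q] -= w * s
        M[q, p] -= w * s
    # Stationarity: X^T X W + W (lam1*I + lam2*M) = X^T Y, a Sylvester eq.
    return solve_sylvester(X.T @ X, lam1 * np.eye(k) + lam2 * M, X.T @ Y)
```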




                 Spatial annotation performance
  [Figure: AUC (ranging roughly from 65 to 85) of four methods, SC+LR,
  SC+Graph, BoW+Graph, and Kernel+SVM, on spatial annotation for stage
  ranges 4-6, 7-8, 9-10, 11-12, and 13-16.]



•     50% of the data is used for training and 50% for testing, over 30 random splits
•     Multi-task approaches outperform single-task approaches




                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions







          MALSAR: Multi-tAsk Learning via StructurAl Regularization
• A multi-task learning package
• Encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/





            MTL Algorithms in MALSAR 1.0
     • Mean-Regularized Multi-Task Learning
     • MTL with Embedded Feature Selection
        – Joint Feature Learning
        – Dirty Multi-Task Learning
        – Robust Multi-Task Feature Learning
     • MTL with Low-Rank Subspace Learning
        –   Trace Norm Regularized Learning
        –   Alternating Structure Optimization
        –   Incoherent Sparse and Low Rank Learning
        –   Robust Low-Rank Multi-Task Learning
     • Clustered Multi-Task Learning
     • Graph Regularized Multi-Task Learning
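
MALSAR itself is a MATLAB package, so the snippet below is not its API; as a language-neutral illustration of the structural-regularization idea behind the first algorithm in the list above, here is a hedged Python sketch of mean-regularized MTL (Evgeniou & Pontil 2004), with all names our own:

```python
import numpy as np

def mean_regularized_mtl(Xs, Ys, lam, lr=1e-3, iters=2000):
    """Gradient descent on
        sum_i ||X_i w_i - y_i||^2 + lam * sum_i ||w_i - w_mean||^2,
    which penalizes each task model's deviation from the task mean.
    Xs, Ys: per-task data; Xs[i] is (n_i, d), Ys[i] is (n_i,)."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    for _ in range(iters):
        w_mean = W.mean(axis=1)
        G = np.zeros_like(W)
        for i in range(m):
            # the penalty's gradient w.r.t. w_i reduces to 2*(w_i - w_mean),
            # because the deviations from the mean sum to zero
            G[:, i] = 2 * Xs[i].T @ (Xs[i] @ W[:, i] - Ys[i]) \
                      + 2 * lam * (W[:, i] - w_mean)
        W -= lr * G                          # full-batch gradient step
    return W
```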



                     Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and
  motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
     – Incomplete Multi-Source Fusion
     – Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions






          Trends in Multi-Task Learning
• Develop efficient algorithms for large-scale multi-
  task learning.

• Semi-supervised and unsupervised MTL

• Learn task structures automatically in MTL

• Asymmetric MTL
     – The task relationship is not mutual

• Cross-Domain MTL
     – The features may be different across domains





                 Acknowledgement

 • National Science Foundation
 • National Institutes of Health




