Facial Landmark Detection by
Deep Multi-task Learning
2015/7/2
Masahiro Suzuki
Contents
¤ Paper Information
¤ Introduction
¤ Related work
¤ Tasks-Constrained Deep Convolutional Network
¤ Experiment
¤ Conclusion
Paper Information
Title: Facial Landmark Detection by Deep Multi-task Learning
(ECCV 2014)
Authors : Zhanpeng Zhang, Ping Luo, Chen Change Loy, and
Xiaoou Tang
¤ The Chinese University of Hong Kong / Multimedia Laboratory
Deep Learning (CNN) + Multitask Learning
¤ Motivation
¤ I am studying lifelong learning (online multi-task learning) with deep
learning
Facial landmark detection
¤ Facial landmark detection is a fundamental component in
many face analysis tasks
¤ facial attribute inference
¤ face verification
¤ face recognition
¤ It remains a formidable challenge due to
partial occlusion and large head-pose variations
Approach
the authors thought that …
¤ facial landmark detection is not a standalone problem
¤ its estimation can be influenced by a number of heterogeneous
and subtly correlated factors
(Diagram: the main task is learned jointly with auxiliary tasks via multi-task learning)
Contribution
They propose a Tasks-Constrained Deep Convolutional Network
(TCDCN)
¤ the first attempt to investigate how facial landmark detection
can be optimized together with heterogeneous but subtly
correlated tasks
¤ show that …
¤ the representations learned from related tasks facilitate the learning
of the main task
¤ task relatedness is captured implicitly by the proposed model
¤ the proposed approach outperforms the existing methods
¤ demonstrate the effectiveness of using five-landmark estimation
as robust initialization for improving a state-of-the-art face
alignment method
Contents
¤ Paper Information
¤ Introduction
¤ Related work
¤ Tasks-Constrained Deep Convolutional Network
¤ Experiment
¤ Conclusion
Facial landmark detection
Prior work on facial landmark detection:
● Regression-based method
¤ estimates the landmarks directly from image patches using SVR
¤ many prior studies use random regression forests
¤ iterates from an initial landmark estimate, so it depends on the initialization
● Template fitting method
¤ fits a face template to the image
¤ can perform face detection, facial landmark detection, and pose estimation simultaneously
● Cascaded CNN
Valstar, M., Martinez, B., Binefa, X., Pantic, M.:
Facial point detection using boosted regression
and graph models. In: CVPR. pp. 2729-2736 (2010)
Cootes, T.F., Edwards, G.J., Taylor, C.J.:
Active appearance models.
PAMI 23(6), 681-685 (2001)
Sun, Y., Wang, X., Tang, X.:
Deep convolutional network cascade
for facial point detection.
In: CVPR. pp. 3476-3483 (2013)
Estimates the point positions directly by regression
Fits a model of position and appearance
A method from the same laboratory
Splits the face by landmark and applies CNNs in stages;
requires many CNNs (23 CNNs).
Compared with prior work, the distinguishing points are the use of
auxiliary tasks and a raw-pixel-input CNN that, without cascading,
runs in less processing time.
Landmark detection by CNN
cascaded CNN [Sun et al. 2013]
¤ divides the face into several parts in advance, estimates the landmarks
with a CNN for each part, and finally averages the outputs
¤ reading the original paper, the CNNs appear to be applied stage by stage
¤ the closest work to this study (from the same laboratory as the authors)
(Figure 2 caption from [Sun et al. 2013]: three-level cascaded convolutional networks. The input is the face region returned by a face detector. The three networks at level 1 are denoted F1, EN1, and NM1; the level-2 networks are LE21, LE22, RE21, RE22, N21, N22, LM21, LM22, RM21, and RM22, where both LE21 and LE22 predict the left eye center, and so forth.)
Multi-task learning
¤ deep learning and multi-task learning go well together
¤ features learned for one task can be reused by other tasks
¤ conventional multi-task learning assumes every task has the same
difficulty and convergence rate
¤ the tasks in this problem are not equal, so it cannot be applied as-is
This work instead sets early stopping per task (task-wise early stopping),
inspired by [Caruana et al. 1997]
(Paper excerpt) Problem Formulation: Conventional multi-task learning (MTL) seeks to improve the generalization performance of multiple related tasks by learning them jointly. Suppose there are $T$ tasks and the training data for the $t$-th task are denoted as $(x_i^t, y_i^t)$, $t = 1, \dots, T$, $i = 1, \dots, N$, with $x_i^t \in \mathbb{R}^d$ and $y_i^t \in \mathbb{R}$ being the feature and the label, respectively. The goal of the MTL is to minimize

$$\operatorname*{argmin}_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \sum_{i=1}^{N} \ell\bigl(y_i^t, f(x_i^t; w^t)\bigr) + \Phi(w^t), \qquad (1)$$

where $f(x^t; w^t)$ is a function of $x^t$ parameterized by a weight vector $w^t$. The loss function is denoted by $\ell(\cdot)$; a typical choice is the least square for regression and the hinge loss for classification. The $\Phi(w^t)$ is the regularization term that penalizes the complexity of the weights.
Problem Formulation
¤ Conventional multi-task learning improves generalization performance by
learning multiple related tasks jointly
(slide annotations on Eq. (1): training example set, task, loss function,
regularization term, label, feature, weight)
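As a concrete reading of Eq. (1), here is a minimal Python sketch of the conventional MTL objective. This is an illustration under assumptions of my own (least-squares loss as the "typical choice" above, an L2 penalty for $\Phi$, and hypothetical names and shapes), not the authors' code:

```python
import numpy as np

def mtl_objective(W, X, Y, reg=1e-3):
    """Conventional MTL objective (Eq. 1): sum over tasks of per-task
    losses plus a weight penalty. W[t]: (d,) weights of task t,
    X[t]: (N, d) features of task t, Y[t]: (N,) labels of task t."""
    total = 0.0
    for w, x, y in zip(W, X, Y):
        residual = y - x @ w                 # least-squares loss for task t
        total += 0.5 * np.sum(residual ** 2)
        total += reg * np.sum(w ** 2)        # Phi(w^t): L2 regularization
    return total
```

Note that each task keeps its own weight vector `w^t`; the tasks interact only through whatever joint structure $\Phi$ or a shared representation imposes, which is exactly what the proposed formulation changes next.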
Proposed Formulation
¤ The multi-task learning used in this work
Characteristics:
¤ two different loss functions can be optimized jointly (even mixing regression and classification)
¤ the feature x is shared across tasks rather than task-dependent
(Paper excerpt) The loss function is denoted by $\ell(\cdot)$; a typical choice is the least square for regression and the hinge loss for classification. The $\Phi(w^t)$ is the regularization term that penalizes the complexity of weights.
In contrast to conventional MTL that maximizes the performance of all tasks, our aim is to optimize the main task $r$, which is facial landmark detection, with the assistance of an arbitrary number of related/auxiliary tasks $a \in A$. Examples of related tasks include facial pose estimation and attribute inference. To this end, our problem can be formulated as

$$\operatorname*{argmin}_{W^r, \{W^a\}_{a \in A}} \sum_{i=1}^{N} \ell^r\bigl(y_i^r, f(x_i; W^r)\bigr) + \sum_{i=1}^{N} \sum_{a \in A} \lambda^a \ell^a\bigl(y_i^a, f(x_i; W^a)\bigr), \qquad (2)$$

(Footnote: In this paper, scalar, vector, and matrix are denoted by lowercase, bold lowercase, and bold capital letters, respectively.)
(slide annotations on Eq. (2): the first term is the main task, the second term the auxiliary tasks, and λ^a the importance of the a-th auxiliary task)
Proposed Formulation
¤ The main task is regression and the auxiliary tasks are classification, so the
loss functions are the squared error and the cross-entropy error, respectively
¤ the shared image features are learned with a deep CNN
These two objectives are combined and learned together
(annotations: main task, auxiliary tasks)
(Paper excerpt) ... can be combined, while existing methods [30] that employ Eq.(1) assume implicitly that the loss functions across all tasks are identical. Second, Eq.(1) allows data $x_i^t$ in different tasks to have different input representations, while Eq.(2) focuses on a shared input representation $x_i$. The latter is more suitable for our problem, since all tasks share a similar facial representation.
In the following, we formulate our facial landmark detection model based on Eq.(2). Suppose we have a set of feature vectors in a shared feature space across tasks $\{x_i\}_{i=1}^N$ and their corresponding labels $\{y_i^r, y_i^p, y_i^g, y_i^w, y_i^s\}_{i=1}^N$, where $y_i^r$ is the target of landmark detection and the remaining are the targets of the auxiliary tasks, including inferences of 'pose', 'gender', 'wear glasses', and 'smiling'. More specifically, $y_i^r \in \mathbb{R}^{10}$ is the 2D coordinates of the five landmarks (centers of the eyes, nose, corners of the mouth), $y_i^p \in \{0, 1, .., 4\}$ indicates five different poses ($0^\circ, \pm 30^\circ, \pm 60^\circ$), and $y_i^g, y_i^w, y_i^s \in \{0, 1\}$ are binary attributes. It is reasonable to employ the least square and cross-entropy as the loss functions for the main task (regression) and the auxiliary tasks (classification), respectively. Therefore, the objective function can be rewritten as

$$\operatorname*{argmin}_{W^r, \{W^a\}} \frac{1}{2} \sum_{i=1}^{N} \bigl\| y_i^r - f(x_i; W^r) \bigr\|^2 - \sum_{i=1}^{N} \sum_{a \in A} \lambda^a y_i^a \log\bigl(p(y_i^a \mid x_i; W^a)\bigr) + \sum_{t=1}^{T} \|W\|_2^2, \qquad (3)$$

where $f(x_i; W^r) = (W^r)^\top x_i$ in the first term is a linear function. The second term is a softmax function $p(y_i = m \mid x_i) = \exp\{(W^a_m)^\top x_i\} / \sum_j \exp\{(W^a_j)^\top x_i\}$, which models the class posterior probability ($W^a_j$ denotes the $j$-th column of the matrix), and the third term penalizes large weights ($W = \{W^r, \{W^a\}\}$). In this work, we adopt the deep convolutional network (DCN) to jointly learn the shared feature space $x$, since the unique structure of DCN allows for multitask and shared representation.
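The joint objective of Eq. (3) can be sketched numerically for a single sample. This is a hypothetical illustration (the function name, shapes, and calling convention are my own, not the authors' code):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: class posterior p(y = m | x)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def tcdcn_loss(x, y_landmark, aux_labels, Wr, Wa, lambdas, reg=1e-4):
    """Eq. (3) for one sample: squared error on the 10 landmark
    coordinates, plus weighted cross-entropy for each auxiliary
    classification task, plus an L2 weight penalty.
    x: shared feature (d,); Wr: (d, 10); Wa[a]: (d, C_a)."""
    loss = 0.5 * np.sum((y_landmark - Wr.T @ x) ** 2)   # main task (regression)
    for a, label in enumerate(aux_labels):              # auxiliary tasks
        p = softmax(Wa[a].T @ x)                        # softmax posterior
        loss -= lambdas[a] * np.log(p[label])           # -lambda^a * log p(y^a|x)
    loss += reg * (np.sum(Wr ** 2) + sum(np.sum(W ** 2) for W in Wa))
    return loss
```

The key design point carried over from Eq. (3) is that all heads read the same shared feature `x`, so regression and classification losses can be summed into one scalar objective.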
(Paper excerpt) In particular, given a face image $x^0$, the DCN projects it to a higher-level representation gradually by learning a sequence of non-linear mappings

$$x^0 \xrightarrow{\sigma((W^{s_1})^\top x^0)} x^1 \xrightarrow{\sigma((W^{s_2})^\top x^1)} \cdots \xrightarrow{\sigma((W^{s_l})^\top x^{l-1})} x^l. \qquad (4)$$

Here, $\sigma(\cdot)$ and $W^{s_l}$ indicate the non-linear activation function and the filters to be learned in layer $l$ of the DCN. For instance, $x^l = \sigma\bigl((W^{s_l})^\top x^{l-1}\bigr)$. Note that $x^l$ is the shared representation between the main task $r$ and the related tasks.
Tasks-Constrained Deep Convolutional Network
Overall structure
DCN part
• the model is shared across all tasks
Multi-task part
Task-wise early stopping
¤ in multi-task learning, difficulty and convergence rate differ between tasks
¤ the auxiliary tasks look easier than the main task, so they should converge earlier
¤ continuing multi-task learning after an auxiliary task has already reached its
optimum causes over-fitting and harms the main task
→ task-wise early stopping, which halts learning per task
¤ with a criterion that stops a task automatically
(Paper excerpt) At the beginning of the training process, the TCDCN is constrained by all tasks to avoid being trapped at a bad local minimum. As training proceeds, certain auxiliary tasks are no longer beneficial to the main task after they reach their peak performance; their learning process thus should be halted. Note that the regularization offered by early stopping is different from the weight regularization in Eq.(3). The latter globally helps to prevent over-fitting in each task through penalizing certain parameter configurations. In Section 4.2, we show that task-wise early stopping is critical for multi-task learning convergence even with weight regularization.
Now we introduce a criterion to automatically determine when to stop learning an auxiliary task. Let $E^a_{val}$ and $E^a_{tr}$ be the values of the loss function of task $a$ on the validation set and training set, respectively. We stop the task if its measure exceeds a threshold $\epsilon$ as below

$$\frac{k \cdot \operatorname{med}_{j=t-k}^{t} E^a_{tr}(j)}{\sum_{j=t-k}^{t} E^a_{tr}(j) - k \cdot \operatorname{med}_{j=t-k}^{t} E^a_{tr}(j)} \cdot \frac{E^a_{val}(t) - \min_{j=1..t} E^a_{tr}(j)}{\lambda^a \cdot \min_{j=1..t} E^a_{tr}(j)} > \epsilon, \qquad (5)$$

where $t$ denotes the current iteration and $k$ controls a training strip of length $k$. The 'med' denotes the function for calculating the median value. The first term in Eq.(5) represents the tendency of the training error. If the training error drops rapidly within a period of length $k$, the value of the first term is small, indicating that training can be continued as the task is still valuable; otherwise, the first term is large and the task is more likely to be stopped. The second term measures the generalization error compared to the training error.
Threshold ε
Tendency of the training error
• if the training error drops rapidly within the strip of length k,
the value becomes small
→ the task is not stopped
Generalization error
• the generalization error relative to the training error
• when the gap between the generalization error and the training
error grows, the task is stopped
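The stopping criterion of Eq. (5) can be implemented directly. A sketch assuming the loss histories are given as plain Python lists (the function name and calling convention are hypothetical):

```python
import numpy as np

def should_stop_task(E_tr, E_val_t, k, lam_a, eps):
    """Task-wise early-stopping test of Eq. (5) for one auxiliary task.
    E_tr: training-loss history up to the current iteration t,
    E_val_t: current validation loss, k: strip length,
    lam_a: importance coefficient lambda^a, eps: threshold."""
    strip = np.asarray(E_tr[-(k + 1):])        # window j = t-k .. t
    med = np.median(strip)
    # First term: training-error tendency (small while the error still drops)
    trend = k * med / (strip.sum() - k * med)
    best = min(E_tr)                            # min_{j=1..t} E_tr(j)
    # Second term: generalization gap relative to the best training error
    gen = (E_val_t - best) / (lam_a * best)
    return trend * gen > eps
```

On a flat training curve with a rising validation loss the product exceeds the threshold and the task is switched off; while the training error is still dropping fast the first term stays small and training continues, matching the description above.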
Learning procedure
¤ solved by stochastic gradient descent
(Paper excerpt) Learning procedure: We have discussed when and how to switch off an auxiliary task during training before it over-fits. For each iteration, we perform stochastic gradient descent to update the weights of the tasks and the filters of the network. For example, the weight matrix of the main task is updated by $\Delta W^r = -\eta\, \partial E^r / \partial W^r$, with $\eta$ being the learning rate ($\eta = 0.003$ in our implementation), and $\partial E^r / \partial W^r = -(y_i^r - (W^r)^\top x_i)\, x_i^\top$. Also, the derivative of the auxiliary task's weights can be calculated in a similar manner as $\partial E^a / \partial W^a = (p(y_i^a \mid x_i; W^a) - y_i^a)\, x_i$. For the filters in the lower layers, we compute the gradients by propagating the loss error back following the back-propagation strategy as

$$\varepsilon^1 \xleftarrow{(W^{s_2})^\top \varepsilon^2 \frac{\partial \sigma(u^1)}{\partial u^1}} \varepsilon^2 \xleftarrow{(W^{s_3})^\top \varepsilon^3 \frac{\partial \sigma(u^2)}{\partial u^2}} \cdots \xleftarrow{(W^{s_l})^\top \varepsilon^l \frac{\partial \sigma(u^{l-1})}{\partial u^{l-1}}} \varepsilon^l, \qquad (6)$$

where $\varepsilon^l$ is the error at the shared representation layer and $\varepsilon^l = -W^r\,[y_i^r - (W^r)^\top x_i] + \sum_{a \in A} (p(y_i^a \mid x_i; W^a) - y_i^a)\, W^a$, which is the integration of all tasks' derivatives. The errors of the lower layers are computed following Eq.(6). For instance, $\varepsilon^{l-1} = (W^{s_l})^\top \varepsilon^l\, \frac{\partial \sigma(u^{l-1})}{\partial u^{l-1}}$, where $\partial \sigma(u)/\partial u$ is the gradient of the activation function. Then, the gradient of the filter is obtained by $\partial E / \partial W^{s_l} = \varepsilon^l\, x^{l-1}_{\Omega}$, where $\Omega$ represents the receptive field of the filter.
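The quoted main-task gradient can be sanity-checked numerically. A small sketch under the linear-head formulation $f(x; W^r) = (W^r)^\top x$ from Eq. (3) (function names are my own, for illustration):

```python
import numpy as np

def main_task_loss(Wr, x, y):
    # E^r = 0.5 * ||y^r - (W^r)^T x||^2  (least-squares main task)
    return 0.5 * np.sum((y - Wr.T @ x) ** 2)

def main_task_grad(Wr, x, y):
    # dE^r/dW^r = -x (y^r - (W^r)^T x)^T; SGD then applies
    # W^r <- W^r - eta * dE^r/dW^r  (eta = 0.003 in the paper)
    return -np.outer(x, y - Wr.T @ x)
```

Comparing `main_task_grad` against central finite differences of `main_task_loss` confirms the analytic expression, which is a useful check when wiring gradients like Eq. (6) by hand.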
Experiments
¤ Network Structure
¤ Model training
¤ training dataset: 10,000 outdoor face images from the web
¤ collected without paying much attention to translation, rotation, or zoom
¤ test data: AFLW and AFW
¤ Evaluation metrics
¤ mean error rate
¤ the distance between the ground-truth and estimated landmarks,
normalized by the inter-ocular distance
¤ failure rate
¤ a detection is judged a failure when the error exceeds 10%
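The two metrics can be sketched as follows, assuming five (x, y) landmarks ordered with the two eye centers first (the ordering and function names are my assumptions, for illustration):

```python
import numpy as np

def mean_error(pred, gt):
    """Mean detection error: average distance between predicted and
    ground-truth landmarks, normalized by the inter-ocular distance.
    pred, gt: (5, 2) arrays, rows ordered [left eye, right eye,
    nose, left mouth corner, right mouth corner]."""
    interocular = np.linalg.norm(gt[0] - gt[1])   # eye-center distance
    return np.linalg.norm(pred - gt, axis=1).mean() / interocular

def is_failure(pred, gt, threshold=0.10):
    # A detection counts as a failure when the mean error exceeds 10%.
    return mean_error(pred, gt) > threshold
```

Normalizing by the inter-ocular distance makes the error scale-invariant, so faces of different sizes in AFLW/AFW are comparable.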
The Effectiveness of Learning with Related Tasks
¤ evaluated on AFLW
¤ left: the error rate for each landmark; right: the failure rate over all landmarks
¤ the auxiliary tasks do lower both the error rate and the failure rate
¤ using all auxiliary tasks improves the failure rate by as much as 10 points
¤ pose appears to be the most effective
(Figure 4: Comparison of different model variants of TCDCN: the mean error over different landmarks, and the overall failure rate (%), in legend order FLD, FLD+gender, FLD+glasses, FLD+smile, FLD+pose, FLD+all: 35.62, 31.86, 32.87, 32.37, 28.76, 25.00.)

(Paper excerpt) 4.1 Evaluating the Effectiveness of Learning with Related Tasks: To examine the influence of related tasks, we evaluate five variants of the proposed model. In particular, the first variant is trained only on facial landmark detection. We train another four model variants on facial landmark detection along with the auxiliary task of recognizing 'pose', 'gender', 'wearing glasses', and 'smiling', respectively.
FLD vs. FLD + smile
Examines for which landmarks 'smile' is effective
(a): effective for the nose and the mouth
¤ because smiling affects the lower half of the face
(b): Pearson correlation coefficients of the last-layer weights
¤ strongly correlated with the mouth
(Figure 5: FLD vs. FLD+smile. (a) Mean error per landmark for FLD and FLD+smile. (b) Correlation of the landmark-detection weights with the weights of the 'smiling' task; values in the order listed in the figure (left eye, right eye, nose, left mouth corner, right mouth corner): 0.11, 0.32, 0.17, 0.22, 0.40. Caption: the smiling attribute helps detection more on the nose and corners of the mouth than on the centers of the eyes, since 'smiling' mainly affects the lower part of a face.)
FLD vs. FLD + pose
Examines the effect of the pose task
(a): the error rate drops for every pose
(b): measured by the accuracy improvement, every pose improves as well
(Figure 6: FLD vs. FLD+pose. (a) Mean error in different poses (left profile, left, frontal, right, right profile), and (b) accuracy improvement by the FLD+pose in different poses.)

(Paper excerpt) The weight vectors, which are learned to predict the positions of the mouth's corners, have high correlation with the weights of 'smiling' inference. This demonstrates that TCDCN implicitly learns the relationship between tasks.
FLD vs. FLD+pose: As observed in Figure 6(a), the detection errors of FLD are reduced in every pose by adding the pose task.
The Benefits of Task-wise Early Stopping
(a): task-wise early stopping lowers the error considerably
(b): both the training and the generalization error become smaller with early stopping

(Figure 7: (a) Task-wise early stopping (stopping 'glasses', 'gender', 'smile', then 'pose') leads to a substantially lower mean error over the different landmarks than FLD+all without it. (b) Its benefit is also reflected in the training convergence rate. The error is measured in L2-norm with respect to the 10 coordinate values (normalized to [0,1]) for the 5 landmarks.)

4.3 Comparison with the Cascaded CNN [21]
(Paper excerpt) Although both the TCDCN and the cascaded CNN [21] are based on CNNs, we show that the proposed model can achieve better detection accuracy with significantly lower computational cost. We use the full model and the publicly available binary code of the cascaded CNN in this comparison.
Landmark localization accuracy: Similar to Section 4.1 ...
Comparison with the Cascaded CNN
¤ tested on AFLW with the same training data
¤ the only difference is whether multi-task learning is used
¤ outperforms the cascaded CNN on four of the five landmarks
¤ overall, it beats the cascaded CNN
Note that we use the same 10,000 training faces as in the cascaded CNN method.
Thus the only di↵erence is that we exploit a multi-task learning approach. It
is observed from Figure 8 that our method performs better in four out of five
landmarks, and the overall accuracy is superior to that of cascaded CNN.
(Figure 8: The proposed TCDCN vs. cascaded CNN [21]: (a) mean error over different landmarks and (b) the overall failure rate.)
(Paper excerpt) Computational efficiency: Suppose the computation time of a 2D-convolution operation is $\tau$; the total time cost for a CNN with $L$ layers can be approximated by $\sum_{l=1}^{L} s_l^2\, q_l\, q_{l-1}\, \tau$, where $s_l^2$ is the 2D size of the input feature map of the $l$-th layer and $q_l$ is the number of filters. The algorithmic complexity of a CNN is thus $O(s^2 q^2)$, directly related to the input image size and the number of filters.
Comparison with other State-of-the-art Methods
¤ results on AFLW
¤ surpasses the results of all existing methods
¤ results on AFW
¤ the same as on AFLW
(Figure 9: Comparison with RCPR [3], TSPM [32], CDM [27], Luxand [18], and SDM [25] on the AFLW [11] (first row) and AFW [32] (second row) datasets. The left subfigures show the mean errors on different landmarks, while the right subfigures show the overall errors. Overall mean errors (%), in legend order TSPM, ESR, CDM, Luxand, RCPR, SDM, Ours: 15.9, 13.0, 13.1, 12.4, 11.6, 8.5, 8.0 on AFLW and 14.3, 12.2, 11.1, 10.4, 9.3, 8.8, 8.2 on AFW.)

(Paper excerpt) ... the cascaded CNN needs multiple CNNs in different cascaded layers (23 CNNs in its implementation). Hence, TCDCN has much lower computational cost.
Comparison with other State-of-the-art Methods
Results on a variety of images
¤ row 1: wearing glasses
¤ row 2: pose variations
¤ row 3:
¤ columns 1-2: different lighting
¤ column 3: poor image quality
¤ columns 4-5: different expressions
¤ columns 6-8: failure cases (red marks the wrong parts)
(Figure 10: Example detections by the proposed model on AFLW [11] and AFW [32] images. The labels below each image denote the tagging results for the related tasks: (0°, ±30°, ±60°) for pose; S/NS = smiling/not smiling; G/NG = with glasses/without glasses; M/F = male/female. Red rectangles indicate wrong tagging.)
(Paper excerpt) 4.5 TCDCN for Robust Initialization: This section shows that the TCDCN can be used to generate a good initialization to improve the state-of-the-art method, owing to its accuracy and efficiency. We take RCPR [3] as an example. Instead of drawing training samples randomly as initialization, as done in [3], we initialize RCPR by first applying TCDCN on the test images.
TCDCN for Robust Initialization
¤ TCDCN can also be used as a way to obtain a good initialization
¤ the existing RCPR method is compared with and without TCDCN-based initialization
(a): relative improvement (reduced error / original error)
(b): visualization of the improvement (top: plain RCPR; bottom: RCPR initialized with TCDCN)
(Figure 11: Initialization with our five-landmark estimation for RCPR [3] on the COFW dataset [3]. (a) shows the relative improvement on each of the 29 landmarks (relative improvement = reduced error / original error). (b) visualizes the improvement: the upper row depicts the results of RCPR [3], while the lower row shows the improved results with our initialization.)
(Paper excerpt from the conclusion) ... heterogeneous but subtly correlated tasks, such as appearance attributes, expression, demographics, and head pose. The proposed Tasks-Constrained DCN allows errors of related tasks to be back-propagated in deep hidden layers for constructing a shared representation relevant to the main task.
Conclusion
¤ showed that jointly learning heterogeneous but mutually correlated tasks
enables more robust landmark detection
¤ TCDCN can learn a shared representation by back-propagating the errors
of the related tasks
¤ task-wise early stopping is important to ensure that the model converges
¤ multi-task learning produced a model that is quite robust to facial conditions
¤ Future work: applying the method to denser landmark detection, and applying
deep multi-task learning to other image recognition problems
Impressions
¤ it was good to learn how multi-task learning is done with a CNN
¤ I did not know this method (is it new?)
¤ here it was CNN + linear predictors, but CNN + CNN also seems promising
¤ personally, the way the paper is written looks like a useful reference
¤ it is similar to the paper I am writing now in that it introduces many new
techniques and runs a variety of experiments
What is pattern recognition field?Randa Elanwar
 
Synops emotion recognize
Synops emotion recognizeSynops emotion recognize
Synops emotion recognizeAvdhesh Gupta
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 

En vedette (20)

(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
Introduction to "Facial Landmark Detection by Deep Multi-task Learning"
Introduction to "Facial Landmark Detection by Deep Multi-task Learning"Introduction to "Facial Landmark Detection by Deep Multi-task Learning"
Introduction to "Facial Landmark Detection by Deep Multi-task Learning"
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
(DL hacks輪読) Difference Target Propagation
(DL hacks輪読) Difference Target Propagation(DL hacks輪読) Difference Target Propagation
(DL hacks輪読) Difference Target Propagation
 
(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
 
深層生成モデルを用いたマルチモーダル学習
深層生成モデルを用いたマルチモーダル学習深層生成モデルを用いたマルチモーダル学習
深層生成モデルを用いたマルチモーダル学習
 
(DL hacks輪読) Variational Dropout and the Local Reparameterization Trick
(DL hacks輪読) Variational Dropout and the Local Reparameterization Trick(DL hacks輪読) Variational Dropout and the Local Reparameterization Trick
(DL hacks輪読) Variational Dropout and the Local Reparameterization Trick
 
(DL hacks輪読) Seven neurons memorizing sequences of alphabetical images via sp...
(DL hacks輪読) Seven neurons memorizing sequences of alphabetical images via sp...(DL hacks輪読) Seven neurons memorizing sequences of alphabetical images via sp...
(DL hacks輪読) Seven neurons memorizing sequences of alphabetical images via sp...
 
(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel Learning(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel Learning
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks
 
(DL Hacks輪読) How transferable are features in deep neural networks?
(DL Hacks輪読) How transferable are features in deep neural networks?(DL Hacks輪読) How transferable are features in deep neural networks?
(DL Hacks輪読) How transferable are features in deep neural networks?
 
(DL hacks輪読) Deep Kalman Filters
(DL hacks輪読) Deep Kalman Filters(DL hacks輪読) Deep Kalman Filters
(DL hacks輪読) Deep Kalman Filters
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
Local Gray Code Pattern (LGCP): A Robust Feature Descriptor for Facial Expres...
Local Gray Code Pattern (LGCP): A Robust Feature Descriptor for Facial Expres...Local Gray Code Pattern (LGCP): A Robust Feature Descriptor for Facial Expres...
Local Gray Code Pattern (LGCP): A Robust Feature Descriptor for Facial Expres...
 
Fully Automatic Facial Feature Point Detection Using Gabor Feature Based Boos...
Fully Automatic Facial Feature Point Detection Using Gabor Feature Based Boos...Fully Automatic Facial Feature Point Detection Using Gabor Feature Based Boos...
Fully Automatic Facial Feature Point Detection Using Gabor Feature Based Boos...
 
What is pattern recognition field?
What is pattern recognition field?What is pattern recognition field?
What is pattern recognition field?
 
Synops emotion recognize
Synops emotion recognizeSynops emotion recognize
Synops emotion recognize
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 

Similaire à (研究会輪読) Facial Landmark Detection by Deep Multi-task Learning

Convolution Neural Networks
Convolution Neural NetworksConvolution Neural Networks
Convolution Neural NetworksAhmedMahany
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464IJRAT
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphstuxette
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksAlina Leidinger
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureIJRES Journal
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
 
20070823
2007082320070823
20070823neostar
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAJin-Hwa Kim
 
Tensor Spectral Clustering
Tensor Spectral ClusteringTensor Spectral Clustering
Tensor Spectral ClusteringAustin Benson
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDARaghu Palakodety
 
Face recognition using laplacianfaces (synopsis)
Face recognition using laplacianfaces (synopsis)Face recognition using laplacianfaces (synopsis)
Face recognition using laplacianfaces (synopsis)Mumbai Academisc
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine LearningPavithra Thippanaik
 
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...Anirbit Mukherjee
 
An efficient approach to wavelet image Denoising
An efficient approach to wavelet image DenoisingAn efficient approach to wavelet image Denoising
An efficient approach to wavelet image Denoisingijcsit
 
Image Compression Using Wavelet Packet Tree
Image Compression Using Wavelet Packet TreeImage Compression Using Wavelet Packet Tree
Image Compression Using Wavelet Packet TreeIDES Editor
 
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMSFINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMScsandit
 
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMSFINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMScsandit
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
A Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet TransformA Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet Transformijsrd.com
 

Similaire à (研究会輪読) Facial Landmark Detection by Deep Multi-task Learning (20)

Convolution Neural Networks
Convolution Neural NetworksConvolution Neural Networks
Convolution Neural Networks
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural Networks
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
20070823
2007082320070823
20070823
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
Tensor Spectral Clustering
Tensor Spectral ClusteringTensor Spectral Clustering
Tensor Spectral Clustering
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDA
 
Face recognition using laplacianfaces (synopsis)
Face recognition using laplacianfaces (synopsis)Face recognition using laplacianfaces (synopsis)
Face recognition using laplacianfaces (synopsis)
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and...
 
An efficient approach to wavelet image Denoising
An efficient approach to wavelet image DenoisingAn efficient approach to wavelet image Denoising
An efficient approach to wavelet image Denoising
 
Image Compression Using Wavelet Packet Tree
Image Compression Using Wavelet Packet TreeImage Compression Using Wavelet Packet Tree
Image Compression Using Wavelet Packet Tree
 
1 d analysis
1 d analysis1 d analysis
1 d analysis
 
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMSFINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
 
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMSFINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
FINGERPRINTS IMAGE COMPRESSION BY WAVE ATOMS
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
A Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet TransformA Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet Transform
 

Plus de Masahiro Suzuki

深層生成モデルと世界モデル(2020/11/20版)
深層生成モデルと世界モデル(2020/11/20版)深層生成モデルと世界モデル(2020/11/20版)
深層生成モデルと世界モデル(2020/11/20版)Masahiro Suzuki
 
確率的推論と行動選択
確率的推論と行動選択確率的推論と行動選択
確率的推論と行動選択Masahiro Suzuki
 
深層生成モデルと世界モデル, 深層生成モデルライブラリPixyzについて
深層生成モデルと世界モデル,深層生成モデルライブラリPixyzについて深層生成モデルと世界モデル,深層生成モデルライブラリPixyzについて
深層生成モデルと世界モデル, 深層生成モデルライブラリPixyzについてMasahiro Suzuki
 
深層生成モデルと世界モデル
深層生成モデルと世界モデル深層生成モデルと世界モデル
深層生成モデルと世界モデルMasahiro Suzuki
 
「世界モデル」と関連研究について
「世界モデル」と関連研究について「世界モデル」と関連研究について
「世界モデル」と関連研究についてMasahiro Suzuki
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)Masahiro Suzuki
 
深層生成モデルを用いたマルチモーダルデータの半教師あり学習
深層生成モデルを用いたマルチモーダルデータの半教師あり学習深層生成モデルを用いたマルチモーダルデータの半教師あり学習
深層生成モデルを用いたマルチモーダルデータの半教師あり学習Masahiro Suzuki
 

Plus de Masahiro Suzuki (7)

深層生成モデルと世界モデル(2020/11/20版)
深層生成モデルと世界モデル(2020/11/20版)深層生成モデルと世界モデル(2020/11/20版)
深層生成モデルと世界モデル(2020/11/20版)
 
確率的推論と行動選択
確率的推論と行動選択確率的推論と行動選択
確率的推論と行動選択
 
深層生成モデルと世界モデル, 深層生成モデルライブラリPixyzについて
深層生成モデルと世界モデル,深層生成モデルライブラリPixyzについて深層生成モデルと世界モデル,深層生成モデルライブラリPixyzについて
深層生成モデルと世界モデル, 深層生成モデルライブラリPixyzについて
 
深層生成モデルと世界モデル
深層生成モデルと世界モデル深層生成モデルと世界モデル
深層生成モデルと世界モデル
 
「世界モデル」と関連研究について
「世界モデル」と関連研究について「世界モデル」と関連研究について
「世界モデル」と関連研究について
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 
深層生成モデルを用いたマルチモーダルデータの半教師あり学習
深層生成モデルを用いたマルチモーダルデータの半教師あり学習深層生成モデルを用いたマルチモーダルデータの半教師あり学習
深層生成モデルを用いたマルチモーダルデータの半教師あり学習
 

Dernier

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Dernier (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning

  • 1. Facial Landmark Detection by Deep Multi-task Learning 2015/7/2 Masahiro Suzuki
  • 2. Contents ¤ Paper Information ¤ Introduction ¤ Related work ¤ Tasks-Constrained Deep Convolutional Network ¤ Experiment ¤ Conclusion
  • 3. Paper Information Title : Facial Landmark Detection by Deep Multi-task Learning (2014) Authors : Zhanpeng Zhang, Ping Luo, Chen Change Loy, and Xiaoou Tang ¤ The Chinese University of Hong Kong / Multimedia Laboratory Deep Learning (CNN) + Multitask Learning ¤ Motivation ¤ I’m studying lifelong learning (online multitask learning) by deep learning
  • 4. Facial landmark detection ¤ Facial landmark detection is a fundamental component in many face analysis tasks ¤ facial attribute inference ¤ face verification ¤ face recognition ¤ it remains a formidable challenge ¤ partial occlusion and large head pose variations
  • 5. Approach The authors argue that … ¤ facial landmark detection is not a standalone problem ¤ its estimation can be influenced by a number of heterogeneous and subtly correlated factors (figure: the main task is learned together with auxiliary tasks via multitask learning)
  • 6. Contribution They propose a Tasks-Constrained Deep Convolutional Network (TCDCN) ¤ the first attempt to investigate how facial landmark detection can be optimized together with heterogeneous but subtly correlated tasks ¤ show that … ¤ the representations learned from related tasks facilitate the learning of the main task ¤ task relatedness is captured implicitly by the proposed model ¤ the proposed approach outperforms existing methods ¤ demonstrate the effectiveness of using five-landmark estimation as robust initialization for improving a state-of-the-art face alignment method
  • 7. Contents ¤ Paper Information ¤ Introduction ¤ Related work ¤ Tasks-Constrained Deep Convolutional Network ¤ Experiment ¤ Conclusion
  • 8. Facial landmark detection: prior work. Regression-based method ¤ directly estimates landmark positions from image patches using SVR ¤ many prior works use random regression forests ¤ starts from an initial landmark estimate and refines it iteratively, so the result depends on the initialization. Template fitting method ¤ fits a face template (a model of location and appearance) to the image ¤ can perform face detection, facial landmark detection, and pose estimation simultaneously. Cascaded CNN ¤ a method from the same laboratory: the face is split per landmark and CNNs are applied in stages, so it needs many networks (23 CNNs). (Figure: prior work in facial landmark detection — regression-based [Valstar, M., Martinez, B., Binefa, X., Pantic, M.: Facial point detection using boosted regression and graph models. CVPR 2010], template fitting [Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. PAMI 23(6), 681-685, 2001], and cascaded CNN [Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. CVPR 2013]. Compared with these, the proposed method uses auxiliary tasks and a raw-pixel-input CNN without cascading, achieving a shorter processing time.)
  • 9. Landmark detection by CNN: cascaded CNN [Sun et al. 2013] ¤ the face is divided into several parts in advance, each part's CNN estimates the landmarks, and the outputs are averaged at the end ¤ reading the original paper, the CNNs appear to be applied stage by stage ¤ the work closest to this study (same laboratory as the authors). (Figure 2: Three-level cascaded convolutional networks. The input is the face region returned by a face detector. The three networks at level 1 are denoted as F1, EN1, and NM1. Networks at level 2 are denoted as LE21, LE22, RE21, RE22, N21, N22, LM21, LM22, RM21, and RM22. Both LE21 and LE22 predict the left eye center, and so forth.)
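The fusion step described above — several networks predict the same landmarks from different face regions, and the outputs are averaged — can be sketched as follows. This is an illustrative sketch only; the function name is not from [Sun et al. 2013]:

```python
import numpy as np

# Minimal sketch of the cascaded-CNN fusion step: several networks
# predict the same landmarks, and the final estimate is the average
# of their predictions. Name and shapes are illustrative assumptions.

def fuse_predictions(preds):
    """preds: list of (num_landmarks, 2) coordinate arrays, one per network."""
    return np.mean(np.stack(preds, axis=0), axis=0)
```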
  • 10. Multi-task learning ¤ deep learning and multi-task learning complement each other well ¤ features learned for one task can be reused by the other tasks ¤ conventional multi-task learning assumes every task has the same difficulty and convergence rate ¤ the tasks in this problem are not equal, so conventional multi-task learning cannot be applied as-is. This work therefore sets per-task early stopping (inspired by [Caruana et al. 1997]).
  • 11. Problem Formulation ¤ Conventional multi-task learning (MTL) seeks to improve the generalization performance of multiple related tasks by learning them jointly. Suppose there are T tasks and the training data for the t-th task are denoted $(x_i^t, y_i^t)$, $t \in \{1, \ldots, T\}$, $i \in \{1, \ldots, N\}$, with feature $x_i^t \in \mathbb{R}^d$ and label $y_i^t \in \mathbb{R}$. The goal of MTL is to minimize $\arg\min_{\{w^t\}_{t=1}^T} \sum_{t=1}^{T} \sum_{i=1}^{N} \ell(y_i^t, f(x_i^t; w^t)) + \Phi(w^t)$ (1), where $f(x_i^t; w^t)$ is a function of $x^t$ parameterized by the weight vector $w^t$, $\ell(\cdot)$ is the loss function (typically least squares for regression and the hinge loss for classification), and $\Phi(w^t)$ is the regularization term penalizing the complexity of the weights.
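As a concrete reading of Eq. (1), here is a minimal sketch assuming linear predictors $f(x; w^t) = (w^t)^T x$, squared loss, and an L2 penalty for $\Phi$; the function name and these particular loss choices are illustrative assumptions, not the paper's:

```python
import numpy as np

# Hedged sketch of the conventional MTL objective in Eq. (1): a sum
# over tasks t and samples i of a per-task loss, plus a regularizer
# Phi(w^t). Here: linear model, squared loss, L2 weight penalty.

def mtl_objective(X, Y, W, reg=1e-3):
    """X: list of (N, d) arrays; Y: list of (N,) labels; W: list of (d,) weights."""
    total = 0.0
    for X_t, y_t, w_t in zip(X, Y, W):
        residual = y_t - X_t @ w_t            # errors of task t's predictor
        total += 0.5 * np.sum(residual ** 2)  # loss l over the N samples
        total += reg * np.sum(w_t ** 2)       # Phi(w^t): weight complexity
    return total
```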
  • 12. Proposed Formulation ¤ The multi-task learning of this work: in contrast to conventional MTL, which maximizes the performance of all tasks, the aim is to optimize the main task r (facial landmark detection) with the assistance of an arbitrary number of related/auxiliary tasks $a \in A$ (e.g., facial pose estimation and attribute inference): $\arg\min_{W^r, \{W^a\}_{a \in A}} \sum_{i=1}^{N} \ell^r(y_i^r, f(x_i; W^r)) + \sum_{i=1}^{N} \sum_{a \in A} \lambda^a \ell^a(y_i^a, f(x_i; W^a))$ (2), where the first term is the main task, the second term the auxiliary tasks, and $\lambda^a$ the importance of the a-th auxiliary task. Characteristics: ¤ two different loss functions can be optimized jointly (even a regression loss and a classification loss) ¤ the feature x is shared across tasks rather than task-specific.
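The structure of Eq. (2) — one main-task loss plus importance-weighted auxiliary losses over a shared feature representation — can be sketched generically. All names here are illustrative; the loss functions are passed in as arguments because Eq. (2) deliberately leaves them task-specific:

```python
import numpy as np

# Sketch of Eq. (2): main-task loss plus lambda^a-weighted auxiliary
# losses, all evaluated on the same shared features x_i.

def joint_objective(X, y_main, f_main, loss_main, aux_tasks):
    """X: (N, d) shared features; aux_tasks: list of
    (targets, predictor, loss, lambda_a) tuples, one per auxiliary task."""
    total = sum(loss_main(y, f_main(x)) for x, y in zip(X, y_main))
    for y_aux, f_a, loss_a, lam_a in aux_tasks:
        total += lam_a * sum(loss_a(y, f_a(x)) for x, y in zip(X, y_aux))
    return total
```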
  • 13. Proposed Formulation ¤ メインタスクが回帰問題、補助タスクがクラス分類なので、誤差関数 はそれぞれ2乗誤差、クロスエントロピー誤差となる ¤ 共有する画像の特徴量をDeep CNで学習 これら2つの式を合わせて学習する メインタスク 補助タスク can be combined, while existing methods [30] that employ Eq.(1) assume implic- itly that the loss functions across all tasks are identical. Second, Eq.(1) allows data xt i in di↵erent tasks to have di↵erent input representations, while Eq.(2) focuses on a shared input representation xi. The latter is more suitable for our problem, since all tasks share similar facial representation. In the following, we formulate our facial landmark detection model based on Eq.(2). Suppose we have a set of feature vectors in a shared feature space across tasks {xi}N i=1 and their corresponding labels {yr i , yp i , yg i , yw i , ys i }N i=1, where yr i is the target of landmark detection and the remaining are the targets of auxiliary tasks, including inferences of ‘pose’, ‘gender’, ‘wear glasses’, and ‘smiling’. More specifically, yr i 2 R10 is the 2D coordinates of the five landmarks (centers of the eyes, nose, corners of the mouth), yp i 2 {0, 1, .., 4} indicates five di↵erent poses (0 , ±30 , ±60 ), and yg i , yw i , ys i 2 {0, 1} are binary attributes. It is reasonable to employ the least square and cross-entropy as the loss functions for the main task (regression) and the auxiliary tasks (classification), respectively. Therefore, the objective function can be rewritten as argmin Wr,{Wa} 1 2 NX i=1 kyr i f(xi; Wr )k2 NX i=1 X a2A a ya i log(p(ya i |xi; Wa ))+ TX t=1 kWk2 2, (3) where f(xi; Wr ) = (Wr ) T xi in the first term is a linear function. The second term is a softmax function p(yi = m|xi) = exp{(Wa m)T xi} P j exp{(Wa j )T xi} , which models the class posterior probability (Wa j denotes the jth column of the matrix), and the third term penalizes large weights (W = {Wr , {Wa }}). 
In this work, the deep convolutional network (DCN) is adopted to jointly learn the shared feature space x, since the unique structure of a DCN allows for multi-task learning over a shared representation. In particular, given a face image x^0, the DCN projects it to gradually higher-level representations by learning a sequence of non-linear mappings

$$x^0 \xrightarrow{\;\sigma((W^{s_1})^T x^0)\;} x^1 \xrightarrow{\;\sigma((W^{s_2})^T x^1)\;} \cdots \xrightarrow{\;\sigma((W^{s_l})^T x^{l-1})\;} x^l. \qquad (4)$$

Here σ(·) and W^{s_l} indicate the non-linear activation function and the filters to be learned in layer l of the DCN; for instance, x^l = σ((W^{s_l})^T x^{l-1}). Note that x^l is the shared representation between the main task r and the related tasks.
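The chain of mappings in Eq.(4) can be sketched as a simple forward pass; note this stand-in uses plain matrix products in place of the paper's convolutions and pooling, and the function name is my own:

```python
import numpy as np

def dcn_forward(x0, weights, sigma=np.tanh):
    """Eq.(4): project an input x0 through a stack of non-linear mappings
    x^l = sigma((W^{s_l})^T x^{l-1}). The final x^l is the shared
    representation fed to every task head."""
    x = x0
    for W in weights:
        x = sigma(W.T @ x)
    return x
```

Any elementwise nonlinearity works for σ; tanh is used here purely for illustration.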
  • 14. Tasks-Constrained Deep Convolutional Network ¤ Overall architecture ¤ DCN part: the model (shared representation) is common to all tasks ¤ Multi-task part: task-specific output layers on top of the shared features
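The "shared DCN / task-specific outputs" split above can be sketched as one linear regression head for the landmarks plus one softmax head per auxiliary task, all reading the same shared feature vector (an illustrative sketch, not the paper's exact implementation):

```python
import numpy as np

def tcdcn_predict(x_shared, Wr, W_aux):
    """Task heads on the shared feature x_shared: a linear regressor for
    the five landmarks (main task) and one softmax classifier per
    auxiliary task."""
    landmarks = Wr.T @ x_shared                    # 10-d coordinate vector

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    aux_probs = [softmax(Wa.T @ x_shared) for Wa in W_aux]
    return landmarks, aux_probs
```

Because every head consumes the same x_shared, gradients from all tasks flow back into the same DCN filters, which is what "tasks-constrained" refers to.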
  • 15. Task-wise early stopping ¤ In multi-task learning, the difficulty and convergence rate differ between tasks ¤ The auxiliary tasks look easier than the main task → they seem to converge faster ¤ If multi-task training continues after an auxiliary task has already reached its optimum, that task over-fits and starts to hurt the main task → task-wise early stopping halts learning on a per-task basis ¤ Criterion for stopping a task automatically:

At the beginning of training, the TCDCN is constrained by all tasks to avoid being trapped at a bad local minimum. As training proceeds, certain auxiliary tasks are no longer beneficial to the main task after they reach their peak performance, and their learning process should then be halted. Note that the regularization offered by early stopping is different from the weight regularization in Eq.(3): the latter globally helps prevent over-fitting in each task through penalizing certain parameter configurations. Section 4.2 shows that task-wise early stopping is critical for multi-task learning convergence even with weight regularization. Let E^a_val and E^a_tr be the values of the loss function of task a on the validation set and training set, respectively. The task is stopped if its measure exceeds a threshold ε:

$$\frac{k \cdot \operatorname{med}_{j=t-k}^{t} E^a_{tr}(j)}{\sum_{j=t-k}^{t} E^a_{tr}(j) - k \cdot \operatorname{med}_{j=t-k}^{t} E^a_{tr}(j)} \cdot \frac{E^a_{val}(t) - \min_{j=1..t} E^a_{tr}(j)}{\lambda_a \cdot \min_{j=1..t} E^a_{tr}(j)} > \epsilon, \qquad (5)$$

where t denotes the current iteration, k controls a training strip of length k, and 'med' denotes the median. The first term in Eq.(5) represents the tendency of the training error: if the training error drops rapidly within a period of length k, the value of the first term is small, indicating that training can be continued as the task is still valuable; otherwise the first term is large, and the task is more likely to be stopped. The second term measures the generalization error compared to the training error: it grows, and triggers stopping, as the gap between validation and training error widens. λ_a is the importance coefficient of the a-th task's error, which can be learned through gradient descent; its magnitude means that a more important task tends to have a longer impact.
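A direct transcription of the Eq.(5) test (a sketch under my reading of the reconstructed formula; variable names are mine):

```python
import numpy as np

def should_stop(E_tr, E_val, lam_a, k, eps):
    """Task-wise early-stopping criterion of Eq.(5). E_tr and E_val are
    the per-iteration training/validation losses of one auxiliary task up
    to the current iteration t. The first factor stays small while the
    training error is still dropping fast within the last k iterations;
    the second grows with the generalization gap and shrinks with the
    task importance lam_a."""
    strip = np.asarray(E_tr[-k:])                  # training strip of length k
    med = np.median(strip)
    tendency = k * med / (strip.sum() - k * med)
    best_tr = min(E_tr)
    gen = (E_val[-1] - best_tr) / (lam_a * best_tr)
    return tendency * gen > eps
```

A larger λ_a shrinks the second factor, so more important auxiliary tasks are kept alive longer, matching the paper's remark.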
  • 16. Learning procedure ¤ Solved by stochastic gradient descent (main task + auxiliary tasks, with back-propagation through the shared layers)

Having decided when and how to switch off an auxiliary task before it over-fits, each iteration performs stochastic gradient descent to update the weights of the tasks and the filters of the network. For example, the weight matrix of the main task is updated by ΔW^r = -η ∂E^r/∂W^r, with η being the learning rate (η = 0.003 in the implementation) and ∂E^r/∂W^r = -(y_i^r - (W^r)^T x_i) x_i^T. The derivative of an auxiliary task's weights is calculated in a similar manner as ∂E^a/∂W^a = (p(y_i^a | x_i; W^a) - y_i^a) x_i. For the filters in the lower layers, the gradients are computed by propagating the loss error back, following the back-propagation strategy:

$$\varepsilon^1 \leftarrow (W^{s_2})^T \varepsilon^2 \frac{\partial \sigma(u^1)}{\partial u^1}, \quad \varepsilon^2 \leftarrow (W^{s_3})^T \varepsilon^3 \frac{\partial \sigma(u^2)}{\partial u^2}, \quad \ldots, \quad \varepsilon^{l-1} \leftarrow (W^{s_l})^T \varepsilon^l \frac{\partial \sigma(u^{l-1})}{\partial u^{l-1}}, \qquad (6)$$

where ε^l is the error at the shared representation layer, ε^l = (W^r)^T [y_i^r - (W^r)^T x_i] + Σ_{a∈A} (p(y_i^a | x_i; W^a) - y_i^a) W^a, which is the integration of all tasks' derivatives. The errors of the lower layers are computed following Eq.(6); for instance, ε^{l-1} = (W^{s_l})^T ε^l ∂σ(u^{l-1})/∂u^{l-1}, where ∂σ(u)/∂u is the gradient of the activation function. The gradient of a filter is then obtained by ∂E/∂W^{s_l} = ε^l (x^{l-1})_Ω, where Ω represents the receptive field of the filter. This strategy achieves satisfactory results for learning a deep convolutional network given multiple tasks; its superior performance is demonstrated in Section 4.2.
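The per-task weight updates described above can be sketched for a single sample as follows (an illustrative NumPy version with my own function names; only the linear task heads are shown, not the filter back-propagation):

```python
import numpy as np

def sgd_step_main(Wr, x, y, lr=0.003):
    """One SGD update of the main-task (landmark regression) weights.
    For E_r = 1/2 ||y - Wr^T x||^2 the gradient is -(y - Wr^T x) x^T,
    written here with Wr of shape (d, 10); lr follows the paper's 0.003."""
    grad = -np.outer(x, y - Wr.T @ x)
    return Wr - lr * grad

def sgd_step_aux(Wa, x, y_idx, lr=0.003):
    """Auxiliary-task update: softmax cross-entropy gradient
    dE_a/dWa = x (p - e_y)^T, with p the class posterior of Eq.(3)."""
    z = Wa.T @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y_idx] -= 1.0                                # p - one_hot(y)
    return Wa - lr * np.outer(x, p)
```

In the full model these same per-task residuals are summed into ε^l at the shared layer and propagated down through Eq.(6).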
  • 17. Experiments ¤ Network Structure ¤ Model training ¤ Training dataset: 10,000 outdoor face images from the web ¤ Collected without paying much attention to translation, rotation, or zoom ¤ Test data: AFLW and AFW ¤ Evaluation metrics ¤ Mean error rate ¤ The distance between the ground-truth and estimated landmarks, normalized by the inter-ocular distance ¤ Failure rate ¤ A detection is counted as a failure when this normalized error exceeds 10%
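The two evaluation metrics can be written down directly (a sketch with my own function names; shapes assume N faces with 5 landmarks each):

```python
import numpy as np

def mean_error(pred, gt, inter_ocular):
    """Mean detection error: Euclidean distance between predicted and
    ground-truth landmarks (arrays of shape N x 5 x 2), normalized by
    each face's inter-ocular distance (shape N)."""
    d = np.linalg.norm(pred - gt, axis=2)          # N x 5 distances
    return (d / inter_ocular[:, None]).mean()

def failure_rate(pred, gt, inter_ocular, thresh=0.10):
    """A landmark counts as a failure when its normalized error exceeds
    10% of the inter-ocular distance."""
    d = np.linalg.norm(pred - gt, axis=2) / inter_ocular[:, None]
    return (d > thresh).mean()
```

Normalizing by the inter-ocular distance makes the error scale-invariant across face sizes.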
  • 18. The Effectiveness of Learning with Related Tasks ¤ Evaluated on AFLW ¤ Left: error rate per landmark; right: overall failure rate ¤ The auxiliary tasks indeed lower both the error rate and the failure rate ¤ Using all the auxiliary tasks improves the failure rate by as much as 10% ¤ Pose appears to be the most effective auxiliary task

[Fig. 4. Comparison of different model variants of TCDCN: the mean error over different landmarks, and the overall failure rate.]

To examine the influence of related tasks, five variants of the proposed model are evaluated. The first variant is trained only on facial landmark detection; the other four are trained on facial landmark detection together with one auxiliary task: recognizing 'pose', 'gender', 'wearing glasses', or 'smiling'.
  • 19. FLD vs. FLD + smile ¤ Examines for which landmarks 'smiling' is effective (a): it helps on the nose and mouth ¤ because 'smiling' affects the lower half of the face (b): Pearson correlation coefficients of the last-layer weights ¤ strong correlation with the mouth

[Fig. 5. FLD vs. FLD+smile. The smiling attribute helps detection more on the nose and corners of the mouth than on the centers of the eyes, since 'smiling' mainly affects the lower part of a face.]
  • 20. FLD vs. FLD + pose ¤ Examines the effect of pose (a): the error rate drops for every pose (b): measured as accuracy improvement, every pose benefits as well

[Fig. 6. FLD vs. FLD+pose. (a) Mean error in different poses, and (b) accuracy improvement by FLD+pose in different poses.]

The weight vectors learned to predict the positions of the mouth's corners have high correlation with the weights of the 'smiling' inference; this demonstrates that TCDCN implicitly learns relationships between tasks.
  • 21. The Benefits of Task-wise Early Stopping (a): task-wise early stopping lowers the error considerably (b): both the training and validation errors become smaller with early stopping

[Fig. 7. (a) Task-wise early stopping leads to substantially lower errors over different landmarks. (b) Its benefit is also reflected in the training/validation error and convergence rate. The error is measured in L2-norm with respect to the 10 coordinate values (normalized to [0,1]) of the 5 landmarks.]
  • 22. Comparison with the Cascaded CNN ¤ Trained on the same data and tested on AFLW ¤ The only difference is whether multi-task learning is used ¤ Beats the cascaded CNN on four of the five landmarks ¤ Overall, it outperforms the cascaded CNN

Although both the TCDCN and the cascaded CNN [21] are CNN-based, the proposed model achieves better detection accuracy at significantly lower computational cost; the same 10,000 training faces as in the cascaded CNN method are used, so the only difference is the multi-task learning approach. Figure 8 shows that the method performs better on four out of five landmarks, and the overall accuracy is superior to that of the cascaded CNN.

[Fig. 8. The proposed TCDCN vs. cascaded CNN [21]: (a) mean error over different landmarks and (b) the overall failure rate.]

Computational efficiency: suppose the computation time of a 2D-convolution operation is τ; then the total time cost of a CNN with L layers can be approximated by Σ_{l=1}^{L} s_l² q_l q_{l-1} τ, where s² is the 2D size of the input feature map of the l-th layer and q is the number of filters. The algorithmic complexity of a CNN is thus O(s²q²), directly related to the input image size and the number of filters.
  • 23. Comparison with Other State-of-the-art Methods ¤ Results on AFLW ¤ Outperforms all of the existing methods compared ¤ Results on AFW ¤ Same trend as on AFLW

[Fig. 9. Comparison with RCPR [3], TSPM [32], CDM [27], Luxand [18], and SDM [25] on the AFLW [11] (first row) and AFW [32] (second row) datasets. The left subfigures show the mean errors on different landmarks; the right subfigures show the overall errors.]

Note also that the cascaded CNN runs multiple CNNs in different cascaded layers (23 CNNs in its implementation); hence the TCDCN has much lower computational cost.
  • 24. Comparison with Other State-of-the-art Methods ¤ Results on a variety of images ¤ Row 1: wearing glasses ¤ Row 2: pose variations ¤ Row 3: ¤ Columns 1-2: different lighting ¤ Column 3: poor image quality ¤ Columns 4-5: different expressions ¤ Columns 6-8: failure cases (red marks the wrong parts)

[Fig. 10. Example detections by the proposed model on AFLW [11] and AFW [32] images. The labels below each image denote the tagging results for the related tasks: (0°, ±30°, ±60°) for pose; S/NS = smiling/not smiling; G/NG = with glasses/without glasses; M/F = male/female. Red rectangles indicate wrong tagging.]

4.5 TCDCN for Robust Initialization: owing to its accuracy and efficiency, the TCDCN can be used to generate a good initialization for improving a state-of-the-art method. Taking RCPR [3] as an example, instead of drawing training samples randomly as initialization as done in [3], RCPR is initialized by first applying the TCDCN to the face image.
  • 25. TCDCN for Robust Initialization ¤ The TCDCN can also be used as a way to obtain a good initialization ¤ The existing method RCPR is compared with and without TCDCN-based initialization (a): relative improvement per landmark (reduced error / original error) (b): visualization of the improvement (top: plain RCPR; bottom: RCPR initialized by the TCDCN)

[Fig. 11. Initialization with the five-landmark estimation for RCPR [3] on the COFW dataset [3]. (a) Relative improvement on each landmark (relative improvement = reduced error / original error). (b) Visualization of the improvement: the upper row depicts the results of RCPR [3], while the lower row shows the results improved by the initialization.]
  • 26. Conclusion ¤ Showed that jointly learning heterogeneous but subtly correlated tasks yields more robust landmark detection ¤ The TCDCN learns a shared representation by back-propagating the errors of the related tasks ¤ Task-wise early stopping is important for ensuring convergence of the model ¤ Multi-task learning produced a model that is highly robust to facial conditions ¤ Future work: applying the approach to denser landmark detection, and applying deep multi-task learning to other image recognition problems
  • 27. Impressions ¤ It was good to learn how multi-task learning is done with a CNN ¤ I did not know this approach (is it new?) ¤ The paper uses a CNN + linear classifiers, but CNN + CNN also seems promising ¤ Personally, the way the paper is written looked like a useful reference ¤ With its many new techniques and extensive experiments, it resembles the paper I am currently writing