• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288. (Google Scholar citations: 21305)
• Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306. (19534)
• Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607. (4765)
• Candes, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203-4215. (5488)
• Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351. (2603)
• Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489-509. (12285)
[Figure: diagram with the labels OR, SP (e.g. Uber), FD, Macroscopic Fundamental Diagram (MFD), and Built environment]
$O(e^N)$ (Cover and van Campenhout, 1977)
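\[
\Sigma =
\begin{pmatrix}
2 & 0 & -2 & 0.5 & 0 \\
0 & 1 & 0.7 & 0 & 0 \\
-2 & 0.7 & 3 & 0 & 0.5 \\
0.5 & 0 & 0 & 2 & 0 \\
0 & 0 & 0.5 & 0 & 4
\end{pmatrix},
\qquad
\mu = (0, 0, 0, 0, 0)
\]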
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives.
IEEE transactions on pattern analysis and machine intelligence, 35(8), 1798-1828.
"Another important perspective on representation learning is based on the geometric notion of manifold. Its premise is the manifold hypothesis, according to which real-world data presented in high-dimensional spaces are expected to concentrate in the vicinity of a manifold $\mathcal{M}$ of much lower dimensionality $d_{\mathcal{M}}$, embedded in high-dimensional input space $\mathbb{R}^{d_x}$. This prior seems particularly well suited for AI tasks such as those involving images, sounds, or text, for which most uniformly sampled input configurations are unlike natural stimuli. As soon as there is a notion of “representation,” one can think of a manifold by considering the variations in input space which are captured by or reflected (by corresponding changes) in the learned representation. To first approximation, some directions are well preserved (the tangent directions of the manifold), while others are not (directions orthogonal to the manifolds). With this perspective, the primary unsupervised learning task is then seen as modeling the structure of the data-supporting manifold. The associated representation being learned can be associated with an intrinsic coordinate system on the embedded manifold. The archetypal manifold modeling algorithm is, not surprisingly, also the archetypal low-…"
Bengio et al. (2013)
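What is a manifold? (intuitive explanation)
• A shape that looks different on the surface but can essentially be represented in d-dimensional Euclidean space
• It can also be described as "a shape on which a map can be drawn locally" (example: the surface of the Earth)
[Figure: a one-dimensional manifold embedded in three dimensions, and a two-dimensional manifold (the "Swiss roll") embedded in the same way]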
• A signal $x$ is observed through a matrix $A$ as $y$
• $y = Ax$: given $A$ and $y$, estimate $x$
• Find the $x$ that minimizes $\|y - Ax\|^2$
\[ \hat{x} = \arg\min_{x} \; \frac{1}{2}\|y - Ax\|^2 + \lambda\|x\| \tag{4} \]
\[ \hat{x} = \arg\min_{x} \; \frac{1}{2}\|y - Ax\|^2 \tag{5} \]
subject to
\[ \|x\| \le k \tag{6} \]
\[ \|x\|_1 = \sum_{i=1}^{n} |x_i| \]
\[ \|x\|_2^2 = \sum_{i=1}^{n} x_i^2 \]
\[ \|x\|_\infty = \max_i |x_i| \]
\[ \|x\|_1 + \alpha \|x\|_2^2 \]
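The penalties listed above (L1 norm, squared L2 norm, max norm, and a weighted sum of L1 and squared L2) can be computed directly. The following is a minimal sketch; the function names and the weight name `alpha` are illustrative assumptions, not notation confirmed by the slides.

```python
import numpy as np

def l1_norm(x):
    """Sum of absolute values."""
    return float(np.sum(np.abs(x)))

def l2_norm_squared(x):
    """Sum of squares."""
    return float(np.sum(np.square(x)))

def linf_norm(x):
    """Largest absolute value."""
    return float(np.max(np.abs(x)))

def combined_penalty(x, alpha):
    """L1 norm plus alpha times the squared L2 norm (alpha is a hypothetical weight)."""
    return l1_norm(x) + alpha * l2_norm_squared(x)

x = np.array([3.0, -4.0, 0.0])
penalties = (l1_norm(x), l2_norm_squared(x), linf_norm(x), combined_penalty(x, 0.5))
```

Only the L1 term (alone or in combination) has a kink at zero, which is what drives coefficients exactly to zero.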
• $L_0$ norm of $\beta$:
\[ \min_{\beta} \; 2\sum_{i=1}^{n} l(y_i, x_i^\top \beta) + 2\|\beta\|_0 \tag{1} \]
: 2017 9 21
\[ \frac{1}{n} E_{\{x_i, y_i\}_{i=1}^{n}} \left[ 2\sum_{i=1}^{n} l(y_i, x_i^\top \hat\beta) + 2\|\hat\beta\|_0 \right] \tag{2} \]
\[ = E_{\{x_i, y_i\}_{i=1}^{n}} \left[ E_{X,Y}\left[ 2\, l(Y, X^\top \hat\beta) \right] \right] + O\!\left(\frac{1}{n^2}\right) \tag{3} \]
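In the Bayesian view, the posterior probability is proportional to
(probability of the observed data) × (prior probability).
We want the parameter $\eta$ that maximizes the posterior probability:
\[ \hat\eta = \arg\max_{\eta} P(X \mid \eta)\, P(\eta \mid \lambda) \]
Taking logarithms, this can be interpreted as follows:
\[ \hat\eta = \arg\max_{\eta} \left[ \log P(X \mid \eta) + \log P(\eta \mid \lambda) \right] \]
The first term plays the role of the loss function and the second the role of the regularization term. The parameter $\lambda$ is a hyperparameter of the prior distribution.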
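Regularization term given by the L2 norm
Assume the residual is standard normal, $y - \langle w, \phi(x) \rangle \sim N(0, 1)$, so that
\[ \log p(y \mid w, x, 1) = \log N(y \mid \langle w, \phi(x) \rangle, 1) = -\frac{1}{2}\left(y - \langle w, \phi(x) \rangle\right)^2 + \text{const} \]
Treat the prior distribution in the same way, writing $\lambda$ for the weight of the prior:
\[ \log p(w) = -\frac{\lambda}{2} w^\top w + \text{const} \]
\[ \hat{w} = \arg\min_{w} \left[ -\sum_i \log p(y_i \mid w, x_i) - \log p(w) \right] = \arg\min_{w} \left[ \frac{1}{2}\sum_i \left(y_i - \langle w, \phi(x_i) \rangle\right)^2 + \frac{\lambda}{2} w^\top w \right] \]
The weight $\lambda$ can also be read as the prior variance of $w$ being $\lambda^{-1}$.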
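Example: the prior is a Laplace distribution and the likelihood is a normal distribution
Regularization term given by the L1 norm
As before, $y - \langle w, \phi(x) \rangle \sim N(0, 1)$, so that
\[ \log p(y \mid w, x, 1) = \log N(y \mid \langle w, \phi(x) \rangle, 1) = -\frac{1}{2}\left(y - \langle w, \phi(x) \rangle\right)^2 + \text{const} \]
The prior is a Laplace distribution with mean 0, applied to each component of $w$:
\[ p(w_j) = \mathrm{Laplace}(w_j \mid 0, 2/\lambda) = \frac{\lambda}{4} \exp\left(-\frac{\lambda |w_j|}{2}\right) \]
Treating the prior in the same way as the likelihood:
\[ \hat{w} = \arg\min_{w} \left[ -\sum_i \log p(y_i \mid w, x_i) - \log p(w) \right] = \arg\min_{w} \left[ \frac{1}{2}\sum_i \left(y_i - \langle w, \phi(x_i) \rangle\right)^2 + \frac{\lambda}{2} \|w\|_1 \right] \]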
• $w \in \mathbb{R}^d$: parameter vector
• $L(w)$: loss function
\[ f(w) = L(w) + \lambda\|w\|_1 \tag{9} \]
\[ \min_{w \in \mathbb{R}^d} f(w) = \min_{w \in \mathbb{R}^d} L(w) + \lambda\|w\|_1 \tag{10} \]
• Introduce auxiliary variables $\eta_j \ge 0$, one for each $w_j$. The L1 norm then has the variational form
\[ \|w\|_1 = \sum_{j=1}^{d} |w_j| = \frac{1}{2} \sum_{j=1}^{d} \min_{\eta \in \mathbb{R}^d : \eta_j \ge 0} \left( \frac{w_j^2}{\eta_j} + \eta_j \right) \]
• This follows from the inequality of arithmetic and geometric means,
\[ \frac{w_j^2}{\eta_j} + \eta_j \ge 2|w_j| \]
• with equality when
\[ \eta_j = |w_j| \]
• Minimizing jointly over $w$ and $\eta$:
\[ \min_{w \in \mathbb{R}^d} L(w) + \lambda\|w\|_1 = \min_{w, \eta \in \mathbb{R}^d,\ \eta_j \ge 0} L(w) + \frac{\lambda}{2} \sum_{j=1}^{d} \frac{w_j^2}{\eta_j} + \frac{\lambda}{2} \sum_{j=1}^{d} \eta_j \tag{14} \]
1. For $j = 1, \dots, d$, initialize $\eta_j^1 = 1$.
2. Repeat until convergence:
a. Update $w^t$:
\[ w^t = \arg\min_{w \in \mathbb{R}^d} \left( L(w) + \frac{\lambda}{2} \sum_{j=1}^{d} \frac{w_j^2}{\eta_j^t} \right) \tag{15} \]
b. Update $\eta^{t+1}$:
\[ \eta_j^{t+1} = |w_j^t|, \qquad j = 1, \dots, d \tag{16} \]
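The alternating updates (15) and (16) can be sketched for one concrete loss. The code below assumes the squared loss $L(w) = \frac{1}{2}\|y - Xw\|_2^2$, which the slides do not fix, so step (15) becomes a weighted ridge regression. The function name `irls_lasso` and the trick of multiplying the normal equations by $\mathrm{diag}(\eta)$ (to avoid dividing by $\eta_j = 0$) are illustrative choices.

```python
import numpy as np

def irls_lasso(X, y, lam, n_iter=200):
    """Iteratively reweighted ridge for 0.5*||y - Xw||^2 + lam*||w||_1 (assumed loss)."""
    d = X.shape[1]
    gram = X.T @ X
    corr = X.T @ y
    eta = np.ones(d)              # step 1: eta_j = 1
    w = np.zeros(d)
    for _ in range(n_iter):
        # Step (15): (X^T X + lam * diag(1/eta)) w = X^T y,
        # multiplied through by diag(eta) so eta_j = 0 is harmless.
        w = np.linalg.solve(eta[:, None] * gram + lam * np.eye(d), eta * corr)
        # Step (16): eta_j = |w_j|
        eta = np.abs(w)
    return w

# With an identity design the lasso solution is soft-thresholding of y.
w_hat = irls_lasso(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
```

Coordinates whose least-squares value lies within $\lambda$ of zero shrink geometrically toward zero; they reach it only in the limit, which is one reason proximal methods are often preferred.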
\[ \mathrm{prox}_g(y) = \arg\min_{w \in \mathbb{R}^d} \left( \frac{1}{2}\|y - w\|_2^2 + g(w) \right) \tag{17} \]
\[ \mathrm{prox}^{\ell_1}_{\lambda}(y) = \arg\min_{w \in \mathbb{R}^d} \left( \frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 \right) \tag{18} \]
\[ \left( \mathrm{prox}^{\ell_1}_{\lambda}(y) \right)_j =
\begin{cases}
y_j + \lambda, & \text{if } y_j < -\lambda \\
0, & \text{if } -\lambda \le y_j \le \lambda \\
y_j - \lambda, & \text{if } y_j > \lambda
\end{cases}
\qquad j = 1, \dots, d \]
[Figure: the soft-thresholding function ST(z) plotted against z; it is zero on $[-\lambda, \lambda]$ and shifted toward zero by $\lambda$ outside that interval]
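The three-case formula for the L1 proximal operator collapses into one vectorized expression. A minimal sketch, with the function name `soft_threshold` chosen here for illustration:

```python
import numpy as np

def soft_threshold(y, lam):
    """Elementwise prox of lam*||.||_1: shrink each entry toward zero by lam."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

shrunk = soft_threshold([3.0, -0.5, -2.0, 1.0], lam=1.0)
```

Entries with magnitude at most `lam` map to zero, and the rest move toward zero by exactly `lam`.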
• The objective separates over the coordinates $j$:
\[ \frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 = \sum_{j=1}^{d} \left( \frac{1}{2}(y_j - w_j)^2 + \lambda|w_j| \right) \tag{19} \]
• Optimality condition for each coordinate:
\[ y_j - w_j \in \lambda\, \partial|w_j|, \qquad j = 1, \dots, d \tag{20} \]
\[ \partial|w| =
\begin{cases}
-1, & \text{if } w < 0 \\
[-1, 1], & \text{if } w = 0 \\
1, & \text{if } w > 0
\end{cases} \]
[Figure: graph of $|w|$ against $w$]
• Minimize $f$ through iterates $w^t$
• Initial point $w^0$
• Step size $\eta_t$
\[ w^{t+1} = \arg\min_{w} \left( \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right) \tag{21} \]
\[ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda, \eta_t}\left( w^t - \eta_t \nabla L(w^t) \right) \tag{22} \]
• $X$, $L$
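Update (22) is a gradient step followed by soft-thresholding. The sketch below assumes the squared loss $L(w) = \frac{1}{2}\|y - Xw\|_2^2$, a constant step size equal to the reciprocal of the largest eigenvalue of $X^\top X$, and that the proximal operator with subscript $\lambda, \eta_t$ means thresholding at level $\lambda\eta_t$. None of these choices is stated on the slides, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - Xw||^2 + lam*||w||_1 (assumed loss)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / largest eigenvalue of X^T X
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)             # gradient of the smooth part
        w = soft_threshold(w - step * grad, lam * step)
    return w

w_identity = ista(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
X2 = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]])
y2 = np.array([1.0, 2.0, 3.0])
w2 = ista(X2, y2, lam=0.5, n_iter=5000)
```

At a lasso solution every entry of $X^\top(y - Xw)$ has magnitude at most $\lambda$, which gives a cheap convergence check.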
\[ \min_{x \in \mathbb{R}^n} f(x) \tag{23} \]
\[ \text{s.t.} \quad g_j(x) = 0, \qquad j = 1, \dots, p \tag{24} \]
\[ g(x) = (g_1(x), \dots, g_p(x))^\top \tag{25} \]
\[ L_\rho(x, y) = f(x) + y^\top g(x) + \frac{\rho}{2}\|g(x)\|_2^2 \tag{26} \]
\[ \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^p} L_\rho(x, y) \tag{27} \]
• Let $x^*$ be a local minimizer. If the gradients
\[ \nabla g_1(x^*), \dots, \nabla g_p(x^*) \tag{28} \]
are linearly independent, there exists a multiplier $y^*$ such that
\[ \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 \tag{29} \]
\[ g_j(x^*) = 0, \qquad j = 1, \dots, p \tag{30} \]
• Gradient of the augmented Lagrangian with respect to $x$:
\[ \nabla_x L_\rho(x, y) = \nabla f(x) + \sum_{j=1}^{p} y_j \nabla g_j(x) + \rho \sum_{j=1}^{p} g_j(x) \nabla g_j(x) \]
• At $x = x^*$ with $y = y^*$ the penalty term vanishes because $g_j(x^*) = 0$, so
\[ \nabla_x L_\rho(x, y^*)\big|_{x = x^*} = \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 \]
1. Choose an initial multiplier $y_0$.
2. Find $x_{k+1}$ such that $\|\nabla_x L_{\rho_k}(x_{k+1}, y_k)\| \le \epsilon_k$, where $\rho_k > 0$, $\epsilon_k \ge 0$, and $\epsilon_k \to 0$.
3. $y_{k+1} \leftarrow y_k + \rho_k\, g(x_{k+1})$
4. $k \leftarrow k + 1$ and return to step 2.
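The four steps above can be tried on a toy problem. The sketch below minimizes $f(x) = \frac{1}{2}\|x - c\|_2^2$ subject to the single constraint $a^\top x - b = 0$. Both the problem and the function name are illustrative assumptions. Step 2 is solved exactly (so $\epsilon_k = 0$), and $\rho_k$ is held constant.

```python
import numpy as np

def method_of_multipliers(c, a, b, rho=1.0, n_iter=50):
    """Augmented Lagrangian iterations for min 0.5*||x - c||^2 s.t. a^T x = b."""
    n = len(c)
    y = 0.0                                        # step 1: initial multiplier
    system = np.eye(n) + rho * np.outer(a, a)
    x = np.array(c, dtype=float)
    for _ in range(n_iter):
        # Step 2: the gradient x - c + y*a + rho*(a^T x - b)*a = 0 is linear in x.
        x = np.linalg.solve(system, c - y * a + rho * b * a)
        # Step 3: multiplier update with the constraint violation.
        y = y + rho * (a @ x - b)
    return x, y

c = np.array([1.0, 2.0])
a = np.array([1.0, 1.0])
x_star, y_star = method_of_multipliers(c, a, b=1.0)
```

For this quadratic the multiplier error shrinks by the factor $1 / (1 + \rho\, a^\top a)$ per iteration, so a larger $\rho$ converges faster at the price of a worse-conditioned inner problem.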
For a function
\[ f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\} \]
the conjugate function is
\[ f^*(s) = \sup\{ \langle s, x \rangle - f(x) \mid x \in \mathbb{R}^n \}, \qquad f^* : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\} \]
The map $f \to f^*$ is the Legendre transform.
[Figure: six panels $x^{(1)}$ to $x^{(6)}$ showing graphs of $f$]
[Figure: graph of $f(x)$ with a line of slope $p$ whose intercept on the $y$ axis is $-f^*(p)$]
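Two worked conjugates illustrate the definition. Both are standard results; the symbols $g$, $w$, and $v$ are introduced here only for the example.

```latex
% Quadratic: the supremum is attained at x = s.
f(x) = \tfrac{1}{2}x^2
\;\Longrightarrow\;
f^*(s) = \sup_x \left( sx - \tfrac{1}{2}x^2 \right) = \tfrac{1}{2}s^2

% Scaled L1 norm: each coordinate contributes sup_{w_j} (v_j w_j - \lambda |w_j|),
% which is 0 when |v_j| <= \lambda and +\infty otherwise.
g(w) = \lambda \|w\|_1
\;\Longrightarrow\;
g^*(v) = \sup_w \left( \langle v, w \rangle - \lambda \|w\|_1 \right)
       = \begin{cases} 0, & \|v\|_\infty \le \lambda \\ +\infty, & \text{otherwise} \end{cases}
```

The second result says that the conjugate of an L1 penalty is the indicator function of a max-norm ball.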
• For convex $f$ and $g$ and a matrix $X \in \mathbb{R}^{n \times d}$:
\[ \min_{w \in \mathbb{R}^d} \left( f(Xw) + g(w) \right) = \max_{\alpha \in \mathbb{R}^n} \left( -f^*(-\alpha) - g^*(X^\top \alpha) \right) \tag{33} \]
• The optimal solutions $w^*$, $\alpha^*$ satisfy
\[ w^* \in \partial g^*(X^\top \alpha^*) \tag{34} \]
\[ \alpha^* \in -\partial f(Xw^*) \tag{35} \]
• L1-regularized problem with loss $f_l$ and regularizer $\lambda\|\cdot\|_1$:
\[ \min_{w \in \mathbb{R}^d} f_l(Xw) + \lambda\|w\|_1 \tag{37} \]
\[ \max_{\alpha \in \mathbb{R}^n} -f_l^*(-\alpha) - \delta_{\|\cdot\|_\infty \le \lambda}(X^\top \alpha) \tag{38} \]
\[ \delta_{\|\cdot\|_\infty \le \lambda}(v) =
\begin{cases}
0, & \text{if } \|v\|_\infty \le \lambda \\
+\infty, & \text{otherwise}
\end{cases} \]
\[ \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) \tag{39} \]
\[ \text{s.t.} \quad X^\top \alpha = v \tag{40} \]
• With penalty parameter $\eta$:
\[ \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2 \tag{41} \]
\[ L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + w^\top (X^\top \alpha - v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2 \tag{42} \]
\[ \max_{w \in \mathbb{R}^d} \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w) \]
• Dual augmented Lagrangian method (Tomioka and Sugiyama, 2009)
• Minimize over $\alpha, v$, then update the multiplier $w^t$:
\[ (\alpha^{t+1}, v^{t+1}) = \arg\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t) \tag{44} \]
\[ w^{t+1} = w^t + \eta \left( X^\top \alpha^{t+1} - v^{t+1} \right) \tag{45} \]
• Minimization over $v$ for fixed $\alpha$:
\[ \min_{v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t) = \min_{v \in \mathbb{R}^d} \left( \frac{1}{2\eta}\|\eta v - \hat{w}^t\|_2^2 + \delta_{\|\cdot\|_\infty \le \lambda}(v) \right) + \text{const.} \tag{46} \]
where $\hat{w}^t = w^t + \eta X^\top \alpha$.
• The minimizer $v^{t+1}$ satisfies
\[ \hat{w}^t - \eta v^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda, \eta}(\hat{w}^t) \tag{47} \]
• Eliminating $v$ leaves a function of $\alpha$ alone:
\[ \varphi_t(\alpha) = f_l^*(-\alpha) + \frac{1}{2\eta_t} \left\| \mathrm{prox}^{\ell_1}_{\lambda, \eta_t}\left( w^t + \eta_t X^\top \alpha \right) \right\|_2^2 \tag{48} \]
\[ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda, \eta_t}\left( w^t + \eta_t X^\top \alpha^{t+1} \right) \tag{49} \]
1. Choose an initial $w^0$.
2. Minimize $\varphi_t(\alpha)$ to obtain $\alpha^{t+1}$.
3. Update $w^t$ by (49).
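Relation (47) can be checked numerically. The minimizer of (46) over $v$ is the projection of $\hat{w}/\eta$ onto the box $\|v\|_\infty \le \lambda$, and the remainder $\hat{w} - \eta v$ is then the soft-thresholding of $\hat{w}$. The sketch below assumes the threshold level is $\lambda\eta$; the numeric values are arbitrary illustrations.

```python
import numpy as np

def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

lam, eta = 1.0, 0.5
w_hat = np.array([2.5, -0.3, -4.0, 0.9])

# Minimizer of (1/(2*eta))*||eta*v - w_hat||^2 subject to ||v||_inf <= lam:
# project w_hat/eta onto the box [-lam, lam]^d.
v_next = np.clip(w_hat / eta, -lam, lam)

lhs = w_hat - eta * v_next                 # left-hand side of (47)
rhs = soft_threshold(w_hat, lam * eta)     # assumed meaning of the prox in (47)
```

This is the Moreau decomposition: a vector splits into its projection onto the dual-norm ball and the L1 proximal point.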
• Alternating direction method of multipliers: update $\alpha$, $v$, $w$ in turn (Eckstein and Bertsekas (1992), Boyd et al. (2010))
\[ \alpha^{t+1} = \arg\min_{\alpha \in \mathbb{R}^n} L_\eta(\alpha, v^t, w^t) \tag{50} \]
\[ v^{t+1} = \arg\min_{v \in \mathbb{R}^d} L_\eta(\alpha^{t+1}, v, w^t) \tag{51} \]
\[ w^{t+1} = w^t + \eta \left( X^\top \alpha^{t+1} - v^{t+1} \right) \tag{52} \]
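Updates (50) to (52) become explicit once a loss is fixed. The sketch below assumes the squared loss $f_l(z) = \frac{1}{2}\|z - y\|_2^2$, for which $f_l^*(-\alpha) = \frac{1}{2}\|\alpha\|_2^2 - \alpha^\top y$. This choice is not made on the slides, and the function name is illustrative. The $\alpha$ update is then a linear solve and the $v$ update is a clip onto the box.

```python
import numpy as np

def dual_admm_lasso(X, y, lam, eta=1.0, n_iter=200):
    """ADMM on the dual of 0.5*||y - Xw||^2 + lam*||w||_1 (assumed squared loss)."""
    n, d = X.shape
    v = np.zeros(d)
    w = np.zeros(d)                      # multiplier; converges to the primal lasso solution
    system = np.eye(n) + eta * X @ X.T
    for _ in range(n_iter):
        # (50): stationarity in alpha gives (I + eta*X*X^T) alpha = y - X w + eta*X v
        alpha = np.linalg.solve(system, y - X @ w + eta * X @ v)
        # (51): projection of X^T alpha + w/eta onto the box ||v||_inf <= lam
        v = np.clip(X.T @ alpha + w / eta, -lam, lam)
        # (52): multiplier update
        w = w + eta * (X.T @ alpha - v)
    return w

w_admm = dual_admm_lasso(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
```

With an identity design the result should match soft-thresholding of `y`, which makes a convenient sanity check.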
• Coordinate descent: https://en.wikipedia.org/wiki/Coordinate_descent
• Minimize over a single $\beta_j$ at a given $\lambda$, holding the other $\beta_k$ ($k \ne j$) fixed
\[ \hat\beta^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2} \sum_{i=1}^{d} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{n} |\beta_j| \right\} \tag{53} \]
\[ R(\tilde\beta(\lambda), \beta_j) = \frac{1}{2} \sum_{i=1}^{d} \Big( y_i - \sum_{k \ne j} x_{ik}\tilde\beta_k(\lambda) - x_{ij}\beta_j \Big)^2 + \lambda \sum_{k \ne j} |\tilde\beta_k(\lambda)| + \lambda|\beta_j| \tag{54} \]
\[ y_i - \tilde{y}_i^{(j)} = y_i - \sum_{k \ne j} x_{ik}\tilde\beta_k(\lambda) \tag{55} \]
\[ \tilde\beta_j(\lambda) \leftarrow \mathrm{TH}\Big( \sum_{i=1}^{d} x_{ij}\big(y_i - \tilde{y}_i^{(j)}\big),\ \lambda \Big) \tag{56} \]
where $\mathrm{TH}(\cdot)$ is the soft-thresholding function.
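The coordinate-wise update (56) can be sketched as follows. The code assumes no intercept and divides by $x_j^\top x_j$, so the columns need not be standardized; update (56) as written corresponds to columns with unit squared norm. The function names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=500):
    """Cyclic coordinate descent for 0.5*||y - X beta||^2 + lam*||beta||_1."""
    d = X.shape[1]
    beta = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)          # x_j^T x_j for each column
    for _ in range(n_sweeps):
        for j in range(d):
            # Partial residual y - y_tilde^(j): remove the fit of all other coordinates.
            partial = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ partial, lam) / col_sq[j]
    return beta

beta_identity = lasso_coordinate_descent(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
X2 = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]])
y2 = np.array([1.0, 2.0, 3.0])
beta2 = lasso_coordinate_descent(X2, y2, lam=0.5)
```

Each inner update is exact for its own coordinate, so no step size has to be tuned.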
• Fu (1998), Daubechies et al. (2004), Friedman et al. (2007), Wu and Lange (2008)
• (Friedman et al., 2010)
• (Beck and Tetruashvili, 2013)
• Parameters: mean $\mu$; covariance matrix $S$, with $S^{-1} = Q$
[Figure: graph over the variables $x_1, \dots, x_5$]
[Figure: e.g. the joint distribution of $x_1$ and $x_2$, with the marginals $P(x_1)$ and $P(x_2)$]
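Gaussian Markov Random Field
• Parameters $\mu$, $Q$
• $x = (x_1, \dots, x_L)$, split into observed $x_o$ and unobserved $x_u$
\[ p(x_u \mid x_o, \mu, \Theta) \propto \int N(x \mid \mu, \Theta^{-1}) \prod_{i \in O} \delta(y_i - x_i)\, dx_o \]
\[ p(x_u \mid x_o, \mu, \Theta) = N\left(x_u \mid \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\ \Theta_{uu}^{-1}\right) \]
\[ \Theta = \begin{pmatrix} \Theta_{uu} & \Theta_{uo} \\ \Theta_{ou} & \Theta_{oo} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_u \\ \mu_o \end{pmatrix} \]
Here $\delta(\cdot)$ is the Dirac delta function. The unobserved values are estimated by
\[ \hat{x}_u = \arg\max_{x_u} N\left(x_u \mid \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\ \Theta_{uu}^{-1}\right) \tag{1} \]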
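Because the conditional distribution is Gaussian, its maximizer is its mean, $\mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o)$. A minimal sketch of this imputation step follows. The function name and the three-variable chain used as input are illustrative assumptions, and `x_obs` must be given in ascending index order.

```python
import numpy as np

def conditional_mean(mu, Theta, obs_idx, x_obs):
    """Mean of x_u given x_o under N(mu, Theta^{-1}), using precision-matrix blocks."""
    mu = np.asarray(mu, dtype=float)
    Theta = np.asarray(Theta, dtype=float)
    observed = np.zeros(len(mu), dtype=bool)
    observed[obs_idx] = True
    unobserved = ~observed
    Theta_uu = Theta[unobserved][:, unobserved]
    Theta_uo = Theta[unobserved][:, observed]
    shift = np.linalg.solve(Theta_uu, Theta_uo @ (np.asarray(x_obs, dtype=float) - mu[observed]))
    return mu[unobserved] - shift

# Chain x1 - x2 - x3 with only x1 observed (hypothetical example).
Theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
x_u_hat = conditional_mean(np.zeros(3), Theta, obs_idx=[0], x_obs=[1.0])
```

Working with the precision matrix means only the block $\Theta_{uu}$ has to be solved against, and that block is sparse when the graph is sparse.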
– Gaussian Graphical Model (GGM) ( , 2014; Kataoka et al., 2014)
– Graphical Lasso (GL)
• Graphical Lasso (Friedman et al., 2007)
[Figure: two example patterns of zero and non-zero entries of $\Theta$]
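• Distribution of $x$
– Number of parameters: $V + 2$ (compared with $V + V^2/2$ for general $\Theta$, $\mu$)
\[ Z(\beta, \gamma, \alpha) = \exp\left( \frac{1}{2}\beta^\top \Theta^{-1} \beta \right) \int_{-\infty}^{\infty} \exp\left( -\frac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) dx \tag{4.16} \]
\[ Z(\beta, \gamma, \alpha) = \exp\left( \frac{1}{2}\beta^\top \Theta^{-1} \beta \right) \sqrt{(2\pi)^N \det(\Theta^{-1})} \tag{4.17} \]
From (4.16) and (4.17):
\[ p(x \mid \beta, \gamma, \alpha) = \frac{1}{\sqrt{(2\pi)^N \det(\Theta^{-1})}} \exp\left( -\frac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) \tag{4.18} \]
GGM of Kataoka et al. on a graph $G(V, E)$:
\[ \Theta_{ij} \equiv
\begin{cases}
\epsilon + |\partial(i)|, & i = j \\
-1, & (i, j) \in E \\
0, & \text{otherwise}
\end{cases}
\qquad
\mu \equiv \frac{1}{\eta}\Theta^{-1}\beta \]
\[ p(x \mid \beta, \eta) \propto \exp\left( \beta^\top x - \frac{\eta\epsilon}{2} \sum_{i \in V} x_i^2 - \frac{\eta}{2} \sum_{(i,j) \in E} (x_i - x_j)^2 \right) \tag{4.19} \]
Here $\partial(i)$ is the set of neighbours of node $i$. Expanding the exponent of (4.19):
\[ \exp\left( \beta^\top x - \frac{\eta\epsilon}{2} \sum_{i \in V} x_i^2 - \frac{\eta}{2} \sum_{(i,j) \in E} (x_i - x_j)^2 \right)
= \exp\left( \beta^\top x - \frac{\eta}{2} \sum_{i \in V} \big(\epsilon + |\partial(i)|\big) x_i^2 + \eta \sum_{(i,j) \in E} x_i x_j \right) \]
\[ = \exp\left( -\frac{\eta}{2}(x - \mu)^\top \Theta (x - \mu) + \frac{1}{2\eta}\beta^\top \Theta^{-1} \beta \right) \tag{4.20} \]
\[ p(x \mid \beta, \eta) = \sqrt{\frac{\eta^N \det \Theta}{(2\pi)^N}} \exp\left( -\frac{\eta}{2}(x - \mu)^\top \Theta (x - \mu) \right) \]
\[ \Theta_{ij} =
\begin{cases}
\epsilon + |\partial(i)|, & i = j \\
-1, & (i, j) \in E \text{ or } (j, i) \in E \\
0, & \text{otherwise}
\end{cases} \]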
(3)
27
zd
ln p(zd
|θ)
ln p(zd
|θ) = βT
zd
−
ηϵ
2 i∈V
zi
d2
−
η
2 (i,j)∈E
zd
i − zd
j
2
−
1
2η
βT
Θ−1
β −
|V|
2
ln η + const
(4.26)
yd
xd
p(xd
u|yd
, θ)
p(xd
u|yd
, θ) exp
i∈Ud
βi(xd
u)i −
ηd
ϵ
2
i∈Ud
(xd
u)i −
η
2
(i,j)∈Ωd
1
(yd
i − (xd
u)j)2
−
η
2
(i,j)∈Ωd
3
(xd
u)i − (xd
u)j
2
(4.27)
p(xd
u|yd
, θ) =
η|Ud|
det Θud
(2π)|Ud|
exp −
η
2
xd
u − µud
T
ΘUd xd
u − µud (4.28)
• EM algorithm with observed data $y^d$ and complete data $x^d$
Starting from an initial value $\theta^0$, alternate the E-step and the M-step. The E-step computes $Q(\theta \mid \theta^{\text{old}})$:
\[ Q(\theta \mid \theta^{\text{old}}) = \int_{x_u^1} \int_{x_u^2} \cdots \int_{x_u^D} \ln\left[ \prod_{d=1}^{D} p(z^d \mid \theta) \right] \prod_{d=1}^{D} p(x_u^d \mid y^d, \theta^{\text{old}})\, dx_u^1\, dx_u^2 \cdots dx_u^D \]
\[ = \sum_{d} \int_{x_u^d} \ln p(z^d \mid \theta)\, p(x_u^d \mid y^d, \theta^{\text{old}})\, dx_u^d \tag{4.25} \]
The matrix $\Theta_{u_d}$ in (4.28) is
\[ \Theta_{u_d} :=
\begin{cases}
\epsilon + |\partial(i)|, & i \in U_d \\
-1, & (i, j) \in \Omega_3^d \text{ or } (j, i) \in \Omega_3^d \\
0, & \text{otherwise}
\end{cases} \]
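$Q(\theta \mid \theta^{\text{old}})$ is obtained from (4.25), (4.26), and (4.28). The M-step maximizes it over $\theta$:
\[ \hat\theta = \arg\max_{\theta} Q(\theta \mid \theta^{\text{old}}) \]
Gradients of the $Q(\theta \mid \theta^{\text{old}})$ computed in the E-step:
\[ \frac{\partial Q(\beta, \eta \mid \beta^{\text{old}}, \eta^{\text{old}})}{\partial \beta_i} \propto \frac{1}{D} \sum_{d} E_d[z_i] - \left( \Theta^{-1}\beta \right)_i \tag{4.29a} \]
\[ \frac{\partial Q(\beta, \eta \mid \beta^{\text{old}}, \eta^{\text{old}})}{\partial \eta} \propto \sum_{(i,j) \in E} \frac{1}{D} \sum_{d} E_d[z_i z_j] - \frac{N}{2\eta} - \frac{1}{2} \sum_{i \in V} \big(\epsilon + |\partial(i)|\big) \cdots \]
(closed form)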
\[ p(x \mid \mu, \Theta) := \frac{1}{Z(\Theta)} \exp\left( -\frac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) \tag{4.1} \]
Here $\Theta$ is the inverse of the covariance matrix $\Sigma$. The log-likelihood is
\[ \ln p(\Theta, \mu) = \frac{D}{2} \log\det\Theta - \frac{1}{2} \sum_{d} (x^d - \mu)^\top \Theta (x^d - \mu) + \text{const} \tag{4.2} \]
where $D$ is the number of data and $x^d = (x_1^d, x_2^d, \dots, x_{|V|}^d)$ is the $d$-th datum.
Adding an L1 penalty to (4.2):
\[ \Theta^* = \arg\max_{\Theta} \; \log\det\Theta - \mathrm{tr}(S\Theta) - \rho\|\Theta\|_1 \tag{4.31} \]
\[ \|\Theta\|_1 = \sum_{i,j=1}^{|V|} |\Theta_{ij}| \]
• Friedman et al. (2007): Graphical Lasso (GL)
– L1 regularization drives entries $\Theta_{ij}$ to 0
• $Q$ (written $\Theta$ below): precision matrix
• $S$: sample covariance matrix
• $W$: estimate of $Q^{-1}$
Differentiating the L1-penalized objective (4.31) with respect to $\Theta$:
\[ \frac{\partial}{\partial \Theta} \ln p(\Theta) = \Theta^{-1} - S - \rho\,\Gamma \tag{4.32} \]
\[ \Gamma_{ij}
\begin{cases}
= \mathrm{sign}(\Theta_{ij}), & \text{if } \Theta_{ij} \ne 0 \\
\in [-1, 1], & \text{if } \Theta_{ij} = 0
\end{cases} \]
The maximizer of (4.31) sets (4.32) to zero:
\[ \Theta^{-1} - S - \rho\,\Gamma = 0 \tag{4.33} \]
Let $W$ be the estimate of $\Sigma = \Theta^{-1}$ and partition $\Theta$, $W$, $S$ as
\[ \Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{pmatrix}, \quad
W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{pmatrix}, \quad
S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^\top & s_{22} \end{pmatrix} \tag{4.34} \]
where $\theta_{12}, w_{12}, s_{12}$ are vectors and $\theta_{22}, w_{22}, s_{22}$ are scalars. GL solves (4.33) one block at a time.
• The update of $w_{12}$ reduces to an L1-regularized regression:
\[ \frac{\partial}{\partial \beta} \left( \frac{1}{2} \left\| W_{11}^{1/2}\beta - b \right\|^2 + \rho\|\beta\|_1 \right) = 0 \tag{4.35} \]
where $\beta = W_{11}^{-1} w_{12} \in \mathbb{R}^{|V|-1}$ and $b = W_{11}^{-1/2} s_{12}$.
• Substituting (4.34) into (4.33), the upper-right block gives
\[ w_{12} - s_{12} - \rho\,\gamma_{12} = 0 \tag{4.36} \]
• From $W\Theta = I$:
\[ \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{pmatrix}
\begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{pmatrix}
= \begin{pmatrix} W_{11}\Theta_{11} + w_{12}\theta_{12}^\top & W_{11}\theta_{12} + \theta_{22} w_{12} \\ w_{12}^\top\Theta_{11} + w_{22}\theta_{12}^\top & w_{12}^\top\theta_{12} + w_{22}\theta_{22} \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0^\top & 1 \end{pmatrix} \tag{4.37} \]
\[ W_{11}\theta_{12} + \theta_{22} w_{12} = 0 \]
\[ \beta \equiv W_{11}^{-1} w_{12}, \qquad b \equiv W_{11}^{-1/2} s_{12} \]
\[ \theta_{12} = -\theta_{22} W_{11}^{-1} w_{12} = -\theta_{22}\beta \tag{4.38} \]
Substituting $\beta = W_{11}^{-1} w_{12}$ into (4.36):
\[ W_{11}\beta - s_{12} - \rho\,\gamma_{12} = 0 \tag{4.39} \]
Since $\Theta$ is positive definite, $\theta_{22} > 0$, and so $\mathrm{sign}(\theta_{12}) = -\mathrm{sign}(W_{11}^{-1} w_{12}) = -\mathrm{sign}(\beta)$. The left-hand side of (4.39) is therefore
\[ W_{11}\beta - s_{12} - \rho\,\gamma_{12} = \frac{\partial}{\partial \beta} \left( \frac{1}{2}\beta^\top W_{11}\beta - \beta^\top s_{12} + \rho\|\beta\|_1 \right) \]
\[ = \frac{\partial}{\partial \beta} \left( \frac{1}{2} \left\| W_{11}^{1/2}\beta - W_{11}^{-1/2} s_{12} \right\|^2 - \frac{1}{2} s_{12}^\top W_{11}^{-1} s_{12} + \rho\|\beta\|_1 \right) \]
\[ = \frac{\partial}{\partial \beta} \left( \frac{1}{2} \left\| W_{11}^{1/2}\beta - b \right\|^2 + \rho\|\beta\|_1 \right) = 0 \tag{4.40} \]
which is (4.35).
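• GL algorithm for $W$ (Mazumder and Hastie (2012)):
1. Initialize $W = S + \rho I$.
2. For each row and column in turn, partition $W$ into $W_{11}$, $w_{12}$, $w_{22}$, form $b$, and solve the lasso problem (4.40) for $\hat\beta$.
3. Update $\hat{w}_{12} = W_{11}\hat\beta$.
4. Repeat until $W$ converges.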
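The block-wise procedure above can be sketched in a few lines. This is an illustrative implementation, not the authors' code. It solves each inner lasso problem by coordinate descent on the quadratic form $\frac{1}{2}\beta^\top W_{11}\beta - \beta^\top s_{12} + \rho\|\beta\|_1$, uses fixed iteration counts instead of a convergence test, and recovers $\Theta$ by inverting the final $W$. All function names are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_quadratic_cd(W11, s12, rho, n_iter=500):
    """Coordinate descent for 0.5*b^T W11 b - b^T s12 + rho*||b||_1."""
    beta = np.zeros(len(s12))
    for _ in range(n_iter):
        for k in range(len(beta)):
            # Correlation with coordinate k after removing the other coordinates.
            r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
            beta[k] = soft_threshold(r, rho) / W11[k, k]
    return beta

def graphical_lasso(S, rho, n_sweeps=50):
    """Block coordinate descent on W, the estimate of the covariance Theta^{-1}."""
    p = S.shape[0]
    W = S + rho * np.eye(p)                  # step 1
    for _ in range(n_sweeps):
        for j in range(p):
            rest = np.arange(p) != j
            W11 = W[rest][:, rest]
            beta = lasso_quadratic_cd(W11, S[rest, j], rho)   # step 2
            w12 = W11 @ beta                                   # step 3
            W[rest, j] = w12
            W[j, rest] = w12
    return W, np.linalg.inv(W)

S = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
W, Theta = graphical_lasso(S, rho=0.2)
W_big, Theta_big = graphical_lasso(S, rho=0.6)
```

By the optimality condition of each inner lasso, every off-diagonal entry of $W$ stays within $\rho$ of the corresponding entry of $S$. When $\rho$ exceeds every off-diagonal $|S_{ij}|$, the estimate becomes diagonal.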
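• EM algorithm combined with GL
\[ Q(\Theta \mid \Theta^{\text{old}}) = \int_{x_u} \ln p(x_u, y \mid \Theta)\, p(x_u \mid y, \Theta^{\text{old}})\, dx_u \]
\[ \Theta^* = \arg\max_{\Theta} \; \log\det\Theta - \mathrm{tr}(S\Theta) - \rho\|\Theta\|_1, \qquad \|\Theta\|_1 = \sum_{i,j=1}^{|V|} |\Theta_{ij}| \]
– $\Theta \leftarrow \Theta^{\text{old}}$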
– Number of parameters: $2 \times 1183 + 1183 \times 1182 / 2 = 701{,}519$, about 700,000
[Figure: legend with the classes 0-5%, 5-10%, 10-50%, 50-100%]
[Figures 5–4 to 5–7: histograms of frequency against speed (km/h), with speed in bins from 0 to 100 km/h]
[Figures: maps coloured by speed class: 80 km/h and over, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
[Figure: computation time (hours) against the regularization parameter $\rho$ (0.1, 0.05, 0.01, 0.005, 0.001). GGM: 25 hours 42 minutes]
80
•
–
0.1 over
0.1 ~ 0.05
-0.01 ~ -0.02
-0.02 under
81
•
–
– 82
•
1.
2. GGM
Graphical Lasso
3. EM
GGM, GL EM
4.
5. GL
•
–
– 83
1. Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K.: Traffic data reconstruction based on Markov random field modeling, Inverse Problems, 30, 025003, 2014.
2. Friedman, J., Hastie, T., and Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, pp. 432-441, 2008.
3. Mazumder, R., and Hastie, T.: The graphical lasso: New insights and alternatives, Electronic Journal of Statistics, 6, pp. 2125-2149, 2012.
4. Dempster, A. P., Laird, N. M., and Rubin, D. B.: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society. Series B (Methodological), 39, 1, pp. 1-38, 1977.
5. , , , :
, 12
ITS 2014 Peer-Review Proceedings, CD-ROM,
2014.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society. Series B (Methodological), 267-288.
(Google Scholar citations: 21305)
• Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on
Information Theory, 52(4), 1289-1306. (19534)
• Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive
field properties by learning a sparse code for natural images. Nature,
381(6583), 607. (4765)
• Candes, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE
Transactions on Information Theory, 51(12), 4203-4215. (5488)
• Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation
when p is much larger than n. The Annals of Statistics, 2313-2351. (2603)
• Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles:
Exact signal reconstruction from highly incomplete frequency information.
IEEE Transactions on Information Theory, 52(2), 489-509. (12285)
• Rish and Grabarnik, Sparse Modeling: Theory, Algorithms, and Applications,
CRC Press, 2014.
• Eldar and Kutyniok, Compressed Sensing: Theory and Applications, Cambridge
University Press, 2012.
• Cover, T. M., & Van Campenhout, J. M. (1977). On the possible orderings in the measurement selection
problem. IEEE Transactions on Systems, Man, and Cybernetics, 7(9), 657-661.
• Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
• Tomioka, R., & Sugiyama, M. (2009). Dual-augmented Lagrangian method for efficient sparse
reconstruction. IEEE Signal Processing Letters, 16(12), 1067-1070.
• Eckstein, J., & Bertsekas, D. P. (1992). On the Douglas-Rachford splitting method and the proximal
point algorithm for maximal monotone operators. Mathematical Programming, 55(1), 293-318.
• Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical
learning via the alternating direction method of multipliers. Foundations and Trends in Machine
Learning, 3(1), 1-122.
• Fu, W. J. (1998). Penalized regressions: the bridge versus the lasso. Journal of Computational and
Graphical Statistics, 7(3), 397-416.
• Daubechies, I., Defrise, M., & De Mol, C. (2004). An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413-1457.
• Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The
Annals of Applied Statistics, 1(2), 302-332.
• Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. The
Annals of Applied Statistics, 224-244.
• Beck, A., & Tetruashvili, L. (2013). On the convergence of block coordinate descent type methods.
SIAM Journal on Optimization, 23(4), 2037-2060.

Sparse Modeling

  • 10. OR SP (e.g. Uber) FD Macroscopic Fundamental Diagram (MFD) Built environment
  • 13. $O(e^N)$ (Cover and van Campenhout, 1977)
  • 20. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828. Excerpt: "Another important perspective on representation learning is based on the geometric notion of manifold. Its premise is the manifold hypothesis, according to which real-world data presented in high-dimensional spaces are expected to concentrate in the vicinity of a manifold $M$ of much lower dimensionality $d_M$, embedded in high-dimensional input space $\mathbb{R}^{d_x}$. This prior seems particularly well suited for AI tasks such as those involving images, sounds, or text, for which most uniformly sampled input configurations are unlike natural stimuli. As soon as there is a notion of “representation,” one can think of a manifold by considering the variations in input space which are captured by or reflected (by corresponding changes) in the learned representation. To first approximation, some directions are well preserved (the tangent directions of the manifold), while others are not (directions orthogonal to the manifolds). With this perspective, the primary unsupervised learning task is then seen as modeling the structure of the data-supporting manifold. The associated representation being learned can be associated with an intrinsic coordinate system on the embedded manifold. The archetypal manifold modeling algorithm is, not surprisingly, also the archetypal low-" (Bengio et al., 2013). What is a manifold? (intuitive explanation) • A shape that looks different on the surface but can in effect be represented in d-dimensional Euclidean space. • It can also be described as "a shape on which a map can be drawn locally" (example: the surface of the Earth). Figures: a one-dimensional manifold embedded in three dimensions; likewise, a two-dimensional manifold (the "Swiss roll").
  • 21. $x$, $A$, $y$: $y = Ax$
  • 22. Find the $x$ that minimizes $\|y - Ax\|^2$. Penalized form: $\hat{x} = \arg\min_x \frac{1}{2}\|y - Ax\|^2 + \lambda\|x\|$ (4). Constrained form: $\hat{x} = \arg\min_x \frac{1}{2}\|y - Ax\|^2$ (5) subject to $\|x\| \le k$ (6).
  • 23. $\|x\|_1 = \sum_{i=1}^{n} |x_i|$; $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$; $\|x\|_\infty = \max_i |x_i|$; $\|x\|_1 + \lambda\|x\|_2^2$
  • 24. L0 penalty on $\beta$: $\min_\beta\ 2\sum_{i=1}^{n} l(y_i, x_i^\top\beta) + 2\|\beta\|_0$ (1). $\frac{1}{n} E_{\{x_i,y_i\}_{i=1}^{n}}\left[2\sum_{i=1}^{n} l(y_i, x_i^\top\hat\beta) + 2\|\hat\beta\|_0\right]$ (2) $= E_{\{x_i,y_i\}_{i=1}^{n}}\left[E_{X,Y}\left[2\,l(Y, X^\top\hat\beta)\right]\right] + O\!\left(\frac{1}{n^2}\right)$ (3).
  • 25. In the Bayesian view, the posterior probability is proportional to (probability of the observed data) × (prior probability), and we want the parameter $\eta$ that maximizes the posterior: $\hat\eta = \arg\max_\eta P(X \mid \eta)\,P(\eta)$. Taking logarithms gives $\hat\eta = \arg\max_\eta\left[\log P(X \mid \eta) + \log P(\eta)\right]$, which can be read as a loss function plus a regularization term; the parameter of the prior acts as a hyperparameter. Example (regularization by the L2 norm): with $y_i = \langle w, \varphi(x_i)\rangle + \varepsilon$, $\varepsilon \sim N(0, 1)$, we have $\log p(y_i \mid x_i, w) = \log N(y_i \mid \langle w, \varphi(x_i)\rangle, 1) = -\frac{1}{2}\left(y_i - \langle w, \varphi(x_i)\rangle\right)^2 + \text{const}$. With a zero-mean Gaussian prior on $w$ of variance $\lambda^{-1}$, $\hat w = \arg\min_w\left[-\sum_i \log p(y_i \mid x_i, w) - \log p(w)\right] = \arg\min_w\left[\frac{1}{2}\sum_i\left(y_i - \langle w, \varphi(x_i)\rangle\right)^2 + \frac{\lambda}{2} w^\top w\right]$, so $\lambda^{-1}$ can be seen as the prior variance of $w$. Example (regularization by the L1 norm): with a zero-mean Laplace prior $p(w) = \frac{\lambda}{4}\exp\left(-\frac{\lambda}{2}|w|\right)$ and the same Gaussian likelihood, $\hat w = \arg\min_w\left[\frac{1}{2}\sum_i\left(y_i - \langle w, \varphi(x_i)\rangle\right)^2 + \frac{\lambda}{2}\|w\|_1\right]$.
  • 27. Parameter $w$ (7), loss $L(w)$ (8), objective $f(w) = L(w) + \lambda\|w\|_1$ (9): $\min_{w\in\mathbb{R}^d} f(w) = \min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1$ (10).
  • 28. Auxiliary variable $\eta$, with one $\eta_j$ for each $w_j$. For $\min_{w\in\mathbb{R}^d} f(w) = \min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1$: $\|w\|_1 = \sum_{j=1}^{d}|w_j| = \frac{1}{2}\sum_{j=1}^{d}\min_{\eta\in\mathbb{R}^d:\ \eta_j\ge 0}\left(\frac{w_j^2}{\eta_j} + \eta_j\right)$, since $\sum_{j=1}^{d}\left(\frac{w_j^2}{\eta_j} + \eta_j\right) \ge 2\|w\|_1$ with equality at $\eta_j = |w_j|$.
  • 29. Joint minimization over $w$ and $\eta$: $\min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1 = \min_{w,\eta\in\mathbb{R}^d,\ \eta_j\ge 0}\ L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j} + \frac{\lambda}{2}\sum_{j=1}^{d}\eta_j$ (14). Algorithm: 1. For $j = 1, \dots, d$, set $\eta_j^1 = 1$. 2. Repeat: (a) $w^t = \arg\min_{w\in\mathbb{R}^d}\left(L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j^t}\right)$ (15); (b) $\eta_j^{t+1} = |w_j^t|$, $j = 1, \dots, d$ (16).
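The alternating updates (15) and (16) can be sketched for one concrete loss. The squared loss $L(w) = \frac{1}{2}\|y - Xw\|_2^2$, the function name `irls_lasso`, and the fixed iteration count are assumptions made for illustration; the slides leave $L$ general. To avoid dividing by $\eta_j = 0$, the $w$-step is solved in the equivalent form $w = D(DX^\top XD + \lambda I)^{-1}DX^\top y$ with $D = \operatorname{diag}(\sqrt{\eta_j})$.

```python
import numpy as np

def irls_lasso(X, y, lam, n_iter=200):
    """Alternate the w-step (15) and the eta-step (16) for squared loss."""
    d = X.shape[1]
    eta = np.ones(d)                   # step 1: eta_j = 1
    w = np.zeros(d)
    for _ in range(n_iter):
        s = np.sqrt(eta)               # diagonal of D
        XD = X * s                     # scales column j of X by s_j
        A = XD.T @ XD + lam * np.eye(d)
        # w-step: weighted ridge problem, written so eta_j = 0 is harmless
        w = s * np.linalg.solve(A, s * (X.T @ y))
        eta = np.abs(w)                # eta-step: eta_j = |w_j|
    return w

X = np.eye(3)
y = np.array([3.0, 0.5, -2.0])
w_hat = irls_lasso(X, y, lam=1.0)
```

With $X = I$ the lasso solution is the soft-thresholded $y$, so `w_hat` should approach $(2, 0, -1)$; coordinates whose solution is zero shrink geometrically rather than being set exactly to zero.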
  • 30. $w^t = \arg\min_{w\in\mathbb{R}^d}\left(L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j^t}\right)$; $\eta_j^{t+1} = |w_j^t|$, $j = 1, \dots, d$. Proximal operator: $\operatorname{prox}_g(y) = \arg\min_{w\in\mathbb{R}^d}\left(\frac{1}{2}\|y - w\|_2^2 + g(w)\right)$.
  • 31. $\operatorname{prox}_g(y) = \arg\min_{w\in\mathbb{R}^d}\left(\frac{1}{2}\|y - w\|_2^2 + g(w)\right)$; for the L1 norm, $\operatorname{prox}^{\ell_1}_{\lambda}(y) = \arg\min_{w\in\mathbb{R}^d}\left(\frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1\right)$, which has the componentwise closed form $\left[\operatorname{prox}^{\ell_1}_{\lambda}(y)\right]_j = y_j + \lambda$ if $y_j < -\lambda$; $0$ if $-\lambda \le y_j \le \lambda$; $y_j - \lambda$ if $y_j > \lambda$ ($j = 1, \dots, d$). Figure: the soft-thresholding function $ST(z)$ against $z$, with thresholds at $\pm\lambda$.
  • 32. $\operatorname{prox}^{\ell_1}_{\lambda}(y) = \arg\min_{w\in\mathbb{R}^d}\left(\frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1\right)$ (18), with $\left[\operatorname{prox}^{\ell_1}_{\lambda}(y)\right]_j = y_j + \lambda$ if $y_j < -\lambda$; $0$ if $-\lambda \le y_j \le \lambda$; $y_j - \lambda$ if $y_j > \lambda$. The objective separates over coordinates: $\frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 = \sum_{j=1}^{d}\left(\frac{1}{2}(y_j - w_j)^2 + \lambda|w_j|\right)$ (19). Optimality condition: $y_j - w_j \in \lambda\,\partial|w_j|$, $j = 1, \dots, d$ (20), where $\partial|w| = -1$ if $w < 0$; $[-1, 1]$ if $w = 0$; $1$ if $w > 0$. Figure: $|w|$ against $w$.
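The closed form of the L1 proximal operator is short enough to check numerically. The following sketch (function names and test values are illustrative assumptions) implements soft-thresholding and compares the objective at the result with the objective at randomly perturbed points.

```python
import numpy as np

def soft_threshold(y, lam):
    """Componentwise prox of lam * ||w||_1."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def prox_objective(w, y, lam):
    """Value of 1/2 * ||y - w||_2^2 + lam * ||w||_1."""
    return 0.5 * np.sum((y - w) ** 2) + lam * np.sum(np.abs(w))

y = np.array([3.0, 0.5, -2.0])
lam = 1.0
w_star = soft_threshold(y, lam)

# No perturbed point should achieve a lower objective than w_star.
rng = np.random.default_rng(0)
best = prox_objective(w_star, y, lam)
worse = [prox_objective(w_star + 0.1 * rng.standard_normal(3), y, lam)
         for _ in range(1000)]
```

Entries with $|y_j| \le \lambda$ are set exactly to zero, which is the mechanism by which the L1 penalty produces sparse solutions.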
  • 33. • $f$, $w^t$ • $w^0$ • $\eta_t$ • • •
$$w^{t+1} = \arg\min_{w} \left\{ \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right\}$$
$$w^{t+1} = \operatorname{prox}^{\ell_1}_{\lambda,\eta_t}\!\left( w^t - \eta_t \nabla L(w^t) \right)$$
  • 34. •
$$w^{t+1} = \arg\min_{w} \left\{ \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right\}$$
$$w^{t+1} = \operatorname{prox}^{\ell_1}_{\lambda,\eta_t}\!\left( w^t - \eta_t \nabla L(w^t) \right)$$
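The proximal gradient update above can be sketched as follows. The squared loss $L(w) = \tfrac12\|y - Xw\|_2^2$, the constant step size $\eta_t = 1/\|X\|_2^2$, the function names, and the synthetic data are all assumptions for illustration; the slides do not fix a loss or a step-size rule. The prox with parameters $\lambda, \eta_t$ is taken to be soft thresholding with threshold $\lambda\eta_t$, which is what the arg min in the update yields.

```python
import numpy as np

def soft_threshold(z, thresh):
    """Elementwise soft thresholding: the prox of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def ista_lasso(X, y, lam, n_iter=2000):
    """Proximal gradient iterations for 0.5*||y - Xw||^2 + lam*||w||_1."""
    eta = 1.0 / np.linalg.norm(X, 2) ** 2      # constant step 1/L (assumption)
    w = np.zeros(X.shape[1])                   # initial point w^0
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)               # gradient of the smooth loss at w^t
        w = soft_threshold(w - eta * grad, lam * eta)
    return w

# Hypothetical synthetic data with a 3-sparse ground truth
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 1.0
w_hat = ista_lasso(X, y, lam)
```

At a fixed point the correlation $X^\top(y - Xw)$ equals $\lambda\,\mathrm{sign}(w_j)$ on nonzero coordinates and is bounded by $\lambda$ in magnitude elsewhere, which is the subgradient condition from the earlier slides.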
  • 35. • $X$, $L$ • • • • • Equality-constrained problem:
$$\min_{x \in \mathbb{R}^n} f(x) \tag{23}$$
$$\text{s.t.}\quad g_j(x) = 0, \qquad j = 1, \dots, p \tag{24}$$
$$g(x) = (g_1(x), \dots, g_p(x))^\top \tag{25}$$
Augmented Lagrangian:
$$L_\rho(x, y) = f(x) + y^\top g(x) + \frac{\rho}{2}\|g(x)\|_2^2 \tag{26}$$
  • 36. • • • • $x^*$, $y^*$
$$g(x) = (g_1(x), \dots, g_p(x))^\top \tag{25}$$
$$L_\rho(x, y) = f(x) + y^\top g(x) + \frac{\rho}{2}\|g(x)\|_2^2 \tag{26}$$
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^p} L_\rho(x, y) \tag{27}$$
Constraint gradients at $x^*$:
$$\nabla g_1(x^*), \dots, \nabla g_p(x^*) \tag{28}$$
$$\nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 \tag{29}$$
$$g_j(x^*) = 0, \qquad j = 1, \dots, p \tag{30}$$
Slide labels: (3.1), (3.2), (3.3)
  • 37. • • $x^*$ • • $x^*$, $x$, $y^*$, $y$ • $y^*$, $x^*$ • $x^*$, $y^*$
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^p} L_\rho(x, y)$$
$$\nabla g_1(x^*), \dots, \nabla g_p(x^*)$$
$$\nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0, \qquad g_j(x^*) = 0, \quad j = 1, \dots, p$$
$$\nabla_x L_\rho(x, y) = \nabla f(x) + \sum_{j=1}^{p} y_j \nabla g_j(x) + \rho \sum_{j=1}^{p} g_j(x) \nabla g_j(x)$$
$$\nabla_x L_\rho(x, y^*)\big|_{x = x^*} = \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0$$
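  • 38. • • •
$$\nabla_x L_\rho(x, y) = \nabla f(x) + \sum_{j=1}^{p} y_j \nabla g_j(x) + \rho \sum_{j=1}^{p} g_j(x) \nabla g_j(x)$$
$$\nabla_x L_\rho(x, y^*)\big|_{x = x^*} = \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0$$
1. Choose an initial $y^0$.
2. Find $x^{k+1}$ such that $\|\nabla_x L_{\rho_k}(x^{k+1}, y^k)\| \le \epsilon_k$, where $\rho_k > 0$, $\epsilon_k \ge 0$, $\epsilon_k \to 0$.
3. $y^{k+1} \leftarrow y^k + \rho_k\, g(x^{k+1})$
4. $k \leftarrow k + 1$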
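The four steps above can be sketched on a toy problem. The following assumes a quadratic objective $f(x) = \tfrac12\|x - c\|_2^2$ with linear constraints $g(x) = Ax - b$, a constant penalty $\rho_k = \rho$, and an exact inner minimization (so $\epsilon_k = 0$); none of these choices come from the slides, and the function name and the data `c`, `A`, `b` are hypothetical.

```python
import numpy as np

def augmented_lagrangian(c, A, b, rho=10.0, n_outer=30):
    """Method of multipliers for  min 0.5*||x - c||^2  s.t.  A x = b."""
    p, n = A.shape
    y = np.zeros(p)                          # step 1: initial multipliers y^0
    H = np.eye(n) + rho * A.T @ A            # Hessian of L_rho in x (constant here)
    x = np.zeros(n)
    for _ in range(n_outer):
        # step 2: minimise L_rho(x, y^k) over x; solved exactly for this quadratic
        x = np.linalg.solve(H, c - A.T @ y + rho * A.T @ b)
        # step 3: multiplier update y <- y + rho * g(x), with g(x) = A x - b
        y = y + rho * (A @ x - b)
    return x, y

# Hypothetical data: project c onto {x : x1 + x2 + x3 = 1, x1 = x2}
c = np.array([3.0, 1.0, 2.0])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
x_star, y_star = augmented_lagrangian(c, A, b)
```

For a general nonlinear $f$ or $g$, step 2 would be replaced by an iterative inner solver stopped once the gradient norm drops below $\epsilon_k$.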
  • 39. • • • $f$ • •
1. Choose an initial $y^0$.
2. Find $x^{k+1}$ such that $\|\nabla_x L_{\rho_k}(x^{k+1}, y^k)\| \le \epsilon_k$, where $\rho_k > 0$, $\epsilon_k \ge 0$, $\epsilon_k \to 0$.
3. $y^{k+1} \leftarrow y^k + \rho_k\, g(x^{k+1})$
4. $k \leftarrow k + 1$
Convex conjugate: for $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$,
$$f^*(s) = \sup\{\langle s, x \rangle - f(x) \mid x \in \mathbb{R}^n\}, \qquad f^* : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}, \qquad f \mapsto f^*$$
[Figure: graph of $f(x)$ with a line of slope $p$ whose intercept is $-f^*(p)$]
  • 40. • • $f$, $g$ • • •
$$f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}, \qquad f^*(s) = \sup\{\langle s, x \rangle - f(x) \mid x \in \mathbb{R}^n\}, \qquad f^* : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$$
For $X \in \mathbb{R}^{n \times d}$ (Fenchel duality):
$$\min_{w \in \mathbb{R}^d} \left( f(Xw) + g(w) \right) = \max_{\alpha \in \mathbb{R}^n} \left( -f^*(-\alpha) - g^*(X^\top \alpha) \right) \tag{33}$$
Optimality conditions for the pair $w^*, \alpha^*$:
$$w^* \in \partial g^*(X^\top \alpha^*) \tag{34}$$
$$\alpha^* \in -\partial f(Xw^*) \tag{35}$$
  • 41. • $L$ • • $f_l$, $\lambda\|\cdot\|_1$, $\lambda$ • $\eta$
Primal problem:
$$\min_{w \in \mathbb{R}^d} f_l(Xw) + \lambda\|w\|_1 \tag{37}$$
Dual problem:
$$\max_{\alpha \in \mathbb{R}^n} -f_l^*(-\alpha) - \delta_{\|\cdot\|_\infty \le \lambda}(X^\top \alpha) \tag{38}$$
$$\delta_{\|\cdot\|_\infty \le \lambda}(v) = \begin{cases} 0, & \text{if } \|v\|_\infty \le \lambda \\ +\infty, & \text{otherwise} \end{cases}$$
Equivalent constrained form:
$$\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) \tag{39}$$
$$\text{s.t.}\quad X^\top \alpha = v \tag{40}$$
  • 42. • • •
$$\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) \tag{39}$$
$$\text{s.t.}\quad X^\top \alpha = v \tag{40}$$
Penalized form:
$$\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2 \tag{41}$$
Augmented Lagrangian:
$$L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + w^\top (X^\top \alpha - v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2 \tag{42}$$
$$\max_{w \in \mathbb{R}^d} \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w)$$
  • 43. • • • • • (Tomioka and Sugiyama, 2009) • •
$$L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + w^\top (X^\top \alpha - v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2$$
$$\max_{w \in \mathbb{R}^d} \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w)$$
Minimize over $\alpha, v$:
$$(\alpha^{t+1}, v^{t+1}) = \arg\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t)$$
Update $w^t$:
$$w^{t+1} = w^t + \eta \left( X^\top \alpha^{t+1} - v^{t+1} \right)$$
The minimization over $v$ has the form
$$\min_{v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t) = \min_{v \in \mathbb{R}^d} \left\{ \frac{1}{2\eta}\|\eta v - \hat{w}^t\|_2^2 + \delta_{\|\cdot\|_\infty \le \lambda}(v) \right\} + \text{const.}, \qquad \hat{w}^t = w^t + \eta X^\top \alpha$$
Slide label: (3.4)
  • 44. • $v$ • • $v$, $\alpha$ • $\alpha$, $\alpha$, $w$ •
1. Initialize $w^0$.
2. Minimize $\varphi_t(\alpha)$ to obtain $\alpha^{t+1}$.
3. Update $w^t$.
$$(\alpha^{t+1}, v^{t+1}) = \arg\min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t) \tag{44}$$
$$w^{t+1} = w^t + \eta \left( X^\top \alpha^{t+1} - v^{t+1} \right) \tag{45}$$
$$\min_{v \in \mathbb{R}^d} L_\eta(\alpha, v, w^t) = \min_{v \in \mathbb{R}^d} \left\{ \frac{1}{2\eta}\|\eta v - \hat{w}^t\|_2^2 + \delta_{\|\cdot\|_\infty \le \lambda}(v) \right\} + \text{const.}, \qquad \hat{w}^t = w^t + \eta X^\top \alpha \tag{46}$$
$$\hat{w}^t - \eta v^{t+1} = \operatorname{prox}^{\ell_1}_{\lambda,\eta}(\hat{w}^t) \tag{47}$$
Substituting (47) into (46) eliminates $v$:
$$\varphi_t(\alpha) = f_l^*(-\alpha) + \frac{1}{2\eta_t} \left\| \operatorname{prox}^{\ell_1}_{\lambda,\eta_t}\!\left( w^t + \eta_t X^\top \alpha \right) \right\|_2^2 \tag{48}$$
$$w^{t+1} = \operatorname{prox}^{\ell_1}_{\lambda,\eta_t}\!\left( w^t + \eta_t X^\top \alpha^{t+1} \right) \tag{49}$$
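The three steps above can be sketched for one concrete loss. The code below assumes the squared loss $f_l(z) = \tfrac12\|z - y\|_2^2$, for which $f_l^*(-\alpha) = \tfrac12\|\alpha\|_2^2 - y^\top\alpha$, a fixed $\eta_t = \eta$, and plain gradient descent as the inner solver for $\varphi_t$; these are illustrative choices, not the method as implemented by Tomioka and Sugiyama (2009). The prox is again taken as soft thresholding with threshold $\lambda\eta$, and all names and data are hypothetical.

```python
import numpy as np

def soft_threshold(z, thresh):
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def dal_lasso(X, y, lam, eta=0.1, n_outer=100, n_inner=300):
    """Dual augmented Lagrangian sketch for 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)                                  # step 1: initial w^0
    alpha = np.zeros(n)                              # dual variable, warm-started
    lip = 1.0 + eta * np.linalg.norm(X, 2) ** 2      # Lipschitz constant of grad phi_t
    for _outer in range(n_outer):
        # step 2: approximately minimise phi_t(alpha) by gradient descent
        for _inner in range(n_inner):
            u = w + eta * (X.T @ alpha)
            grad = alpha - y + X @ soft_threshold(u, lam * eta)
            alpha = alpha - grad / lip
        # step 3: w^{t+1} = prox(w^t + eta * X^T alpha^{t+1}), as in (49)
        w = soft_threshold(w + eta * (X.T @ alpha), lam * eta)
    return w, alpha

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 1.0
w_dal, alpha_dal = dal_lasso(X, y, lam)
```

With this loss the optimal dual variable is the residual, $\alpha^* = y - Xw^*$, which gives a simple consistency check between the primal and dual iterates.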
  • 45. • • • $\alpha, v, w$ • Eckstein and Bertsekas (1992), Boyd et al. (2010) • •
$$L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty \le \lambda}(v) + w^\top (X^\top \alpha - v) + \frac{\eta}{2}\|X^\top \alpha - v\|_2^2$$
$$\max_{w \in \mathbb{R}^d} \min_{\alpha \in \mathbb{R}^n,\, v \in \mathbb{R}^d} L_\eta(\alpha, v, w)$$
Alternating updates of $\alpha$, $v$ and $w$:
$$\alpha^{t+1} = \arg\min_{\alpha \in \mathbb{R}^n} L_\eta(\alpha, v^t, w^t) \tag{50}$$
$$v^{t+1} = \arg\min_{v \in \mathbb{R}^d} L_\eta(\alpha^{t+1}, v, w^t) \tag{51}$$
$$w^{t+1} = w^t + \eta \left( X^\top \alpha^{t+1} - v^{t+1} \right) \tag{52}$$
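The alternating updates (50) to (52) have closed forms once a loss is fixed. The sketch below assumes the squared loss $f_l(z) = \tfrac12\|z - y\|_2^2$, so that the $\alpha$-update is a linear solve with $I + \eta XX^\top$ and the $v$-update is a clip onto the box $\|v\|_\infty \le \lambda$. The value of $\eta$, the iteration count, and all names and data are assumptions for illustration.

```python
import numpy as np

def dual_admm_lasso(X, y, lam, eta=0.05, n_iter=3000):
    """ADMM on the dual of 0.5*||y - Xw||^2 + lam*||w||_1 (squared loss assumed)."""
    n, d = X.shape
    alpha, v, w = np.zeros(n), np.zeros(d), np.zeros(d)
    # n is small in this example, so a dense inverse is acceptable
    M = np.linalg.inv(np.eye(n) + eta * X @ X.T)
    for _ in range(n_iter):
        # (50): stationarity of L_eta in alpha with v^t, w^t fixed
        alpha = M @ (y - X @ w + eta * (X @ v))
        # (51): projection of X^T alpha + w/eta onto the box ||v||_inf <= lam
        v = np.clip(X.T @ alpha + w / eta, -lam, lam)
        # (52): multiplier update; w plays the role of the primal lasso variable
        w = w + eta * (X.T @ alpha - v)
    return alpha, v, w

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 1.0
alpha_admm, v_admm, w_admm = dual_admm_lasso(X, y, lam)
```

Unlike the joint minimization in (44), each update here is cheap, at the cost of more outer iterations; the multiplier $w$ converges to a lasso solution and $X^\top\alpha$ to $v$.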
  • 47. • • $\beta_j$, $\lambda$, $\beta_k$ • •
$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2} \sum_{i=1}^{d} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{n} |\beta_j| \right\} \tag{53}$$
Objective as a function of a single coefficient $\beta_j$, with the others fixed at $\tilde{\beta}_k(\lambda)$:
$$R(\tilde{\beta}(\lambda), \beta_j) = \frac{1}{2} \sum_{i=1}^{d} \Big( y_i - \sum_{k \ne j} x_{ik}\tilde{\beta}_k(\lambda) - x_{ij}\beta_j \Big)^2 + \lambda \sum_{k \ne j} |\tilde{\beta}_k(\lambda)| + \lambda|\beta_j| \tag{54}$$
Partial residual:
$$y_i - \tilde{y}_i^{(j)} = y_i - \sum_{k \ne j} x_{ik}\tilde{\beta}_k(\lambda) \tag{55}$$
Coordinate update, where $\mathrm{TH}(\cdot)$ is the soft-thresholding operator:
$$\tilde{\beta}_j(\lambda) \leftarrow \mathrm{TH}\Big( \sum_{i=1}^{d} x_{ij}\big( y_i - \tilde{y}_i^{(j)} \big),\ \lambda \Big) \tag{56}$$
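The coordinate update (56) can be sketched as below. Update (56) as written corresponds to columns scaled so that $\sum_i x_{ij}^2 = 1$; the sketch divides by the column's squared norm so that it also works for unscaled columns, and it omits the intercept $\beta_0$ (data assumed centered). These are assumptions for the example, as are the function names and the synthetic data.

```python
import numpy as np

def soft_threshold(z, thresh):
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def cd_lasso(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for 0.5*||y - X beta||^2 + lam*||beta||_1."""
    n_obs, n_feat = X.shape
    beta = np.zeros(n_feat)
    col_sq = (X ** 2).sum(axis=0)          # sum_i x_ij^2 for each column j
    resid = y - X @ beta                   # full residual, kept up to date
    for _ in range(n_sweeps):
        for j in range(n_feat):
            resid += X[:, j] * beta[j]     # partial residual y - y~^(j), as in (55)
            rho_j = X[:, j] @ resid        # sum_i x_ij (y_i - y~_i^(j))
            beta[j] = soft_threshold(rho_j, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]     # restore the full residual
    return beta

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.01 * rng.standard_normal(50)
lam = 1.0
beta_cd = cd_lasso(X, y, lam)
```

Keeping the residual updated incrementally makes each coordinate step cost one pass over a single column, which is why this scheme scales well along a path of $\lambda$ values.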
  • 48. • Fu (1998), Daubechies et al. (2004), Friedman et al. (2007), Wu and Lange (2008) • • (Friedman et al., 2010) • • • • • (Beck and Tetruashvili, 2013)
  • 49.
  • 51. 51
  • 52. 52
  • 56. • – – – [Figure: two variables $x_1$, $x_2$ and their distributions $P(x_1)$, $P(x_2)$]
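  • 57. (Gaussian Markov Random Field) • • $\mu$, $Q$ • $x = (x_1, \dots, x_L)$, $x_o$, $x_u$ •
$$p(x_u \mid x_o, \mu, \Theta) \propto \int N(x \mid \mu, \Theta^{-1}) \prod_{i \in O} \delta(y_i - x_i)\, dx_o$$
$$p(x_u \mid x_o, \mu, \Theta) = N\!\left( x_u \mid \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\ \Theta_{uu}^{-1} \right)$$
$$\Theta = \begin{pmatrix} \Theta_{uu} & \Theta_{uo} \\ \Theta_{ou} & \Theta_{oo} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_u \\ \mu_o \end{pmatrix}$$
where $\delta(\cdot)$ is the Dirac delta function. The estimate of the unobserved part is
$$\hat{x}_u = \arg\max_{x_u} N\!\left( x_u \mid \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\ \Theta_{uu}^{-1} \right) \tag{1}$$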
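Because the conditional distribution is Gaussian, the maximizer in (1) is its mean, $\mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o)$. A minimal sketch of that imputation step follows; the function name, the 3-node chain precision matrix, and the observed values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def impute_unobserved(mu, Theta, obs_idx, x_obs):
    """Conditional mean of the unobserved entries of a Gaussian with precision Theta."""
    mu = np.asarray(mu, dtype=float)
    obs_idx = np.asarray(obs_idx)
    unobs_idx = np.setdiff1d(np.arange(len(mu)), obs_idx)
    Theta_uu = Theta[np.ix_(unobs_idx, unobs_idx)]
    Theta_uo = Theta[np.ix_(unobs_idx, obs_idx)]
    rhs = Theta_uo @ (np.asarray(x_obs, dtype=float) - mu[obs_idx])
    # Solve Theta_uu z = rhs rather than forming the inverse explicitly
    x_u = mu[unobs_idx] - np.linalg.solve(Theta_uu, rhs)
    return unobs_idx, x_u

# Hypothetical chain graph 0 - 1 - 2 with nodes 0 and 2 observed
Theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
mu = np.zeros(3)
unobs_idx, x_u_hat = impute_unobserved(mu, Theta, obs_idx=[0, 2], x_obs=[1.0, 3.0])
```

Working with the precision matrix only requires a solve with the block $\Theta_{uu}$, which stays sparse when $\Theta$ is sparse; the equivalent covariance formula would need the dense inverse.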
  • 58. – – Gaussian Graphical Model (GGM) ( , 2014; Kataoka et al., 2014) – – – (Graphical Lasso; GL) – – 58
  • 59. • – • – – Graphical Lasso (Friedman et al., 2007) 59 : 0 0 Θ = Θ =
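  • 60. • $x$ • – – $V + 2$ ($V + V^2/2$), $\Theta$, $\mu$
Partition function and density of the general GGM:
$$Z(\beta, \gamma, \alpha) = \exp\!\left( \tfrac{1}{2}\beta^\top \Theta^{-1} \beta \right) \int_{-\infty}^{\infty} \exp\!\left( -\tfrac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) dx \tag{4.16}$$
$$Z(\beta, \gamma, \alpha) = \exp\!\left( \tfrac{1}{2}\beta^\top \Theta^{-1} \beta \right) \sqrt{(2\pi)^N \det(\Theta^{-1})} \tag{4.17}$$
$$p(x \mid \beta, \gamma, \alpha) = \frac{1}{\sqrt{(2\pi)^N \det(\Theta^{-1})}} \exp\!\left( -\tfrac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) \tag{4.18}$$
GGM of Kataoka et al. on a graph $G(V, E)$, where $\partial(i)$ denotes the neighbours of node $i$:
$$p(x \mid \beta, \eta) \propto \exp\!\Big( \beta^\top x - \frac{\eta\epsilon}{2} \sum_{i \in V} x_i^2 - \frac{\eta}{2} \sum_{(i,j) \in E} (x_i - x_j)^2 \Big) \tag{4.19}$$
$$\exp\!\Big( \beta^\top x - \frac{\eta\epsilon}{2} \sum_{i \in V} x_i^2 - \frac{\eta}{2} \sum_{(i,j) \in E} (x_i - x_j)^2 \Big) = \exp\!\Big( \beta^\top x - \frac{\eta}{2} \sum_{i \in V} \big( \epsilon + |\partial(i)| \big) x_i^2 + \eta \sum_{(i,j) \in E} x_i x_j \Big) = \exp\!\Big( -\frac{\eta}{2}(x - \mu)^\top \Theta (x - \mu) + \frac{1}{2\eta}\beta^\top \Theta^{-1} \beta \Big) \tag{4.20}$$
$$p(x \mid \beta, \eta) = \sqrt{\frac{\eta^N \det \Theta}{(2\pi)^N}} \exp\!\left( -\frac{\eta}{2}(x - \mu)^\top \Theta (x - \mu) \right)$$
$$\Theta_{ij} \equiv \begin{cases} \epsilon + |\partial(i)| & i = j \\ -1 & (i, j) \in E \text{ or } (j, i) \in E \\ 0 & \text{otherwise} \end{cases} \qquad \mu \equiv \frac{1}{\eta}\Theta^{-1}\beta$$
Slide labels: (2), (3)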
  • 61. • • – – – $y^d$, $x^d$ • – – EM iteration: $\theta^0$ → E-step → M-step → E-step → …
E-step: the function $Q(\theta \mid \theta^{\text{old}})$ is
$$Q(\theta \mid \theta^{\text{old}}) = \int_{x_u^1} \int_{x_u^2} \cdots \int_{x_u^D} \ln \prod_{d=1}^{D} p(z^d \mid \theta) \prod_{d=1}^{D} p(x_u^d \mid y^d, \theta^{\text{old}})\, dx_u^1\, dx_u^2 \cdots dx_u^D = \sum_{d} \int_{x_u^d} \ln p(z^d \mid \theta)\, p(x_u^d \mid y^d, \theta^{\text{old}})\, dx_u^d \tag{4.25}$$
Log-likelihood of a complete sample $z^d$:
$$\ln p(z^d \mid \theta) = \beta^\top z^d - \frac{\eta\epsilon}{2} \sum_{i \in V} (z_i^d)^2 - \frac{\eta}{2} \sum_{(i,j) \in E} (z_i^d - z_j^d)^2 - \frac{1}{2\eta}\beta^\top \Theta^{-1} \beta - \frac{|V|}{2} \ln \eta + \text{const} \tag{4.26}$$
Posterior of the unobserved part $x_u^d$ given the observations $y^d$:
$$p(x_u^d \mid y^d, \theta) \propto \exp\!\Big( \sum_{i \in U_d} \beta_i (x_u^d)_i - \frac{\eta\epsilon}{2} \sum_{i \in U_d} (x_u^d)_i^2 - \frac{\eta}{2} \sum_{(i,j) \in \Omega_1^d} \big( y_i^d - (x_u^d)_j \big)^2 - \frac{\eta}{2} \sum_{(i,j) \in \Omega_3^d} \big( (x_u^d)_i - (x_u^d)_j \big)^2 \Big) \tag{4.27}$$
$$p(x_u^d \mid y^d, \theta) = \sqrt{\frac{\eta^{|U_d|} \det \Theta_{u_d}}{(2\pi)^{|U_d|}}} \exp\!\left( -\frac{\eta}{2} (x_u^d - \mu_{u_d})^\top \Theta_{u_d} (x_u^d - \mu_{u_d}) \right) \tag{4.28}$$
$$\Theta_{u_d} := \begin{cases} \epsilon + |\partial(i)| & i = j,\ i \in U_d \\ -1 & (i, j) \in \Omega_3^d \text{ or } (j, i) \in \Omega_3^d \\ 0 & \text{otherwise} \end{cases}$$
M-step: maximize $Q(\theta \mid \theta^{\text{old}})$ with respect to $\theta$:
$$\hat{\theta} = \arg\max_{\theta} Q(\theta \mid \theta^{\text{old}})$$
$$\frac{\partial Q(\beta, \eta \mid \beta^{\text{old}}, \eta^{\text{old}})}{\partial \beta_i} \propto \frac{1}{D} \sum_{d} E_d[z_i] - \left( \Theta^{-1}\beta \right)_i \tag{4.29a}$$
$$\frac{\partial Q(\beta, \eta \mid \beta^{\text{old}}, \eta^{\text{old}})}{\partial \eta} \propto \sum_{(i,j) \in E} \frac{1}{D} \sum_{d} E_d[z_i z_j] - \frac{N}{2\eta} - \frac{1}{2} \sum_{i \in V} \big( \epsilon + |\partial(i)| \cdots$$
closed form; slide label: (4)
  • 62. • • – •
$$p(x \mid \mu, \Theta) := \frac{1}{Z(\Theta)} \exp\!\left( -\tfrac{1}{2}(x - \mu)^\top \Theta (x - \mu) \right) \tag{4.1}$$
Log-likelihood for $D$ samples $x^d = (x_1^d, x_2^d, \dots, x_{|V|}^d)$:
$$\ln p(\Theta, \mu) = \frac{D}{2} \log\det\Theta - \frac{1}{2} \sum_{d} (x^d - \mu)^\top \Theta (x^d - \mu) + \text{const} \tag{4.2}$$
$\ell_1$-regularized estimate of $\Theta$, which drives entries $\Theta_{ij}$ to $0$:
$$\Theta^* = \arg\max_{\Theta} \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \rho\|\Theta\|_1 \right\} \tag{4.31}$$
$$\|\Theta\|_1 = \sum_{i}^{|V|} \sum_{j}^{|V|} |\Theta_{ij}|$$
$Q_1$, $Q_2$; slide label: (5)
  • 63. • – – • Friedman et al. (2007) Graphical Lasso – L1 • GL 63
  • 64. • $Q$ • $S$, $G$ • $Q^{-1*}$, $W$ –
Subgradient of the $\ell_1$-regularized objective (4.31) with respect to $\Theta$:
$$\frac{\partial}{\partial\Theta} \ln p(\Theta) = \Theta^{-1} - S - \rho\,\Gamma \tag{4.32}$$
$$\Gamma_{ij} \begin{cases} = \operatorname{sign}(\Theta_{ij}) & \text{if } \Theta_{ij} \ne 0 \\ \in [-1, 1] & \text{if } \Theta_{ij} = 0 \end{cases}$$
$$\Theta^{-1} - S - \rho\,\Gamma = 0 \tag{4.33}$$
Let $W$ denote the estimate of $\Sigma = \Theta^{-1}$, and partition $\Theta$, $W$, $S$ as
$$\Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{pmatrix}, \qquad W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^\top & s_{22} \end{pmatrix} \tag{4.34}$$
where $\theta_{12}, w_{12}, s_{12}$ are vectors and $\theta_{22}, w_{22}, s_{22}$ are scalars.
[Figure: $W$ partitioned into blocks $W_{11}$, $w_{12}$, $w_{12}^\top$, $w_{22}$]
Slide label: (6)
  • 65. • • (6), (7) • $WQ = I$ –
$\ell_1$-regularized problem in $\beta \in \mathbb{R}^{|V|-1}$, with $b = W_{11}^{-1/2} s_{12}$:
$$\frac{\partial}{\partial\beta} \left\{ \frac{1}{2} \left\| W_{11}^{1/2}\beta - b \right\|^2 + \rho\|\beta\|_1 \right\} = 0 \tag{4.35}$$
Substituting the partition (4.34) into (4.33) gives
$$w_{12} - s_{12} - \rho\,\gamma_{12} = 0 \tag{4.36}$$
From $W\Theta = I$:
$$\begin{pmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{pmatrix} \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^\top & \theta_{22} \end{pmatrix} = \begin{pmatrix} W_{11}\Theta_{11} + w_{12}\theta_{12}^\top & W_{11}\theta_{12} + \theta_{22} w_{12} \\ w_{12}^\top\Theta_{11} + w_{22}\theta_{12}^\top & w_{12}^\top\theta_{12} + w_{22}\theta_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^\top & 1 \end{pmatrix} \tag{4.37}$$
$$W_{11}\theta_{12} + \theta_{22} w_{12} = 0$$
Slide labels: (7), (8)
  • 66. • • $b$; (7), (8) • Definitions: $\beta \equiv W_{11}^{-1} w_{12}$, $b \equiv W_{11}^{-1/2} s_{12}$.
$$\theta_{12} = -\theta_{22} W_{11}^{-1} w_{12} = -\theta_{22}\beta \tag{4.38}$$
Substituting $\beta = W_{11}^{-1} w_{12}$ into (4.36):
$$W_{11}\beta - s_{12} - \rho\,\gamma_{12} = 0 \tag{4.39}$$
Since $\Theta$ is positive definite, $\theta_{22} > 0$, and therefore
$$\operatorname{sign}(\theta_{12}) = -\operatorname{sign}(W_{11}^{-1} w_{12}) = -\operatorname{sign}(\beta)$$
Hence (4.39) can be written as
$$W_{11}\beta - s_{12} - \rho\,\gamma_{12} = \frac{\partial}{\partial\beta} \left\{ \frac{1}{2}\beta^\top W_{11}\beta - \beta^\top s_{12} + \rho\|\beta\|_1 \right\} = \frac{\partial}{\partial\beta} \left\{ \frac{1}{2} \left\| W_{11}^{1/2}\beta - W_{11}^{-1/2} s_{12} \right\|^2 - \frac{1}{2} s_{12}^\top W_{11}^{-1} s_{12} + \rho\|\beta\|_1 \right\} = \frac{\partial}{\partial\beta} \left\{ \frac{1}{2} \left\| W_{11}^{1/2}\beta - b \right\|^2 + \rho\|\beta\|_1 \right\} = 0 \tag{4.40}$$
which is (4.35); slide label (11). Other slide labels: (9), (10)
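  • 67. • • GL algorithm for $W$ (Mazumder and Hastie, 2012)
[Figure: $W$ partitioned into blocks $W_{11}$, $w_{12}$, $w_{12}^\top$, $w_{22}$]
1. Initialize $W = S + \rho I$.
2. Partition $W$, form $b$ from $W_{11}$, and solve the lasso problem (11) for $\hat{\beta}$.
3. Update $w_{12} = W_{11}\hat{\beta}$, giving the new estimate $\hat{w}_{12}$.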
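The block update described above can be sketched as follows. The inner lasso is solved here by coordinate descent on the equivalent quadratic form $\tfrac12\beta^\top W_{11}\beta - \beta^\top s_{12} + \rho\|\beta\|_1$; the fixed sweep counts, the recovery of $\Theta$ from $\theta_{22} = 1/(w_{22} - w_{12}^\top\beta)$ and $\theta_{12} = -\theta_{22}\beta$, the function names, and the example matrix `S` are assumptions for illustration rather than the slides' implementation.

```python
import numpy as np

def soft_threshold(z, thresh):
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def graphical_lasso(S, rho, n_sweeps=100, n_inner=50):
    """Block coordinate descent sketch for max log det(Theta) - tr(S Theta) - rho*||Theta||_1."""
    p = S.shape[0]
    W = S + rho * np.eye(p)                  # initial estimate of the covariance
    B = np.zeros((p, p))                     # B[idx, j] stores beta for column j
    for _ in range(n_sweeps):
        for j in range(p):
            idx = np.delete(np.arange(p), j)
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[idx, j].copy()          # warm start from the previous sweep
            for _ in range(n_inner):         # coordinate descent for the inner lasso
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, rho) / W11[k, k]
            B[idx, j] = beta
            w12 = W11 @ beta                 # w12 = W11 * beta_hat
            W[idx, j] = w12
            W[j, idx] = w12
    Theta = np.zeros((p, p))
    for j in range(p):                       # recover the precision matrix column by column
        idx = np.delete(np.arange(p), j)
        theta22 = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[j, j] = theta22
        Theta[idx, j] = -theta22 * B[idx, j]
    return W, Theta

# Hypothetical sample covariance matrix
S = np.array([[1.0, 0.5, 0.2, 0.0],
              [0.5, 1.0, 0.4, 0.1],
              [0.2, 0.4, 1.0, 0.3],
              [0.0, 0.1, 0.3, 1.0]])
rho = 0.15
W_hat, Theta_hat = graphical_lasso(S, rho)
```

The diagonal of $W$ stays fixed at $S_{ii} + \rho$ throughout, and each off-diagonal entry of $W$ ends up within $\rho$ of the corresponding entry of $S$, as required by the stationarity condition $\Theta^{-1} - S - \rho\Gamma = 0$.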
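  • 68. • – • – – – $\Theta \leftarrow \Theta^{\text{old}}$
$$Q(\Theta \mid \Theta^{\text{old}}) = \int_{x_u} \ln p(x_u, y \mid \Theta)\, p(x_u \mid y, \Theta^{\text{old}})\, dx_u$$
$$\Theta^* = \arg\max_{\Theta} \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \rho\|\Theta\|_1 \right\}, \qquad \|\Theta\|_1 = \sum_{i,j=1}^{|V|} |\Theta_{ij}|$$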
  • 69. • – – – 2*1183 + 1183*1182/2 70 • – – – – – 69
  • 70. 70
  • 72. [Figures 5-4 to 5-7: histograms of frequency against speed (km/h), with the speed axis running from 0 to 100 km/h]
  • 74. [Map legend: over 80 km/h, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
  • 75. [Map legend: over 80 km/h, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
  • 77. • • → [Figure: computation time (hours) against the regularization parameter ρ = 0.1, 0.05, 0.01, 0.005, 0.001; GGM: 25 hours 42 minutes]
  • 78. 78
  • 79. 79
  • 80. 80
  • 81. • – [Map legend: over 0.1, 0.1 to 0.05, -0.01 to -0.02, under -0.02]
  • 83. • 1. 2. GGM Graphical Lasso 3. EM GGM, GL EM 4. 5. GL • – – 83
  • 84. References
1. Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K.: Traffic data reconstruction based on Markov random field modeling, Inverse Problems, 30, 025003, 2014.
2. Friedman, J., Hastie, T., and Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, pp. 432-441, 2008.
3. Mazumder, R., and Hastie, T.: The graphical lasso: New insights and alternatives, Electronic Journal of Statistics, 6, pp. 2125-2149, 2012.
4. Dempster, A. P., Laird, N. M., and Rubin, D. B.: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society. Series B (Methodological), 39, 1, pp. 1-38, 1977.
5. , , , : , 12 ITS 2014 Peer-Review Proceedings, CD-ROM, 2014.
  • 85.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288. (Google scholar 21305)
• Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306. (19534)
• Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607. (4765)
• Candes, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203-4215. (5488)
• Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351. (2603)
• Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489-509. (12285)
  • 86. • • • • • • • • • Rish and Grabarnik, Sparse Modeling: Theory, Algorithms, and Applications, CRC Press, 2014. • • Eldar and Kutyniok, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012. •
  • 87.
• Cover, T. M., & Van Campenhout, J. M. (1977). On the possible orderings in the measurement selection problem. IEEE Transactions on Systems, Man, and Cybernetics, 7(9), 657-661.
• Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
• Tomioka, R., & Sugiyama, M. (2009). Dual-augmented Lagrangian method for efficient sparse reconstruction. IEEE Signal Processing Letters, 16(12), 1067-1070.
• Eckstein, J., & Bertsekas, D. P. (1992). On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1), 293-318.
• Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1-122.
• Fu, W. J. (1998). Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3), 397-416.
• Daubechies, I., Defrise, M., & De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413-1457.
• Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302-332.
• Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics, 224-244.
• Beck, A., & Tetruashvili, L. (2013). On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4), 2037-2060.