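• b (7), (8)
$$ \beta \equiv W_{11}^{-1} w_{12}, \qquad b \equiv W_{11}^{-1/2} s_{12} $$
(4.36)
$$ W_{11}\beta - s_{12} - \rho\,\gamma_{12} = 0 \tag{4.39} $$
Θ: $\theta_{22} > 0$, $\mathrm{sign}(\theta_{12})$, $\mathrm{sign}(\beta)$ (4.39)...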
• GL: W
Mazumder and Hastie (2012)
$$ W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^\top & w_{22} \end{pmatrix} $$
$$ W = S + \rho I $$
W: (11), b
$$ w_{12} = W_{11}\hat\beta $$
$\hat w_{12}$
– Θ ← Θ^old
$$ Q(\Theta \mid \Theta^{\mathrm{old}}) = \int \ln p(x_u, y \mid \Theta)\; p(x_u \mid y, \Theta^{\mathrm{old}})\, dx_u $$
(4.2)
Θ* = arg max_Θ log det Θ − tr(SΘ) − ρ||...
– 2*1183 + 1183*1182/2 = 701,519 (≈ 70 × 10^4)
[Figure legend: 0-5%, 5-10%, 10-50%, 50-100%]
[Figure: histogram; x-axis: speed (km/h), 0 to 100; y-axis: frequency, 0 to 700]
– 5–5
4...
[Figure legend (speed): over 80 km/h, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
[Figure: computation time (hours, 0 to 30) versus regularization parameter ρ (0.1, 0.05, 0.01, 0.005, 0.001). GGM: 25 hours 42 minutes]
[Figure legend: over 0.1, 0.05 to 0.1, -0.02 to -0.01, under -0.02]
•
1.
2. GGM
Graphical Lasso
3. EM
GGM, GL EM
4.
5. GL
•
–
– 83
1. Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K., : Traffic
data reconstruction based on Markov random field mod...
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society. Serie...
• Rish and Grabarnik, Sparse Modeling Theory, Algorithms, and Applications,
CRC Press, 2014.
•
• Elder and...
• Cover, T. M., & Van Campenhout, J. M. (1977). On the possible orderings in the measurement selection
problem. IEEE Transa...
Sparse Modeling

Presentation material from the Transportation and Urban Theory doctoral study group at Kanazawa University, 2017/10/17.


  5. 5. • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288. (Google Scholar: 21305)
      • Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306. (19534)
      • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607. (4765)
      • Candès, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203-4215. (5488)
      • Candès, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351. (2603)
      • Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489-509. (12285)
  7. 7. OR SP (e.g. Uber) FD Macroscopic Fundamental Diagram (MFD) Built environment
  10. 10. • $O(e^N)$ (Cover and van Campenhout, 1977)
  11. 11. $$ \Sigma = \begin{pmatrix} 2 & 0 & -2 & 0.5 & 0 \\ 0 & 1 & 0.7 & 0 & 0 \\ -2 & 0.7 & 3 & 0 & 0.5 \\ 0.5 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0.5 & 0 & 4 \end{pmatrix}, \qquad \mu = (0,0,0,0,0) $$
  12. 12. µ •
  13. 13. Σ •
  14. 14. Σ •
  17. 17. • Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
      Excerpt (Bengio et al., 2013): "Another important perspective on representation learning is based on the geometric notion of manifold. Its premise is the manifold hypothesis, according to which real-world data presented in high-dimensional spaces are expected to concentrate in the vicinity of a manifold M of much lower dimensionality d_M, embedded in high-dimensional input space R^{d_x}. This prior seems particularly well suited for AI tasks such as those involving images, sounds, or text, for which most uniformly sampled input configurations are unlike natural stimuli. As soon as there is a notion of "representation," one can think of a manifold by considering the variations in input space which are captured by or reflected (by corresponding changes) in the learned representation. To first approximation, some directions are well preserved (the tangent directions of the manifold), while others are not (directions orthogonal to the manifolds). With this perspective, the primary unsupervised learning task is then seen as modeling the structure of the data-supporting manifold. The associated representation being learned can be associated with an intrinsic coordinate system on the embedded manifold. The archetypal manifold modeling algorithm is, not surprisingly, also the archetypal low-"
      What is a manifold? (intuitive explanation)
      • A shape that looks different on the surface but can essentially be represented in d-dimensional Euclidean space
      • It can also be described as "a shape on which a map can be drawn locally" (example: the surface of the Earth)
      [Figure: a 1-dimensional manifold embedded in 3 dimensions; likewise, a 2-dimensional manifold (the "Swiss roll")]
  18. 18. • x A y • y = A x A y x • • A y x
  19. 19. • $\|y - Ax\|^2$, $x$
      $$ \frac{1}{n}\, E_{\{x_i,y_i\}_{i=1}^{n}}\left[ 2\sum_{i=1}^{n} l(y_i, x_i^\top\hat\beta) + 2\|\hat\beta\|_0 \right] \tag{2} $$
      $$ = E_{\{x_i,y_i\}_{i=1}^{n}}\, E_{X,Y}\left[ 2\, l(Y, X^\top\hat\beta) \right] + O\!\left(\frac{1}{n^2}\right) \tag{3} $$
      $$ \hat x = \arg\min_x \frac{1}{2}\|y - Ax\|^2 + \lambda\|x\| \tag{4} $$
      $$ \hat x = \arg\min_x \frac{1}{2}\|y - Ax\|^2 \tag{5} $$
      $$ \text{subject to } \|x\| \le k \tag{6} $$
  20. 20. ( ) = * |,-| . -/) ( 0 = * ,- 0 . -/) ( 2 = max |,-| ( ) + 7 ( 0 0
  21. 21. • β, L0
      $$ \min_\beta\; 2\sum_{i=1}^{n} l(y_i, x_i^\top\beta) + 2\|\beta\|_0 \tag{1} $$
      $$ \frac{1}{n}\, E_{\{x_i,y_i\}_{i=1}^{n}}\left[ 2\sum_{i=1}^{n} l(y_i, x_i^\top\hat\beta) + 2\|\hat\beta\|_0 \right] \tag{2} $$
      $$ = E_{\{x_i,y_i\}_{i=1}^{n}}\, E_{X,Y}\left[ 2\, l(Y, X^\top\hat\beta) \right] + O\!\left(\frac{1}{n^2}\right) \tag{3} $$
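The L0-penalized criterion (1) can be illustrated with a brute-force search. The sketch below is only an illustration under stated assumptions: the design is taken to be the identity and the loss is assumed to be l(y, b) = (y - b)^2 / 2, so that (1) becomes sum_i (y_i - b_i)^2 + 2 ||b||_0. The function name `best_subset_l0` and the data are hypothetical. The loop over all supports shows why exact L0 selection scales exponentially in the number of variables.

```python
from itertools import combinations


def best_subset_l0(y):
    """Exhaustively minimize sum_i (y_i - b_i)^2 + 2 * ||b||_0 (identity design, assumed)."""
    n = len(y)
    best_cost, best_support = float("inf"), ()
    # Enumerate all 2^n candidate supports: this is the exponential part.
    for k in range(n + 1):
        for support in combinations(range(n), k):
            chosen = set(support)
            # On the support the fit is exact (b_i = y_i); off the support b_i = 0.
            residual = sum(y[i] ** 2 for i in range(n) if i not in chosen)
            cost = residual + 2 * k
            if cost < best_cost:
                best_cost, best_support = cost, support
    return best_support, best_cost


support, cost = best_subset_l0([3.0, 0.5, -2.0, 1.0])
print(support, cost)
```

In this toy setting a coefficient is kept exactly when y_i^2 > 2, which is a hard-thresholding rule; with a general design matrix no such shortcut is available.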
  22. 22. • In the Bayesian view, the posterior probability is (probability of the observed data) × (prior probability). We want the parameter η that maximizes the posterior probability:
      $$ \hat\eta = \arg\max_\eta P(X \mid \eta)\, P(\eta) $$
      • Taking logarithms, this can be interpreted as a loss function plus a regularization term, where the regularization parameter is a hyperparameter of the prior distribution:
      $$ \hat\eta = \arg\max_\eta \left[ \log P(X \mid \eta) + \log P(\eta) \right] $$
      • L2-norm regularization term: with $y_i = \langle w, \varphi(x_i)\rangle + \varepsilon$, $\varepsilon \sim N(0,1)$, the log-likelihood is $\log p(y_i \mid x_i, w) = \log N(y_i \mid \langle w, \varphi(x_i)\rangle, 1) = -\tfrac{1}{2}\left(y_i - \langle w, \varphi(x_i)\rangle\right)^2 + \text{const}$. With a zero-mean Gaussian prior on $w$:
      $$ \hat w = \arg\min_w \left[ -\sum_i \log p(y_i \mid x_i, w) - \log p(w) \right] = \arg\min_w \left[ \frac{1}{2}\sum_i \left(y_i - \langle w, \varphi(x_i)\rangle\right)^2 + \frac{\lambda}{2}\, w^\top w \right] $$
      $\lambda^{-1}$ can also be seen as the variance of the prior on $w$.
      • L1-norm regularization term: example where the prior is a Laplace distribution with mean 0 and the posterior is a normal distribution; the same derivation gives an L1-norm regularization term.
  24. 24. $$ \hat x = \arg\min_x \frac{1}{2}\|y - Ax\|^2 \tag{5} $$
      $$ \text{subject to } \|x\| \le k \tag{6} $$
      • $w$ (7), $L(w)$ (8), $n$, $\mathbb{R}^d$
      $$ f(w) = L(w) + \lambda\|w\|_1 \tag{9} $$
      $$ \min_{w\in\mathbb{R}^d} f(w) = \min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1 \tag{10} $$
  25. 25. • η • η • $w_j$, $\eta_j$
      $$ f(w) = L(w) + \lambda\|w\|_1, \qquad \min_{w\in\mathbb{R}^d} f(w) = \min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1 $$
      $$ \|w\|_1 = \sum_{j=1}^{d}|w_j| = \frac{1}{2}\sum_{j=1}^{d}\min_{\eta\in\mathbb{R}^d:\,\eta_j\ge 0}\left(\frac{w_j^2}{\eta_j} + \eta_j\right) $$
      $$ \sum_{j=1}^{d}\left(\frac{w_j^2}{\eta_j} + \eta_j\right) \ge 2\|w\|_1, \qquad \text{with equality at } \eta_j = |w_j| $$
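The variational form of the absolute value used on this slide can be checked coordinate by coordinate. The short derivation below is a sketch that assumes $w_j \neq 0$ and $\eta_j > 0$; the auxiliary function $h$ is introduced only for this illustration.

$$ h(\eta_j) = \frac{w_j^2}{\eta_j} + \eta_j, \qquad h'(\eta_j) = -\frac{w_j^2}{\eta_j^2} + 1 = 0 \;\Longrightarrow\; \eta_j = |w_j| $$

$$ h(|w_j|) = \frac{w_j^2}{|w_j|} + |w_j| = 2|w_j|, \qquad \text{so} \qquad \frac{1}{2}\min_{\eta_j \ge 0} h(\eta_j) = |w_j| $$

Since $h''(\eta_j) = 2 w_j^2 / \eta_j^3 > 0$, the stationary point is the minimum, which is the arithmetic-geometric mean inequality applied to $w_j^2/\eta_j$ and $\eta_j$.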
  26. 26. • η
      $$ \min_{w\in\mathbb{R}^d} L(w) + \lambda\|w\|_1 = \min_{w,\eta\in\mathbb{R}^d,\;\eta_j\ge 0}\; L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j} + \frac{\lambda}{2}\sum_{j=1}^{d}\eta_j \tag{14} $$
      1. For $j = 1,\dots,d$, set $\eta_j^1 = 1$.
      2. (a) Update $w^t$:
      $$ w^t = \arg\min_{w\in\mathbb{R}^d}\left( L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j^t} \right) \tag{15} $$
         (b) Update $\eta^{t+1}$:
      $$ \eta_j^{t+1} = |w_j^t| \qquad (j = 1,\dots,d) \tag{16} $$
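The alternating updates (15) and (16) can be sketched for a case where step (a) has a closed form. The example below is an illustration only: it assumes the loss L(w) = ||y - w||^2 / 2 (identity design), so that the weighted ridge problem in (15) separates per coordinate and gives w_j = y_j * eta_j / (eta_j + lambda). The function name `irs_identity` and the small floor `eps` on eta (added to avoid division by zero) are assumptions, not part of the slides.

```python
def irs_identity(y, lam, iters=200, eps=1e-12):
    """Iteratively reweighted shrinkage for 0.5*||y - w||^2 + lam*||w||_1 (assumed loss)."""
    d = len(y)
    eta = [1.0] * d                      # step 1: eta_j = 1
    w = [0.0] * d
    for _ in range(iters):
        # step 2(a): per-coordinate ridge solution of (15) for this loss
        w = [y[j] * eta[j] / (eta[j] + lam) for j in range(d)]
        # step 2(b): eta_j <- |w_j|, floored at eps (safeguard added for this sketch)
        eta = [max(abs(w[j]), eps) for j in range(d)]
    return w


print(irs_identity([3.0, 0.5, -2.0], lam=1.0))
```

For this loss the iterates approach the soft-thresholded values of y: coordinates with |y_j| > lambda shrink by lambda, and the others decay toward zero without ever becoming exactly zero, which is a known limitation of reweighting schemes.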
  27. 27. $$ w^t = \arg\min_{w\in\mathbb{R}^d}\left( L(w) + \frac{\lambda}{2}\sum_{j=1}^{d}\frac{w_j^2}{\eta_j^t} \right) \tag{15} $$
      $$ \eta_j^{t+1} = |w_j^t| \qquad (j = 1,\dots,d) \tag{16} $$
      $$ \mathrm{prox}_g(y) = \arg\min_{w\in\mathbb{R}^d}\left\{ \frac{1}{2}\|y - w\|_2^2 + g(w) \right\} \tag{17} $$
  28. 28. • j
      $$ \mathrm{prox}_g(y) = \arg\min_{w\in\mathbb{R}^d}\left\{ \frac{1}{2}\|y - w\|_2^2 + g(w) \right\} \tag{17} $$
      $$ \mathrm{prox}^{\ell_1}_{\lambda}(y) = \arg\min_{w\in\mathbb{R}^d}\left\{ \frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 \right\} \tag{18} $$
      $$ \left[\mathrm{prox}^{\ell_1}_{\lambda}(y)\right]_j = \begin{cases} y_j + \lambda, & \text{if } y_j < -\lambda \\ 0, & \text{if } -\lambda \le y_j \le \lambda \\ y_j - \lambda, & \text{if } y_j > \lambda \end{cases} \qquad (j = 1,\dots,d) $$
      [Figure: soft-thresholding function ST(z) plotted against z, with thresholds at -λ and λ]
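The three-case formula for the L1 proximal operator translates directly into code. This is a minimal sketch; the function name `soft_threshold` and the sample values are chosen here for illustration.

```python
def soft_threshold(y, lam):
    """Componentwise prox of lam*||.||_1: shrink each entry toward zero by lam."""
    out = []
    for yj in y:
        if yj < -lam:
            out.append(yj + lam)
        elif yj > lam:
            out.append(yj - lam)
        else:
            out.append(0.0)          # entries inside [-lam, lam] become exactly zero
    return out


print(soft_threshold([3.0, 0.5, -2.0, 1.0], lam=1.0))
```

The middle case is what produces exact zeros, and therefore sparsity, in algorithms built on this operator.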
  29. 29. • j
      $$ \mathrm{prox}^{\ell_1}_{\lambda}(y) = \arg\min_{w\in\mathbb{R}^d}\left\{ \frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 \right\} \tag{18} $$
      $$ \left[\mathrm{prox}^{\ell_1}_{\lambda}(y)\right]_j = \begin{cases} y_j + \lambda, & \text{if } y_j < -\lambda \\ 0, & \text{if } -\lambda \le y_j \le \lambda \\ y_j - \lambda, & \text{if } y_j > \lambda \end{cases} \qquad (j = 1,\dots,d) $$
      $$ \frac{1}{2}\|y - w\|_2^2 + \lambda\|w\|_1 = \sum_{j=1}^{d}\left( \frac{1}{2}(y_j - w_j)^2 + \lambda|w_j| \right) \tag{19} $$
      $$ y_j - w_j \in \lambda\,\partial|w_j| \qquad (j = 1,\dots,d) \tag{20} $$
      $$ \partial|w| = \begin{cases} -1, & \text{if } w < 0 \\ [-1, 1], & \text{if } w = 0 \\ 1, & \text{if } w > 0 \end{cases} $$
      [Figure: |w| plotted against w]
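The optimality condition (20) can be tested numerically for a candidate solution. The sketch below checks whether y_j - w_j lies in lambda times the subdifferential of |w_j|; the function name `satisfies_prox_optimality` and the tolerance are assumptions made for this illustration.

```python
def satisfies_prox_optimality(y_j, w_j, lam, tol=1e-12):
    """Return True if y_j - w_j belongs to lam * subdifferential of |.| at w_j."""
    r = y_j - w_j
    if w_j > 0:
        return abs(r - lam) <= tol       # subdifferential is {+1}
    if w_j < 0:
        return abs(r + lam) <= tol       # subdifferential is {-1}
    return -lam - tol <= r <= lam + tol  # subdifferential is the interval [-1, 1]


print(satisfies_prox_optimality(3.0, 2.0, 1.0))   # soft-thresholded value of 3.0
print(satisfies_prox_optimality(3.0, 2.5, 1.0))   # a non-optimal candidate
```

Solving the condition case by case recovers the three branches of the soft-thresholding formula.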
  30. 30. • f, $w^t$ • $w^0$ • $\eta_t$
      $$ y_j - w_j \in \lambda\,\partial|w_j| \qquad (j = 1,\dots,d) \tag{20} $$
      $$ \partial|w| = \begin{cases} -1, & \text{if } w < 0 \\ [-1, 1], & \text{if } w = 0 \\ 1, & \text{if } w > 0 \end{cases} $$
      $$ w^{t+1} = \arg\min_{w}\left\{ \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right\} \tag{21} $$
      $$ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda\eta_t}\left( w^t - \eta_t \nabla L(w^t) \right) \tag{22} $$
  31. 31. $$ w^{t+1} = \arg\min_{w}\left\{ \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right\} \tag{21} $$
      $$ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda\eta_t}\left( w^t - \eta_t \nabla L(w^t) \right) \tag{22} $$
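The update (22), a gradient step on L followed by soft thresholding, can be sketched for a least-squares loss. Everything specific here is an assumption for illustration: the loss L(w) = ||Xw - y||^2 / 2, the constant step size 1/||X||_F^2 (a conservative choice, since the squared Frobenius norm bounds the Lipschitz constant of the gradient), the function names, and the toy data.

```python
def soft_threshold(v, t):
    """Componentwise soft thresholding with threshold t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def ista(X, y, lam, iters=500):
    """Proximal gradient iterations for 0.5*||Xw - y||^2 + lam*||w||_1 (assumed loss)."""
    n, d = len(X), len(X[0])
    eta = 1.0 / sum(x * x for row in X for x in row)   # step size 1 / ||X||_F^2
    w = [0.0] * d
    for _ in range(iters):
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        # gradient step, then prox with threshold lam * eta as in (22)
        w = soft_threshold([w[j] - eta * grad[j] for j in range(d)], lam * eta)
    return w


X = [[1.0, 0.0], [0.0, 1.0]]
print(ista(X, [3.0, 0.5], lam=1.0))
```

With the identity design used in the usage line, the limit is the soft-thresholded data, which makes the result easy to verify by hand.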
  32. 32. • X, L
      $$ w^{t+1} = \arg\min_{w}\left\{ \nabla L(w^t)^\top (w - w^t) + \lambda\|w\|_1 + \frac{1}{2\eta_t}\|w - w^t\|_2^2 \right\} \tag{21} $$
      $$ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda\eta_t}\left( w^t - \eta_t \nabla L(w^t) \right) \tag{22} $$
      $$ \min_{x\in\mathbb{R}^n} f(x) \tag{23} $$
      $$ \text{s.t. } g_j(x) = 0 \qquad (j = 1,\dots,p) \tag{24} $$
      $$ g(x) = (g_1(x),\dots,g_p(x))^\top \tag{25} $$
      $$ L_\rho(x, y) = f(x) + y^\top g(x) + \frac{\rho}{2}\|g(x)\|_2^2 \tag{26} $$
  33. 33. • $x^*$, $y^*$
      $$ \text{s.t. } g_j(x) = 0 \qquad (j = 1,\dots,p) \tag{24} $$
      $$ g(x) = (g_1(x),\dots,g_p(x))^\top \tag{25} $$
      $$ L_\rho(x, y) = f(x) + y^\top g(x) + \frac{\rho}{2}\|g(x)\|_2^2 \tag{26} $$
      $$ \min_{x\in\mathbb{R}^n}\max_{y\in\mathbb{R}^p} L_\rho(x, y) \tag{27} $$
      $$ \nabla g_1(x^*),\dots,\nabla g_p(x^*) \tag{28} $$
      $$ \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 \tag{29} $$
      $$ g_j(x^*) = 0 \qquad (j = 1,\dots,p) \tag{30} $$
      (3.1) (3.2) (3.3)
  34. 34. • $x^*$ • $x^*$, $x$; $y^*$, $y$ • $y^*$, $x^*$ • $x^*$, $y^*$
      $$ \min_{x\in\mathbb{R}^n}\max_{y\in\mathbb{R}^p} L_\rho(x, y) $$
      $$ \nabla g_1(x^*),\dots,\nabla g_p(x^*) $$
      $$ \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0, \qquad g_j(x^*) = 0 \quad (j = 1,\dots,p) $$
      $$ \nabla_x L_\rho(x, y) = \nabla f(x) + \sum_{j=1}^{p} y_j \nabla g_j(x) + \rho\sum_{j=1}^{p} g_j(x)\nabla g_j(x) $$
      $$ \nabla_x L_\rho(x, y^*)\big|_{x = x^*} = \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 $$
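  35. 35. $$ \nabla_x L_\rho(x, y) = \nabla f(x) + \sum_{j=1}^{p} y_j \nabla g_j(x) + \rho\sum_{j=1}^{p} g_j(x)\nabla g_j(x) $$
      $$ \nabla_x L_\rho(x, y^*)\big|_{x = x^*} = \nabla f(x^*) + \sum_{j=1}^{p} y_j^* \nabla g_j(x^*) = 0 $$
      1. Initialize $y^0$.
      2. Find $x^{k+1}$ such that $\|\nabla_x L_{\rho_k}(x^{k+1}, y^k)\| \le \epsilon_k$, where $\rho_k > 0$, $\epsilon_k \ge 0$, $\epsilon_k \to 0$.
      3. $y^{k+1} \leftarrow y^k + \rho_k\, g(x^{k+1})$
      4. $k \leftarrow k + 1$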
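The four steps above (the method of multipliers) can be sketched on a toy problem. The problem, the constant penalty rho, the tolerance sequence, and the inner gradient-descent solver are all assumptions made for this illustration: minimize x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 = 0, whose solution is x* = (0.5, 0.5) with multiplier y* = -1.

```python
def method_of_multipliers(rho=1.0, outer_iters=60):
    """Toy problem: min x1^2 + x2^2  s.t.  g(x) = x1 + x2 - 1 = 0."""
    x = [0.0, 0.0]
    y = 0.0                                    # step 1: initial multiplier
    step = 1.0 / (2.0 + 2.0 * rho)             # 1 / largest Hessian eigenvalue of L_rho
    for k in range(outer_iters):
        eps_k = max(0.1 ** (k + 1), 1e-12)     # tolerance sequence shrinking toward 0
        # step 2: approximately minimize L_rho(x, y) over x by gradient descent
        for _ in range(10000):
            g = x[0] + x[1] - 1.0
            grad = [2.0 * x[0] + y + rho * g, 2.0 * x[1] + y + rho * g]
            if (grad[0] ** 2 + grad[1] ** 2) ** 0.5 <= eps_k:
                break
            x = [x[0] - step * grad[0], x[1] - step * grad[1]]
        # step 3: multiplier update y <- y + rho * g(x)
        y += rho * (x[0] + x[1] - 1.0)
    return x, y


x_opt, y_opt = method_of_multipliers()
print(x_opt, y_opt)
```

The multiplier error halves on every outer iteration for rho = 1 in this example; a larger rho speeds up the outer loop at the cost of a worse-conditioned inner problem.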
  36. 36. • f
      $$ f : \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\} $$
      $$ f^*(s) = \sup\{\, \langle s, x\rangle - f(x) \mid x\in\mathbb{R}^n \,\} $$
      $$ f^* : \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\} $$
      $$ f \to f^* $$
      [Figure: plots of f against x(1) to x(6), and a plot of f(x) marking p and -f*(p)]
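The definition of the conjugate function can be explored numerically in one dimension. The sketch below replaces the supremum by a maximum over a finite grid, which is an approximation chosen for this illustration; the function name `conjugate_on_grid`, the grid bounds, and the test function f(x) = x^2 / 2 (whose conjugate is known to be s^2 / 2) are assumptions.

```python
def conjugate_on_grid(f, s, lo=-10.0, hi=10.0, steps=20001):
    """Approximate f*(s) = sup_x { s*x - f(x) } by a maximum over a 1-D grid."""
    best = float("-inf")
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, s * x - f(x))
    return best


value = conjugate_on_grid(lambda x: 0.5 * x * x, 3.0)
print(value)
```

The grid maximum never exceeds the true supremum, so the approximation is a lower bound that tightens as the grid is refined, provided the maximizer lies inside the grid range.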
  37. 37. • f, g
      $$ f^*(s) = \sup\{\, \langle s, x\rangle - f(x) \mid x\in\mathbb{R}^n \,\}, \qquad f^* : \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\} $$
      $$ X \in \mathbb{R}^{n\times d} $$
      $$ \min_{w\in\mathbb{R}^d}\left( f(Xw) + g(w) \right) = \max_{\alpha\in\mathbb{R}^n}\left( -f^*(-\alpha) - g^*(X^\top\alpha) \right) \tag{33} $$
      For $w^*, \alpha^*$:
      $$ w^* \in \partial g^*(X^\top\alpha^*) \tag{34} $$
      $$ \alpha^* \in -\partial f(Xw^*) \tag{35} $$
  38. 38. • L • $f_l$, $\lambda\|\cdot\|_1$, λ • η
      $$ \min_{w\in\mathbb{R}^d} f_l(Xw) + \lambda\|w\|_1 \tag{37} $$
      $$ \max_{\alpha\in\mathbb{R}^n} -f_l^*(-\alpha) - \delta_{\|\cdot\|_\infty\le\lambda}(X^\top\alpha) \tag{38} $$
      $$ \delta_{\|\cdot\|_\infty\le\lambda}(v) = \begin{cases} 0, & \text{if } \|v\|_\infty \le \lambda \\ +\infty, & \text{otherwise} \end{cases} $$
      $$ \min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) \tag{39} $$
      $$ \text{s.t. } X^\top\alpha = v \tag{40} $$
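The primal problem (37) and dual problem (38) can be compared numerically in a case where both have closed-form solutions. The sketch below assumes the squared loss f_l(z) = ||z - y||^2 / 2 and X = I, neither of which is specified on the slide. Under those assumptions, -f_l*(-alpha) = alpha.y - ||alpha||^2 / 2, the primal solution is soft thresholding of y, and the dual solution is y clipped to [-lambda, lambda]. The function name is hypothetical.

```python
def lasso_primal_dual_identity(y, lam):
    """Primal and dual optimal values of the lasso for X = I and squared loss (assumed)."""
    # Primal minimizer: soft thresholding. Dual maximizer: clipping to the box.
    w = [max(abs(v) - lam, 0.0) * (1.0 if v > 0 else -1.0) for v in y]
    alpha = [min(max(v, -lam), lam) for v in y]
    primal = sum(0.5 * (v - wj) ** 2 for v, wj in zip(y, w)) + lam * sum(abs(wj) for wj in w)
    # The indicator term in (38) is zero because |alpha_i| <= lam by construction.
    dual = sum(a * v - 0.5 * a * a for a, v in zip(alpha, y))
    return primal, dual


print(lasso_primal_dual_identity([3.0, 0.5, -2.0], lam=1.0))
```

The two values coincide, which is the strong duality that the dual formulation relies on.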
  39. 39. $$ \max_{\alpha\in\mathbb{R}^n} -f_l^*(-\alpha) - \delta_{\|\cdot\|_\infty\le\lambda}(X^\top\alpha) \tag{38} $$
      $$ \delta_{\|\cdot\|_\infty\le\lambda}(v) = \begin{cases} 0, & \text{if } \|v\|_\infty \le \lambda \\ +\infty, & \text{otherwise} \end{cases} $$
      $$ \min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) \tag{39} $$
      $$ \text{s.t. } X^\top\alpha = v \tag{40} $$
      $$ \min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) + \frac{\eta}{2}\|X^\top\alpha - v\|_2^2 \tag{41} $$
      $$ L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) + w^\top(X^\top\alpha - v) + \frac{\eta}{2}\|X^\top\alpha - v\|_2^2 \tag{42} $$
      $$ \max_{w\in\mathbb{R}^d}\;\min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} L_\eta(\alpha, v, w) $$
  40. 40. • (Tomioka and Sugiyama, 2009) • α, v • $w^t$
      $$ L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) + w^\top(X^\top\alpha - v) + \frac{\eta}{2}\|X^\top\alpha - v\|_2^2 $$
      $$ \max_{w\in\mathbb{R}^d}\;\min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} L_\eta(\alpha, v, w) $$
      $$ (\alpha^{t+1}, v^{t+1}) = \arg\min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} L_\eta(\alpha, v, w^t) \tag{44} $$
      $$ w^{t+1} = w^t + \eta\left( X^\top\alpha^{t+1} - v^{t+1} \right) \tag{45} $$
      $$ \min_{v\in\mathbb{R}^d} L_\eta(\alpha, v, w^t) = \min_{v\in\mathbb{R}^d}\left\{ \frac{1}{2\eta}\|\eta v - \hat w^t\|_2^2 + \delta_{\|\cdot\|_\infty\le\lambda}(v) \right\} + \text{const.}, \qquad \hat w^t = w^t + \eta X^\top\alpha \tag{46} $$
  41. 41. • v • v, α • α, w
      1. Initialize $w^0$.
      2. Minimize $\varphi_t(\alpha)$ to obtain $\alpha^{t+1}$.
      3. Update $w^t$: $w^{t+1} = w^t + \eta\left( X^\top\alpha^{t+1} - v^{t+1} \right)$
      $$ (\alpha^{t+1}, v^{t+1}) = \arg\min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} L_\eta(\alpha, v, w^t) \tag{44} $$
      $$ w^{t+1} = w^t + \eta\left( X^\top\alpha^{t+1} - v^{t+1} \right) \tag{45} $$
      $$ \min_{v\in\mathbb{R}^d} L_\eta(\alpha, v, w^t) = \min_{v\in\mathbb{R}^d}\left\{ \frac{1}{2\eta}\|\eta v - \hat w^t\|_2^2 + \delta_{\|\cdot\|_\infty\le\lambda}(v) \right\} + \text{const.}, \qquad \hat w^t = w^t + \eta X^\top\alpha \tag{46} $$
      $$ \hat w^t - \eta v^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda\eta}(\hat w^t) \tag{47} $$
      $$ \varphi_t(\alpha) = f_l^*(-\alpha) + \frac{1}{2\eta_t}\left\| \mathrm{prox}^{\ell_1}_{\lambda\eta_t}\left( w^t + \eta_t X^\top\alpha \right) \right\|_2^2 \tag{48} $$
      $$ w^{t+1} = \mathrm{prox}^{\ell_1}_{\lambda\eta_t}\left( w^t + \eta_t X^\top\alpha^{t+1} \right) \tag{49} $$
  42. 42. • α, v, w • Eckstein and Bertsekas (1992), Boyd et al. (2010)
      $$ L_\eta(\alpha, v, w) = f_l^*(-\alpha) + \delta_{\|\cdot\|_\infty\le\lambda}(v) + w^\top(X^\top\alpha - v) + \frac{\eta}{2}\|X^\top\alpha - v\|_2^2 $$
      $$ \max_{w\in\mathbb{R}^d}\;\min_{\alpha\in\mathbb{R}^n,\, v\in\mathbb{R}^d} L_\eta(\alpha, v, w) $$
      $$ \alpha^{t+1} = \arg\min_{\alpha\in\mathbb{R}^n} L_\eta(\alpha, v^t, w^t) \tag{50} $$
      $$ v^{t+1} = \arg\min_{v\in\mathbb{R}^d} L_\eta(\alpha^{t+1}, v, w^t) \tag{51} $$
      $$ w^{t+1} = w^t + \eta\left( X^\top\alpha^{t+1} - v^{t+1} \right) \tag{52} $$
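The alternating pattern of (50) to (52), two block minimizations followed by a multiplier update, can be sketched on a simpler splitting. Note the substitution: the slides apply the scheme to the dual augmented Lagrangian L_eta(alpha, v, w), whereas the sketch below uses the primal splitting w = z described by Boyd et al., with the assumed loss ||y - w||^2 / 2 (identity design) so that every update is closed form. The names `admm_lasso_identity`, `rho`, and the scaled multiplier `u` are choices made for this illustration.

```python
def admm_lasso_identity(y, lam, rho=1.0, iters=200):
    """ADMM for min 0.5*||y - w||^2 + lam*||z||_1  s.t.  w = z  (primal splitting, assumed)."""
    d = len(y)
    w, z, u = [0.0] * d, [0.0] * d, [0.0] * d
    for _ in range(iters):
        # first block: quadratic minimization in w with z and u fixed
        w = [(y[j] + rho * (z[j] - u[j])) / (1.0 + rho) for j in range(d)]
        # second block: soft thresholding of w + u with threshold lam / rho
        t = [w[j] + u[j] for j in range(d)]
        z = [max(abs(v) - lam / rho, 0.0) * (1.0 if v > 0 else -1.0) for v in t]
        # multiplier update (scaled form): accumulate the constraint residual w - z
        u = [u[j] + w[j] - z[j] for j in range(d)]
    return z


print(admm_lasso_identity([3.0, 0.5, -2.0], lam=1.0))
```

Unlike the joint minimization in (44), each block update here touches only one group of variables, which is the main practical appeal of the alternating scheme.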
  43. 43. • • • • https://en.wikipedia.org/wiki/Coordinate_descent
  44. 44. • $\beta_j$, λ, $\beta_k$
      $$ \hat\beta^{\mathrm{lasso}} = \arg\min_\beta\left\{ \frac{1}{2}\sum_{i=1}^{d}\Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda\sum_{j=1}^{n}|\beta_j| \right\} \tag{53} $$
      $$ R(\tilde\beta(\lambda), \beta_j) = \frac{1}{2}\sum_{i=1}^{d}\Big( y_i - \sum_{k\ne j} x_{ik}\tilde\beta_k(\lambda) - x_{ij}\beta_j \Big)^2 + \lambda\sum_{k\ne j}|\tilde\beta_k(\lambda)| + \lambda|\beta_j| \tag{54} $$
      $$ y_i - \tilde y_i^{(j)} = y_i - \sum_{k\ne j} x_{ik}\tilde\beta_k(\lambda) \tag{55} $$
      $$ \tilde\beta_j(\lambda) \leftarrow \mathrm{TH}\Big( \sum_{i=1}^{d} x_{ij}\big( y_i - \tilde y_i^{(j)} \big),\; \lambda \Big) \tag{56} $$
      where TH( ) is the soft-thresholding operator.
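The coordinate-wise update (56) can be sketched as a cyclic sweep over the coefficients. Two assumptions are made for this illustration: the intercept beta_0 is omitted (the data are taken to be centered), and the thresholded value is divided by the squared column norm so that the code also covers columns that are not standardized, which (56) appears to take for granted. The function names and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Scalar soft thresholding TH(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for 0.5*||y - X beta||^2 + lam*||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residual that leaves out feature j, as in (55)
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j) for i in range(n)]
            corr = sum(X[i][j] * r[i] for i in range(n))
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(corr, lam) / norm_sq
    return beta


print(lasso_coordinate_descent([[1.0], [1.0]], [2.0, 4.0], lam=1.0))
```

Each update solves a one-dimensional lasso exactly, so no step size has to be tuned.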
  45. 45. • Fu (1998), Daubechies et al. (2004), Friedman et al. (2007), Wu and Lange (2008) • (Friedman et al., 2010) • (Beck and Tetruashvili, 2013)
  52. 52. • – – – • • – µ S S-1=Q 55 x1 x2 x3 x4 x5
  53. 53. • – – – 56 x1 x2 e.g. x1, x2 1, 2 x2 P(x2) x2 P(x2) x1 x2 P(x1)
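  54. 54. (Gaussian Markov Random Field) • µ, Q • $x = (x_1,\dots,x_L)$: $x_o$, $x_u$
      $$ p(x_u \mid x_o, \mu, \Theta) = \int N(x \mid \mu, \Theta^{-1}) \prod_{i\in O}\delta(y_i - x_i)\, dx_o = N\!\left( x_u \,\middle|\, \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\; \Theta_{uu}^{-1} \right) $$
      $$ \Theta = \begin{pmatrix} \Theta_{uu} & \Theta_{uo} \\ \Theta_{ou} & \Theta_{oo} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_u \\ \mu_o \end{pmatrix} $$
      δ(·)
      $$ \hat x_u = \arg\max_{x_u} N\!\left( x_u \,\middle|\, \mu_u - \Theta_{uu}^{-1}\Theta_{uo}(x_o - \mu_o),\; \Theta_{uu}^{-1} \right) \tag{1} $$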
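Because the conditional distribution is Gaussian, the maximizer in (1) is the conditional mean. The sketch below handles the simplest case, a single unobserved variable, where the block Θ_uu is a scalar and no matrix inversion is needed. The function name, the dictionary-based interface for the observed values, and the numbers are assumptions made for this illustration.

```python
def conditional_mean_single(theta, mu, x_obs, u):
    """Mean of x_u given all other coordinates, from the precision matrix theta.

    theta : precision matrix as a list of lists
    mu    : mean vector
    x_obs : dict mapping each observed index o to its observed value
    u     : index of the single unobserved variable
    """
    acc = 0.0
    for o, xo in x_obs.items():
        acc += theta[u][o] * (xo - mu[o])      # Theta_uo (x_o - mu_o)
    return mu[u] - acc / theta[u][u]           # mu_u - Theta_uu^{-1} * acc


theta = [[2.0, -1.0], [-1.0, 2.0]]
print(conditional_mean_single(theta, [0.0, 0.0], {1: 4.0}, u=0))
```

Only the entries of row u that are nonzero contribute, so a sparse precision matrix means each missing value is imputed from its graph neighbors alone.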
  55. 55. – – Gaussian Graphical Model (GGM) ( , 2014; Kataoka et al., 2014) – – – (Graphical Lasso; GL) – – 58
  56. 56. • – • – – Graphical Lasso (Friedman et al., 2007) 59 : 0 0 Θ = Θ =
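  57. 57. • x • V+2 (V + V²/2) • Θ, µ
      $$ Z(\beta, \gamma, \alpha) = \exp\!\left( \frac{1}{2}\beta^\top\Theta^{-1}\beta \right)\int_{-\infty}^{\infty}\exp\!\left( -\frac{1}{2}(x - \mu)^\top\Theta(x - \mu) \right)dx \tag{4.16} $$
      $$ Z(\beta, \gamma, \alpha) = \exp\!\left( \frac{1}{2}\beta^\top\Theta^{-1}\beta \right)\sqrt{(2\pi)^N\det(\Theta^{-1})} \tag{4.17} $$
      $$ p(x \mid \beta, \gamma, \alpha) = \frac{1}{\sqrt{(2\pi)^N\det(\Theta^{-1})}}\exp\!\left( -\frac{1}{2}(x - \mu)^\top\Theta(x - \mu) \right) \tag{4.18} $$
      GGM of Kataoka et al. on a graph G(V, E), with parameters β, η, ε:
      $$ p(x \mid \beta, \eta) \propto \exp\!\left( \beta^\top x - \frac{\eta\epsilon}{2}\sum_{i\in V} x_i^2 - \frac{\eta}{2}\sum_{(i,j)\in E}(x_i - x_j)^2 \right) \tag{4.19} $$
      $$ = \exp\!\left( \beta^\top x - \frac{\eta}{2}\sum_{i\in V}\big(\epsilon + |\partial(i)|\big)x_i^2 + \eta\sum_{(i,j)\in E} x_i x_j \right) \tag{4.20} $$
      $$ = \exp\!\left( -\frac{\eta}{2}(x - \mu)^\top\Theta(x - \mu) + \frac{1}{2\eta}\beta^\top\Theta^{-1}\beta \right) $$
      $$ p(x \mid \beta, \eta) = \sqrt{\frac{\eta^N\det\Theta}{(2\pi)^N}}\exp\!\left( -\frac{\eta}{2}(x - \mu)^\top\Theta(x - \mu) \right) $$
      $$ \Theta_{ij} \equiv \begin{cases} \epsilon + |\partial(i)|, & i = j \\ -1, & (i,j)\in E \text{ or } (j,i)\in E \\ 0, & \text{otherwise} \end{cases} \qquad \mu \equiv \frac{1}{\eta}\Theta^{-1}\beta $$
      (2) (3)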
  58. 58. • $y^d$, $x^d$ • $\theta^0$, E-step, M-step
      $$ Q(\theta \mid \theta^{\mathrm{old}}) = \int_{x_u^1}\!\int_{x_u^2}\!\cdots\!\int_{x_u^D} \ln\prod_{d=1}^{D} p(z^d \mid \theta)\;\prod_{d=1}^{D} p(x_u^d \mid y^d, \theta^{\mathrm{old}})\; dx_u^1\, dx_u^2 \cdots dx_u^D = \sum_d \int_{x_u^d} \ln p(z^d \mid \theta)\, p(x_u^d \mid y^d, \theta^{\mathrm{old}})\, dx_u^d \tag{4.25} $$
      $$ \ln p(z^d \mid \theta) = \beta^\top z^d - \frac{\eta\epsilon}{2}\sum_{i\in V}(z_i^d)^2 - \frac{\eta}{2}\sum_{(i,j)\in E}\big(z_i^d - z_j^d\big)^2 - \frac{1}{2\eta}\beta^\top\Theta^{-1}\beta - \frac{|V|}{2}\ln\eta + \text{const} \tag{4.26} $$
      $$ p(x_u^d \mid y^d, \theta) \propto \exp\!\left( \sum_{i\in U_d}\beta_i (x_u^d)_i - \frac{\eta\epsilon}{2}\sum_{i\in U_d}(x_u^d)_i^2 - \frac{\eta}{2}\sum_{(i,j)\in\Omega_1^d}\big(y_i^d - (x_u^d)_j\big)^2 - \frac{\eta}{2}\sum_{(i,j)\in\Omega_3^d}\big((x_u^d)_i - (x_u^d)_j\big)^2 \right) \tag{4.27} $$
      $$ p(x_u^d \mid y^d, \theta) = \sqrt{\frac{\eta^{|U_d|}\det\Theta_{u_d}}{(2\pi)^{|U_d|}}}\exp\!\left( -\frac{\eta}{2}\big(x_u^d - \mu_{u_d}\big)^\top\Theta_{u_d}\big(x_u^d - \mu_{u_d}\big) \right) \tag{4.28} $$
      $$ \Theta_{u_d} := \begin{cases} \epsilon + |\partial(i)|, & i\in U_d \\ -1, & (i,j)\in\Omega_3^d \text{ or } (j,i)\in\Omega_3^d \\ 0, & \text{otherwise} \end{cases} $$
      E-step: compute $Q(\theta \mid \theta^{\mathrm{old}})$ from (4.25), (4.26), (4.28). M-step:
      $$ \hat\theta = \arg\max_\theta Q(\theta \mid \theta^{\mathrm{old}}) $$
      $$ \frac{\partial Q(\beta, \eta \mid \beta^{\mathrm{old}}, \eta^{\mathrm{old}})}{\partial\beta_i} \propto \frac{1}{D}\sum_d E_d[z_i] - \big[\Theta^{-1}\beta\big]_i \tag{4.29a} $$
      ∂Q(β, η | β^old, η^old)/∂η ∝ Σ_{(i,j)∈E} (1/D) Σ_d E_d[z_i z_j] − N/(2η) − (1/2) Σ_{i∈V} (ε + |∂(i)| ... closed form (4)
  59. 59. 3.2
      $$p(x|\mu,\Theta) := \frac{1}{Z(\Theta)}\exp\Big(-\frac{1}{2}(x-\mu)^T\Theta(x-\mu)\Big) \quad (4.1)$$
      where $\Theta$ is the inverse of the covariance matrix $\Sigma$. Log-likelihood:
      $$\ln p(\Theta,\mu) = \frac{D}{2}\log\det\Theta - \frac{1}{2}\sum_d (x^d-\mu)^T\Theta(x^d-\mu) + \mathrm{const} \quad (4.2)$$
      where $x^d = (x_1^d, x_2^d, \ldots, x_{|V|}^d)$ is the $d$-th of $D$ samples.
      L1-regularized estimate of $\Theta$ obtained from (4.2):
      $$\Theta^* = \arg\max_\Theta\ \log\det\Theta - \mathrm{tr}(S\Theta) - \rho\|\Theta\|_1 \quad (4.31)$$
      $$\|\Theta\|_1 = \sum_{i=1}^{|V|}\sum_{j=1}^{|V|}|\Theta_{ij}|$$
      $\mu$: (4.7). $\Theta$: (4.31). The L1 term drives entries to $\Theta_{ij} = 0$. (5)
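As a rough illustration of the objective in (4.31), the following Python sketch evaluates log det Θ − tr(SΘ) − ρ‖Θ‖₁ with NumPy. The function name `penalized_loglik` and the 2×2 matrix `S` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def penalized_loglik(theta, S, rho):
    """Evaluate log det(Theta) - tr(S Theta) - rho * sum_ij |Theta_ij|."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        # A non-positive determinant rules out positive definiteness
        # (this is only a necessary check, not a full PD test).
        return -np.inf
    return logdet - np.trace(S @ theta) - rho * np.abs(theta).sum()

# Hypothetical sample covariance, for illustration only.
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])
theta_mle = np.linalg.inv(S)  # unpenalized maximizer (rho = 0)
value_at_identity = penalized_loglik(np.eye(2), S, rho=0.1)
```

With ρ = 0 the maximizer is Θ = S⁻¹; a positive ρ trades likelihood for sparsity in Θ.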
  60. 60. Friedman et al. (2007): Graphical Lasso (GL), based on L1 regularization
  61. 61. Optimality condition of the L1-regularized problem (4.31) with respect to $\Theta$:
      $$\frac{\partial}{\partial\Theta}\ln p(\Theta) = \Theta^{-1} - S - \rho\,\Gamma \quad (4.32)$$
      $$\Gamma_{ij} = \mathrm{sign}(\Theta_{ij}) \ \text{ if } \Theta_{ij}\neq 0, \qquad \Gamma_{ij}\in[-1,1] \ \text{ if } \Theta_{ij}=0$$
      From (4.31) and (4.32):
      $$\Theta^{-1} - S - \rho\,\Gamma = 0 \quad (4.33)$$
      Let $W$ denote the estimate of $\Sigma = \Theta^{-1}$. Partition $\Theta$, $W$ and $S$ as
      $$\Theta = \begin{pmatrix}\Theta_{11} & \theta_{12}\\ \theta_{12}^T & \theta_{22}\end{pmatrix},\quad W = \begin{pmatrix}W_{11} & w_{12}\\ w_{12}^T & w_{22}\end{pmatrix},\quad S = \begin{pmatrix}S_{11} & s_{12}\\ s_{12}^T & s_{22}\end{pmatrix} \quad (4.34)$$
      where $\theta_{12}, w_{12}, s_{12}$ are vectors and $\theta_{22}, w_{22}, s_{22}$ are scalars. GL solves (4.33) block by block. (6)
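The condition (4.33) can be checked numerically for a candidate Θ. The sketch below is a minimal illustration under assumptions: the helper name `kkt_violation`, the tolerance, and the 2×2 example are hypothetical. It computes the Γ implied by (4.33) and measures how far it is from sign(Θ_ij) on nonzero entries and from the interval [−1, 1] on zero entries.

```python
import numpy as np

def kkt_violation(theta, S, rho, tol=1e-8):
    """Largest violation of Theta^{-1} - S - rho * Gamma = 0 over all entries."""
    W = np.linalg.inv(theta)
    G = (W - S) / rho                 # Gamma implied by (4.33)
    active = np.abs(theta) > tol      # entries treated as nonzero
    # Nonzero entries: Gamma_ij must equal sign(Theta_ij).
    v_active = np.abs(G - np.sign(theta))[active].max(initial=0.0)
    # Zero entries: Gamma_ij must lie in [-1, 1].
    v_zero = np.clip(np.abs(G) - 1.0, 0.0, None)[~active].max(initial=0.0)
    return max(v_active, v_zero)

# Hypothetical example: |s_12| <= rho, so the off-diagonal of Theta is zero
# and W = diag(s_11 + rho, s_22 + rho).
S = np.array([[1.0, 0.05],
              [0.05, 2.0]])
rho = 0.1
theta_opt = np.diag([1.0 / 1.1, 1.0 / 2.1])
violation_opt = kkt_violation(theta_opt, S, rho)
violation_identity = kkt_violation(np.eye(2), S, rho)
```

Here `theta_opt` satisfies the condition up to rounding, while the identity matrix does not.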
  62. 62. Upper-right block of (4.33) under the partition (4.34):
      $$w_{12} - s_{12} - \rho\,\gamma_{12} = 0 \quad (4.36) \qquad (7)$$
      From $W\Theta = I$:
      $$W\Theta = \begin{pmatrix}W_{11} & w_{12}\\ w_{12}^T & w_{22}\end{pmatrix}\begin{pmatrix}\Theta_{11} & \theta_{12}\\ \theta_{12}^T & \theta_{22}\end{pmatrix} = \begin{pmatrix}W_{11}\Theta_{11} + w_{12}\theta_{12}^T & W_{11}\theta_{12} + \theta_{22}w_{12}\\ w_{12}^T\Theta_{11} + w_{22}\theta_{12}^T & w_{12}^T\theta_{12} + w_{22}\theta_{22}\end{pmatrix} = \begin{pmatrix}I & 0\\ 0^T & 1\end{pmatrix} \quad (4.37)$$
      $$W_{11}\theta_{12} + \theta_{22}w_{12} = 0 \qquad (8)$$
      The result is an L1-regularized problem in $\beta$:
      $$\frac{\partial}{\partial\beta}\Big(\frac{1}{2}\big\|W_{11}^{1/2}\beta - b\big\|^2 + \rho\|\beta\|_1\Big) = 0 \quad (4.35)$$
      where $\beta = W_{11}^{-1}w_{12} \in \mathbb{R}^{|V|-1}$ and $b = W_{11}^{-1/2}s_{12}$.
  63. 63. From (7) and (8), with $\beta \equiv W_{11}^{-1}w_{12}$ and $b \equiv W_{11}^{-1/2}s_{12}$:
      $$\theta_{12} = -\theta_{22}W_{11}^{-1}w_{12} = -\theta_{22}\beta \quad (4.38) \qquad (9)$$
      Substituting $w_{12} = W_{11}\beta$ into (4.36):
      $$W_{11}\beta - s_{12} - \rho\,\gamma_{12} = 0 \quad (4.39) \qquad (10)$$
      Since $\Theta$ is positive definite, $\theta_{22} > 0$, hence $\mathrm{sign}(\theta_{12}) = -\mathrm{sign}(W_{11}^{-1}w_{12}) = -\mathrm{sign}(\beta)$, and (4.39) becomes
      $$W_{11}\beta - s_{12} - \rho\,\gamma_{12} = \frac{\partial}{\partial\beta}\Big(\frac{1}{2}\beta^T W_{11}\beta - \beta^T s_{12} + \rho\|\beta\|_1\Big)$$
      $$= \frac{\partial}{\partial\beta}\Big(\frac{1}{2}\big\|W_{11}^{1/2}\beta - W_{11}^{-1/2}s_{12}\big\|^2 - \frac{1}{2}s_{12}^T W_{11}^{-1}s_{12} + \rho\|\beta\|_1\Big)$$
      $$= \frac{\partial}{\partial\beta}\Big(\frac{1}{2}\big\|W_{11}^{1/2}\beta - b\big\|^2 + \rho\|\beta\|_1\Big) = 0 \quad (4.40) \qquad (11)$$
      This is the lasso problem (4.35): the column $w_{12}$ of the estimate $W$ of $\Sigma$ is obtained through $\beta$.
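The lasso problem in β can be solved by coordinate descent with soft-thresholding, which is the approach used in Friedman et al.'s graphical lasso paper. The sketch below minimizes ½ βᵀW₁₁β − βᵀs₁₂ + ρ‖β‖₁, which has the same minimizer as the least-squares form with b. The function names, iteration limits and the small numerical example are assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_quadratic(W11, s12, rho, n_iter=200, tol=1e-10):
    """Coordinate descent for min_beta 0.5*beta'W11*beta - beta's12 + rho*||beta||_1."""
    beta = np.zeros_like(s12, dtype=float)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(s12)):
            # Partial residual: s12_j minus the contribution of all other coordinates.
            r = s12[j] - W11[j] @ beta + W11[j, j] * beta[j]
            new = soft_threshold(r, rho) / W11[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

# Hypothetical 2-dimensional example.
W11 = np.array([[1.0, 0.3],
                [0.3, 1.0]])
s12 = np.array([0.5, 0.05])
beta = lasso_quadratic(W11, s12, rho=0.1)
grad = W11 @ beta - s12  # gradient of the smooth part at the solution
```

At the solution, `grad[j] = -rho * sign(beta[j])` for nonzero coordinates and `|grad[j]| <= rho` for coordinates set to zero, matching the stationarity condition (4.39).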
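  64. 64. GL algorithm: initialize $W = S + \rho I$. Partition $W$ into $W_{11}$, $w_{12}$, $w_{12}^T$, $w_{22}$; compute $b$ from $W$ and solve (11) for $\hat\beta$; update $\hat w_{12} = W_{11}\hat\beta$. GL updates $W$; see Mazumder and Hastie (2012).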
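The steps above can be sketched as a block coordinate descent over the columns of W, in the spirit of Friedman et al. This is a minimal sketch, not the implementation used for the slides: the function names, the stopping rule (mean absolute change in W) and the 3×3 example are assumptions. After convergence, Θ is recovered from θ₂₂ = 1 / (w₂₂ − w₁₂ᵀβ) and θ₁₂ = −θ₂₂β.

```python
import numpy as np

def lasso_cd(W11, s12, rho, beta, n_iter=100, tol=1e-8):
    """Coordinate descent for 0.5*beta'W11*beta - beta's12 + rho*||beta||_1 (warm start)."""
    for _ in range(n_iter):
        delta = 0.0
        for k in range(len(s12)):
            r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
            new = np.sign(r) * max(abs(r) - rho, 0.0) / W11[k, k]
            delta = max(delta, abs(new - beta[k]))
            beta[k] = new
        if delta < tol:
            break
    return beta

def graphical_lasso(S, rho, n_sweeps=100, tol=1e-6):
    p = S.shape[0]
    W = S + rho * np.eye(p)          # initialization W = S + rho * I
    B = np.zeros((p, p))             # B[rest, j] stores beta for column j
    for _ in range(n_sweeps):
        W_old = W.copy()
        for j in range(p):
            rest = np.arange(p) != j
            W11 = W[np.ix_(rest, rest)]
            beta = lasso_cd(W11, S[rest, j], rho, B[rest, j].copy())
            B[rest, j] = beta
            w12 = W11 @ beta         # update w12 = W11 * beta_hat
            W[rest, j] = w12
            W[j, rest] = w12
        if np.abs(W - W_old).mean() < tol:   # assumed stopping rule
            break
    # Recover Theta column by column from W and the stored betas.
    Theta = np.zeros((p, p))
    for j in range(p):
        rest = np.arange(p) != j
        theta22 = 1.0 / (W[j, j] - W[rest, j] @ B[rest, j])
        Theta[j, j] = theta22
        Theta[rest, j] = -theta22 * B[rest, j]
    return W, Theta

# Hypothetical sample covariance: variables 0 and 1 correlated, variable 2 independent.
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
W, Theta = graphical_lasso(S, rho=0.1)
```

In this example the off-diagonal entry of W between variables 0 and 1 shrinks from 0.5 to 0.4 (that is, by ρ), and the entries of Θ linking variable 2 to the others are exactly zero.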
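  65. 65. E-step:
      $$Q(\Theta|\Theta^{old}) = \int_{x_u} \ln p(x_u, y|\Theta)\, p(x_u|y,\Theta^{old})\, dx_u$$
      M-step: L1-regularized estimate obtained from (4.2):
      $$\Theta^* = \arg\max_\Theta\ \log\det\Theta - \mathrm{tr}(S\Theta) - \rho\|\Theta\|_1, \qquad \|\Theta\|_1 = \sum_{i,j=1}^{|V|}|\Theta_{ij}|$$
      Update: $\Theta \leftarrow \Theta^{old}$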
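For a Gaussian model N(μ, Θ⁻¹), the expectation over p(x_u | y, Θ_old) in the Q function only needs the conditional mean and covariance of the unobserved entries. The sketch below shows one common way to compute them from the precision matrix, and to form the expected scatter matrix that could play the role of S in the M-step. Whether the slides build S exactly this way is an assumption; the function names and the 3-variable example are hypothetical.

```python
import numpy as np

def conditional_moments(y_obs, obs, mis, mu, theta):
    """Mean and covariance of x_u given y under N(mu, theta^{-1})."""
    T_uu = theta[np.ix_(mis, mis)]
    T_uo = theta[np.ix_(mis, obs)]
    cov_u = np.linalg.inv(T_uu)
    mean_u = mu[mis] - cov_u @ T_uo @ (y_obs - mu[obs])
    return mean_u, cov_u

def expected_scatter(y_obs, obs, mis, mu, theta):
    """E[(x - mu)(x - mu)^T | y]: completed-data outer product plus conditional covariance."""
    mean_u, cov_u = conditional_moments(y_obs, obs, mis, mu, theta)
    x_hat = np.empty(len(mu))
    x_hat[obs] = y_obs
    x_hat[mis] = mean_u
    r = x_hat - mu
    scatter = np.outer(r, r)
    scatter[np.ix_(mis, mis)] += cov_u   # uncertainty of the unobserved block
    return scatter

# Hypothetical chain of three variables; the middle one is unobserved.
theta_old = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])
mu = np.zeros(3)
obs, mis = [0, 2], [1]
y_obs = np.array([1.0, 3.0])
mean_u, cov_u = conditional_moments(y_obs, obs, mis, mu, theta_old)
scatter = expected_scatter(y_obs, obs, mis, mu, theta_old)
```

Averaging such scatter matrices over the D samples would give a matrix that can be passed to a graphical lasso step in place of the fully observed sample covariance.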
  66. 66. 2*1183 + 1183*1182/2 = 701,519
  68. 68. [Figure: map with legend classes 0-5%, 5-10%, 10-50%, 50-100%]
  69. 69. [Figures 5-4 to 5-7: histograms of frequency versus speed (km/h), with speed bins from 0 to 100 km/h]
  71. 71. [Figure: map with speed legend: over 80 km/h, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
  72. 72. [Figure: map with speed legend: over 80 km/h, 60-80 km/h, 40-60 km/h, 20-40 km/h, 0-20 km/h]
  74. 74. [Figure: computation time (hours, axis from 0 to 30) versus regularization parameter ρ (0.1, 0.05, 0.01, 0.005, 0.001). GGM: 25 hours 42 minutes]
  78. 78. [Figure legend: over 0.1; 0.05 to 0.1; -0.02 to -0.01; under -0.02]
  80. 80. 1. 2. GGM, Graphical Lasso 3. EM: GGM, GL, EM 4. 5. GL
  81. 81.
      1. Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K.: Traffic data reconstruction based on Markov random field modeling, Inverse Problems, 30, 025003, 2014.
      2. Friedman, J., Hastie, T., and Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, pp. 432-441, 2008.
      3. Mazumder, R., and Hastie, T.: The graphical lasso: New insights and alternatives, Electronic Journal of Statistics, 6, pp. 2125-2149, 2012.
      4. Dempster, A. P., Laird, N. M., and Rubin, D. B.: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society. Series B (Methodological), 39, 1, pp. 1-38, 1977.
      5. …, 12 ITS 2014 Peer-Review Proceedings, CD-ROM, 2014.
  82. 82.
      • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288. (Google Scholar: 21305)
      • Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306. (Google Scholar: 19534)
      • Olshausen, B. A., Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607. (Google Scholar: 4765)
      • Candès, E. J., Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203-4215. (Google Scholar: 5488)
      • Candès, E., Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2313-2351. (Google Scholar: 2603)
      • Candès, E. J., Romberg, J., Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489-509. (Google Scholar: 12285)
  83. 83.
      • Rish and Grabarnik, Sparse Modeling: Theory, Algorithms, and Applications, CRC Press, 2014.
      • Eldar and Kutyniok, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
  84. 84.
      • Cover, T. M., Van Campenhout, J. M. (1977). On the possible orderings in the measurement selection problem. IEEE Transactions on Systems, Man, and Cybernetics, 7(9), 657-661.
      • Bengio, Y., Courville, A., Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
      • Tomioka, R., Sugiyama, M. (2009). Dual-augmented Lagrangian method for efficient sparse reconstruction. IEEE Signal Processing Letters, 16(12), 1067-1070.
      • Eckstein, J., Bertsekas, D. P. (1992). On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1), 293-318.
      • Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1-122.
      • Fu, W. J. (1998). Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3), 397-416.
      • Daubechies, I., Defrise, M., De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413-1457.
      • Friedman, J., Hastie, T., Höfling, H., Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302-332.
      • Wu, T. T., Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics, 224-244.
      • Beck, A., Tetruashvili, L. (2013). On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4), 2037-2060.
