3-5. Convex Optimization

Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$, where $\mathcal{H}$ is a Hilbert space. Here: $\mathcal{H} = \mathbb{R}^N$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Class of functions:
Convex: $\forall t \in [0,1],\ G(tx + (1-t)y) \leq t\,G(x) + (1-t)\,G(y)$.
Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$.
Proper: $\{x \in \mathcal{H} \mid G(x) \neq +\infty\} \neq \emptyset$.

Indicator of a set $C$ (closed and convex):
$\iota_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise.} \end{cases}$
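A quick numerical sanity check of the convexity inequality (a minimal sketch; the test function $G(x) = \|x\|_1$ is an illustrative convex example, not fixed by the slides):

```python
import numpy as np

# Convex, lsc, proper example on R^N: G(x) = ||x||_1.
G = lambda x: np.sum(np.abs(x))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    t = rng.uniform()
    # Convexity: G(t x + (1-t) y) <= t G(x) + (1-t) G(y)
    assert G(t * x + (1 - t) * y) <= t * G(x) + (1 - t) * G(y) + 1e-12
```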
6-8. Example: $\ell^1$ Regularization

Inverse problem: measurements $y = K f_0 + w$, with $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \leq N$.

Model: $f_0 = \Psi x_0$ is sparse in a dictionary $\Psi \in \mathbb{R}^{N \times Q}$, $Q \geq N$:
coefficients $x \in \mathbb{R}^Q$ $\mapsto$ image $f = \Psi x \in \mathbb{R}^N$ $\mapsto$ observations $y = K f \in \mathbb{R}^P$,
so that $\Phi = K \Psi \in \mathbb{R}^{P \times Q}$.

Sparse recovery: $f^\star = \Psi x^\star$, where $x^\star$ solves
$\min_{x \in \mathbb{R}^Q} \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$
(data fidelity + regularization).
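A minimal sketch of this setup in Python/NumPy; the sizes, the random `K`, the random dictionary `Psi` and the sparsity level are illustrative placeholders, not the slides' data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q = 64, 32, 128                            # image, measurement, coefficient sizes
K   = rng.standard_normal((P, N)) / np.sqrt(P)   # measurement operator K : R^N -> R^P
Psi = rng.standard_normal((N, Q)) / np.sqrt(N)   # (here: random) dictionary
Phi = K @ Psi                                    # Phi = K Psi in R^{P x Q}

x0 = np.zeros(Q)
x0[rng.choice(Q, 5, replace=False)] = rng.standard_normal(5)
f0 = Psi @ x0                                    # sparse synthesis f0 = Psi x0
y  = K @ f0 + 0.01 * rng.standard_normal(P)      # measurements y = K f0 + w

lam = 0.1
E = lambda x: 0.5 * np.sum((y - Phi @ x) ** 2) + lam * np.sum(np.abs(x))
```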
9. Example: $\ell^1$ Regularization

Inpainting: masking operator $K$,
$(Kf)_i = \begin{cases} f_i & \text{if } i \in \Omega^c, \\ 0 & \text{otherwise,} \end{cases}$
with $K : \mathbb{R}^N \to \mathbb{R}^P$, $P = |\Omega^c|$.

$\Psi \in \mathbb{R}^{N \times Q}$: translation-invariant wavelet frame.
Figure: original $f_0 = \Psi x_0$, observations $y = \Phi x_0 + w$, recovery $f^\star = \Psi x^\star$.
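A sketch of the masking operator and its adjoint, assuming a boolean mask `omega_c` marking the kept pixels (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
omega_c = rng.uniform(size=N) > 0.3   # kept pixels (i in Omega^c); ~30% masked

def K(f):
    """Masking operator: keep f_i for i in Omega^c, discard the rest."""
    return f[omega_c]                 # K : R^N -> R^P with P = |Omega^c|

def K_adj(y):
    """Adjoint of K: zero-fill the masked entries."""
    f = np.zeros(N)
    f[omega_c] = y
    return f
```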
12-14. Sub-differential

Sub-differential:
$\partial G(x) = \{u \in \mathcal{H} \mid \forall z,\ G(z) \geq G(x) + \langle u, z - x \rangle\}$

Example: for $G(x) = |x|$, $\partial G(0) = [-1, 1]$.

Smooth functions: if $F$ is $C^1$, $\partial F(x) = \{\nabla F(x)\}$.

First-order conditions:
$x^\star \in \operatorname{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$

Monotone operator: $U(x) = \partial G(x)$ satisfies
$\forall (u, v) \in U(x) \times U(y),\ \langle y - x,\ v - u \rangle \geq 0$.
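A numerical check of the subgradient inequality for $G(x) = |x|$ at $x = 0$, where $\partial G(0) = [-1, 1]$ (a sketch):

```python
import numpy as np

G = abs
rng = np.random.default_rng(0)
for u in np.linspace(-1.0, 1.0, 21):        # any u in [-1, 1] is a subgradient at 0
    for z in rng.standard_normal(100):
        assert G(z) >= G(0.0) + u * (z - 0.0)   # G(z) >= G(x) + <u, z - x>
# Since 0 is in [-1, 1] = dG(0), x* = 0 minimizes |x| (first-order condition).
```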
15-17. Example: $\ell^1$ Regularization

$x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^Q} G(x) = \frac{1}{2}\|y - \Phi x\|^2 + \lambda\|x\|_1$

$\partial G(x) = \Phi^*(\Phi x - y) + \lambda\,\partial\|\cdot\|_1(x)$, where
$\partial\|\cdot\|_1(x)_i = \begin{cases} \{\operatorname{sign}(x_i)\} & \text{if } x_i \neq 0, \\ [-1, 1] & \text{if } x_i = 0. \end{cases}$

Support of the solution:
$I = \{i \in \{0, \dots, Q-1\} \mid x^\star_i \neq 0\}$

First-order conditions: there exists $s \in \mathbb{R}^Q$ such that
$\Phi^*(\Phi x^\star - y) + \lambda s = 0$,
with $s_I = \operatorname{sign}(x^\star_I)$ and $\|s_{I^c}\|_\infty \leq 1$.
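These conditions can be verified numerically on a candidate solution; a sketch, with a hypothetical helper `check_lasso_optimality` (the names `Phi`, `y` and the tolerance are assumptions):

```python
import numpy as np

def check_lasso_optimality(x, Phi, y, lam, tol=1e-6):
    """First-order conditions: Phi^T (Phi x - y) + lam * s = 0,
    with s_I = sign(x_I) on the support and |s_i| <= 1 off it."""
    s = Phi.T @ (y - Phi @ x) / lam          # the only admissible s
    I = np.abs(x) > tol                      # support of x
    on_support  = np.allclose(s[I], np.sign(x[I]), atol=tol)
    off_support = np.all(np.abs(s[~I]) <= 1 + tol)
    return on_support and off_support
```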
18-20. Example: Total Variation Denoising

Important: the optimization variable is $f$.
$f^\star \in \operatorname{argmin}_{f \in \mathbb{R}^N} \frac{1}{2}\|y - f\|^2 + \lambda J(f)$

Finite-difference gradient: $\nabla : \mathbb{R}^N \to \mathbb{R}^{N \times 2}$, $(\nabla f)_i \in \mathbb{R}^2$.
Discrete TV norm: $J(f) = \sum_i \|(\nabla f)_i\|$, i.e. $J(f) = G(\nabla f)$ with $G(u) = \sum_i \|u_i\|$.
Figure: denoising results for increasing $\lambda$ ($\lambda = 0$: noisy input).

Composition by a linear map: $\partial(G \circ A) = A^* \circ (\partial G) \circ A$, hence
$\partial J(f) = -\operatorname{div}(\partial G(\nabla f))$ (using $\nabla^* = -\operatorname{div}$), with
$\partial G(u)_i = \begin{cases} \left\{\frac{u_i}{\|u_i\|}\right\} & \text{if } u_i \neq 0, \\ \{v \in \mathbb{R}^2 \mid \|v\| \leq 1\} & \text{if } u_i = 0. \end{cases}$

First-order conditions: there exists $v \in \mathbb{R}^{N \times 2}$ such that
$f^\star = y + \lambda \operatorname{div}(v)$, with
$\forall i \in I,\ v_i = \frac{(\nabla f^\star)_i}{\|(\nabla f^\star)_i\|}$ and $\forall i \in I^c,\ \|v_i\| \leq 1$,
where $I = \{i \mid (\nabla f^\star)_i \neq 0\}$.
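A sketch of the discrete operators on a 2-D image, assuming forward differences with periodic boundary conditions, and with `div` defined as $-\nabla^*$ so the adjoint identity holds:

```python
import numpy as np

def grad(f):
    """Forward-difference gradient, periodic boundaries: R^N -> R^{N x 2}."""
    return np.stack([np.roll(f, -1, axis=0) - f,
                     np.roll(f, -1, axis=1) - f], axis=-1)

def div(u):
    """Discrete divergence, div = -grad^*  (so <grad f, u> = <f, -div u>)."""
    return (u[..., 0] - np.roll(u[..., 0], 1, axis=0)
            + u[..., 1] - np.roll(u[..., 1], 1, axis=1))

# Adjoint check: <grad f, u> = <f, -div u>
rng = np.random.default_rng(0)
f, u = rng.standard_normal((8, 8)), rng.standard_normal((8, 8, 2))
assert np.isclose(np.sum(grad(f) * u), np.sum(f * -div(u)))
```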
27-28. Proximal Calculus

Separability: $G(x) = G_1(x_1) + \dots + G_n(x_n)$ implies
$\operatorname{Prox}_{\gamma G}(x) = (\operatorname{Prox}_{\gamma G_1}(x_1), \dots, \operatorname{Prox}_{\gamma G_n}(x_n))$.

Quadratic functionals: $G(x) = \frac{1}{2}\|\Phi x - y\|^2$ implies
$\operatorname{Prox}_{\gamma G}(x) = (\operatorname{Id} + \gamma \Phi^* \Phi)^{-1}(x + \gamma \Phi^* y)$.

Composition by a tight frame ($A \circ A^* = \operatorname{Id}$):
$\operatorname{Prox}_{G \circ A} = A^* \circ \operatorname{Prox}_G \circ A + \operatorname{Id} - A^* A$.

Indicators: $G(x) = \iota_C(x)$ implies
$\operatorname{Prox}_{\gamma G}(x) = \operatorname{Proj}_C(x) = \operatorname{argmin}_{z \in C} \|x - z\|$.
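A sketch of the first two rules, assuming a dense random `Phi` as a stand-in and a prox step `gamma`:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 32, 64
Phi, y = rng.standard_normal((P, Q)), rng.standard_normal(P)
gamma = 0.5

def prox_quadratic(x):
    """Prox of gamma * (1/2)||Phi x - y||^2:
    solve (Id + gamma Phi^T Phi) z = x + gamma Phi^T y."""
    return np.linalg.solve(np.eye(Q) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

def prox_separable(x, proxes):
    """Prox of G(x) = sum_i G_i(x_i): apply each Prox_{gamma G_i} blockwise
    (equal-size blocks here, for simplicity of the sketch)."""
    return np.concatenate([p(xi) for p, xi in zip(proxes, np.split(x, len(proxes)))])
```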
29-30. Prox and Subdifferential

Resolvent of $\partial G$:
$z = \operatorname{Prox}_{\gamma G}(x) \iff 0 \in z - x + \gamma \partial G(z)$
$\iff x \in (\operatorname{Id} + \gamma \partial G)(z) \iff z = (\operatorname{Id} + \gamma \partial G)^{-1}(x)$,
where the inverse of a set-valued mapping is defined by $x \in U(y) \iff y \in U^{-1}(x)$.
$\operatorname{Prox}_{\gamma G} = (\operatorname{Id} + \gamma \partial G)^{-1}$ is a single-valued mapping.

Fixed point: $x^\star \in \operatorname{argmin}_x G(x)$
$\iff 0 \in \gamma \partial G(x^\star) \iff x^\star \in (\operatorname{Id} + \gamma \partial G)(x^\star)$
$\iff x^\star = (\operatorname{Id} + \gamma \partial G)^{-1}(x^\star) = \operatorname{Prox}_{\gamma G}(x^\star)$.
31-33. Gradient and Proximal Descents

Gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau \nabla G(x^{(\ell)})$  [explicit]
($G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz)
Theorem: if $0 < \tau < 2/L$, then $x^{(\ell)} \to x^\star$, a solution.

Sub-gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau_\ell v^{(\ell)}$, $v^{(\ell)} \in \partial G(x^{(\ell)})$.
Theorem: if $\tau_\ell \sim 1/\ell$, then $x^{(\ell)} \to x^\star$, a solution. Problem: slow.

Proximal-point algorithm: $x^{(\ell+1)} = \operatorname{Prox}_{\gamma_\ell G}(x^{(\ell)})$  [implicit]
Theorem: if $\gamma_\ell \geq c > 0$, then $x^{(\ell)} \to x^\star$, a solution.
Problem: $\operatorname{Prox}_{\gamma G}$ is hard to compute.
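A sketch of explicit gradient descent with the theorem's step-size rule, on an illustrative least-squares objective:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 20)), rng.standard_normal(40)
gradG = lambda x: A.T @ (A @ x - b)           # G(x) = 0.5 ||Ax - b||^2
L = np.linalg.norm(A.T @ A, 2)                # Lipschitz constant of grad G

tau = 1.8 / L                                 # any 0 < tau < 2/L converges
x = np.zeros(20)
for _ in range(1000):
    x = x - tau * gradG(x)                    # explicit (forward) step

assert np.allclose(gradG(x), 0, atol=1e-5)    # first-order condition at x*
```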
36-37. Proximal Splitting Methods

Solve $\min_{x \in \mathcal{H}} E(x)$. Problem: $\operatorname{Prox}_{\gamma E}$ is not available.

Splitting: $E(x) = F(x) + \sum_i G_i(x)$, with $F$ smooth and each $G_i$ simple.

Iterative algorithms using only $\nabla F(x)$ and $\operatorname{Prox}_{\gamma G_i}(x)$:
Forward-Backward: solves $F + G$.
Douglas-Rachford: solves $\sum_i G_i$.
Primal-Dual: solves $\sum_i G_i \circ A_i$.
Generalized FB: solves $F + \sum_i G_i$.
38. Smooth + Simple Splitting

Inverse problem: measurements $y = K f_0 + w$, with $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \leq N$.
Model: $f_0 = \Psi x_0$ sparse in a dictionary $\Psi$.
Sparse recovery: $f^\star = \Psi x^\star$, where $x^\star$ solves
$\min_{x \in \mathbb{R}^Q} F(x) + G(x)$  (smooth + simple)

Data fidelity: $F(x) = \frac{1}{2}\|y - \Phi x\|^2$, with $\Phi = K\Psi$.
Regularization: $G(x) = \lambda\|x\|_1 = \lambda \sum_i |x_i|$.
40-42. Forward-Backward

Fixed-point equation:
$x^\star \in \operatorname{argmin}_x F(x) + G(x) \iff 0 \in \nabla F(x^\star) + \partial G(x^\star)$
$\iff x^\star - \gamma \nabla F(x^\star) \in x^\star + \gamma \partial G(x^\star)$
$\iff x^\star = \operatorname{Prox}_{\gamma G}(x^\star - \gamma \nabla F(x^\star))$

Forward-backward: $x^{(\ell+1)} = \operatorname{Prox}_{\gamma G}\big(x^{(\ell)} - \gamma \nabla F(x^{(\ell)})\big)$
Projected gradient descent: $G = \iota_C$.

Theorem: let $\nabla F$ be $L$-Lipschitz.
If $\gamma < 2/L$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.
43. Example: $\ell^1$ Regularization

$\min_x \frac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1 \iff \min_x F(x) + G(x)$

$F(x) = \frac{1}{2}\|\Phi x - y\|^2$: $\nabla F(x) = \Phi^*(\Phi x - y)$, $L = \|\Phi^* \Phi\|$.
$G(x) = \lambda\|x\|_1$: $\operatorname{Prox}_{\gamma G}(x)_i = \max\left(0,\ 1 - \frac{\lambda\gamma}{|x_i|}\right) x_i$.

Forward-backward $\iff$ iterative soft thresholding.
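A sketch of the resulting iterative soft-thresholding algorithm (ISTA); `Phi`, `y`, `lam` and the iteration count are illustrative:

```python
import numpy as np

def soft_thresh(x, t):
    """Prox of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

def ista(Phi, y, lam, n_iter=500):
    L = np.linalg.norm(Phi.T @ Phi, 2)        # Lipschitz constant of grad F
    gamma = 1.0 / L                           # step size (< 2/L)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        # forward (gradient) step, then backward (prox) step
        x = soft_thresh(x - gamma * Phi.T @ (Phi @ x - y), gamma * lam)
    return x
```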
44. Convergence Speed

$\min_x E(x) = F(x) + G(x)$, with $\nabla F$ $L$-Lipschitz and $G$ simple.

Theorem: for a step size $0 < \gamma \leq 1/L$, the FB iterates $x^{(\ell)}$ satisfy
$E(x^{(\ell)}) - E(x^\star) \leq C/\ell$.
The constant degrades as $L$ grows ($C \propto L\|x^{(0)} - x^\star\|^2$).
45. Multi-step Accelerations

Beck-Teboulle accelerated FB (FISTA): $t^{(0)} = 1$,
$x^{(\ell+1)} = \operatorname{Prox}_{G/L}\big(y^{(\ell)} - \tfrac{1}{L}\nabla F(y^{(\ell)})\big)$
$t^{(\ell+1)} = \frac{1 + \sqrt{1 + 4\,(t^{(\ell)})^2}}{2}$
$y^{(\ell+1)} = x^{(\ell+1)} + \frac{t^{(\ell)} - 1}{t^{(\ell+1)}}\big(x^{(\ell+1)} - x^{(\ell)}\big)$
(see also Nesterov's method)

Theorem: $E(x^{(\ell)}) - E(x^\star) \leq C/\ell^2$.
Complexity theory: optimal in a worst-case sense.
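The same iteration with the Beck-Teboulle acceleration, as a sketch; `soft_thresh` is reused from the ISTA sketch above:

```python
import numpy as np

def fista(Phi, y, lam, n_iter=500):
    L = np.linalg.norm(Phi.T @ Phi, 2)
    x = np.zeros(Phi.shape[1])
    z = x.copy()                        # extrapolated point y^(l)
    t = 1.0
    for _ in range(n_iter):
        x_new = soft_thresh(z - Phi.T @ (Phi @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)    # momentum step
        x, t = x_new, t_new
    return x
```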
47-48. Douglas-Rachford Scheme

$\min_x G_1(x) + G_2(x)$  $(\star)$, with $G_1$ and $G_2$ simple.

Douglas-Rachford iterations:
$z^{(\ell+1)} = \left(1 - \frac{\alpha}{2}\right) z^{(\ell)} + \frac{\alpha}{2} \operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z^{(\ell)})\big)$
$x^{(\ell+1)} = \operatorname{Prox}_{\gamma G_1}(z^{(\ell+1)})$

Reflected prox:
$\operatorname{RProx}_{\gamma G}(x) = 2\operatorname{Prox}_{\gamma G}(x) - x$

Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.
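A generic sketch of the iteration for user-supplied prox operators (with $\gamma$ absorbed into the proxes and $\alpha = 1$ by default):

```python
import numpy as np

def douglas_rachford(prox_g1, prox_g2, z0, n_iter=500, alpha=1.0):
    """min G1 + G2 given Prox_{gamma G1}, Prox_{gamma G2} (gamma absorbed)."""
    rprox = lambda prox, z: 2 * prox(z) - z            # reflected prox
    z = z0.copy()
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(prox_g2, rprox(prox_g1, z))
    return prox_g1(z)                                   # x = Prox_{gamma G1}(z)
```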
49-50. DR Fixed-Point Equation

$\min_x G_1(x) + G_2(x) \iff 0 \in \partial(G_1 + G_2)(x)$
$\iff \exists z,\ z - x \in \gamma\,\partial G_1(x)$ and $x - z \in \gamma\,\partial G_2(x)$
$\iff x = \operatorname{Prox}_{\gamma G_1}(z)$ and $(2x - z) - x \in \gamma\,\partial G_2(x)$
$\iff x = \operatorname{Prox}_{\gamma G_2}(2x - z) = \operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$

Hence
$z = 2\operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big) - (2x - z)$
$z = 2\operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big) - \operatorname{RProx}_{\gamma G_1}(z)$
$z = \operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$
$z = \left(1 - \frac{\alpha}{2}\right) z + \frac{\alpha}{2}\operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$.
51-52. Example: Constrained $\ell^1$

$\min_{\Phi x = y} \|x\|_1 \iff \min_x G_1(x) + G_2(x)$

$G_1(x) = \iota_C(x)$, $C = \{x \mid \Phi x = y\}$:
$\operatorname{Prox}_{\gamma G_1}(x) = \operatorname{Proj}_C(x) = x + \Phi^*(\Phi\Phi^*)^{-1}(y - \Phi x)$,
efficient if $\Phi\Phi^*$ is easy to invert.

$G_2(x) = \|x\|_1$:
$\operatorname{Prox}_{\gamma G_2}(x)_i = \max\left(0,\ 1 - \frac{\gamma}{|x_i|}\right) x_i$.

Example: compressed sensing with a Gaussian matrix $\Phi \in \mathbb{R}^{100 \times 400}$, $y = \Phi x_0$, $\|x_0\|_0 = 17$.
Figure: decay of $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ along the iterations, for $\gamma = 0.01$, $\gamma = 1$ and $\gamma = 10$.
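A sketch of the two proxes for this example, reusing `douglas_rachford` and `soft_thresh` from the sketches above (the Gaussian `Phi` and sparsity level mirror the figure's setting but are otherwise illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 100, 400
Phi = rng.standard_normal((P, Q))
x0 = np.zeros(Q)
x0[rng.choice(Q, 17, replace=False)] = rng.standard_normal(17)
y = Phi @ x0

pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)      # Phi^* (Phi Phi^*)^{-1}
proj_C = lambda x: x + pinv @ (y - Phi @ x)    # Prox of iota_{Phi x = y}
prox_l1 = lambda x: soft_thresh(x, 1.0)        # Prox of gamma ||.||_1, gamma = 1

x_rec = douglas_rachford(proj_C, prox_l1, np.zeros(Q), n_iter=2000)
```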
53-54. More than 2 Functionals

$\min_x G_1(x) + \dots + G_k(x)$, each $G_i$ simple.

$\iff \min_{(x_1, \dots, x_k)} G(x_1, \dots, x_k) + \iota_C(x_1, \dots, x_k)$, where
$G(x_1, \dots, x_k) = G_1(x_1) + \dots + G_k(x_k)$,
$C = \{(x_1, \dots, x_k) \in \mathcal{H}^k \mid x_1 = \dots = x_k\}$.

$G$ and $\iota_C$ are simple:
$\operatorname{Prox}_{\gamma G}(x_1, \dots, x_k) = (\operatorname{Prox}_{\gamma G_i}(x_i))_i$
$\operatorname{Prox}_{\gamma \iota_C}(x_1, \dots, x_k) = (\tilde{x}, \dots, \tilde{x})$, where $\tilde{x} = \frac{1}{k}\sum_i x_i$.
55-56. Auxiliary Variables

$\min_x G_1(x) + G_2(Ax)$, with a linear map $A : \mathcal{H} \to \mathcal{E}$ and $G_1$, $G_2$ simple.

$\iff \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$, where
$G(x, y) = G_1(x) + G_2(y)$,
$C = \{(x, y) \in \mathcal{H} \times \mathcal{E} \mid Ax = y\}$.

$\operatorname{Prox}_{\gamma G}(x, y) = (\operatorname{Prox}_{\gamma G_1}(x), \operatorname{Prox}_{\gamma G_2}(y))$
$\operatorname{Prox}_{\iota_C}(x, y) = \operatorname{Proj}_C(x, y) = (\tilde{x}, A\tilde{x}) = (x - A^*\tilde{y},\ y + \tilde{y})$, where
$\tilde{x} = (\operatorname{Id} + A^* A)^{-1}(x + A^* y)$ and $\tilde{y} = (\operatorname{Id} + A A^*)^{-1}(Ax - y)$.

Efficient if $\operatorname{Id} + AA^*$ or $\operatorname{Id} + A^*A$ is easy to invert.
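A sketch of this projection via the smaller of the two linear systems (a dense `A` as a stand-in; for structured `A` one would solve the system by FFT or conjugate gradient):

```python
import numpy as np

def proj_graph(A, x, y):
    """Project (x, y) onto C = {(x', y') : A x' = y'}:
    solve (Id + A^T A) xt = x + A^T y, then return (xt, A xt)."""
    n = A.shape[1]
    xt = np.linalg.solve(np.eye(n) + A.T @ A, x + A.T @ y)
    return xt, A @ xt
```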
57-58. Example: TV Regularization

$\min_f \frac{1}{2}\|Kf - y\|^2 + \lambda\|\nabla f\|_1$, where $\|u\|_1 = \sum_i \|u_i\|$.

Auxiliary variable $u = \nabla f$: $\min_{(f, u)} G_2(f) + G_1(u) + \iota_C(f, u)$, with

$G_1(u) = \lambda\|u\|_1$: $\operatorname{Prox}_{\gamma G_1}(u)_i = \max\left(0,\ 1 - \frac{\gamma\lambda}{\|u_i\|}\right) u_i$
$G_2(f) = \frac{1}{2}\|Kf - y\|^2$: $\operatorname{Prox}_{\gamma G_2}(f) = (\operatorname{Id} + \gamma K^* K)^{-1}(f + \gamma K^* y)$
$C = \{(f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} \mid u = \nabla f\}$:
$\operatorname{Prox}_{\iota_C}(f, u) = (\tilde{f}, \nabla\tilde{f})$, where $\tilde{f}$ solves $(\operatorname{Id} + \nabla^* \nabla)\,\tilde{f} = f + \nabla^* u$:
$O(N \log(N))$ operations using the FFT.
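With periodic boundaries $\nabla^* \nabla$ is a convolution, so the system diagonalizes in Fourier; a sketch consistent with the `grad`/`div` conventions of the earlier TV sketch:

```python
import numpy as np

def solve_id_plus_lap(b):
    """Solve (Id + grad^* grad) f = b in O(N log N) with the FFT
    (periodic boundaries; grad^* grad has Fourier symbol 4 sin^2 + 4 sin^2)."""
    n1, n2 = b.shape
    k1 = np.arange(n1)[:, None] / n1
    k2 = np.arange(n2)[None, :] / n2
    symbol = 4 * np.sin(np.pi * k1) ** 2 + 4 * np.sin(np.pi * k2) ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(b) / (1 + symbol)))

# Proj_C(f, u) = (ft, grad(ft)), with grad^* u = -div(u):
# ft = solve_id_plus_lap(f - div(u))
```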
61-63. GFB Splitting

$\min_{x \in \mathbb{R}^N} F(x) + \sum_{i=1}^n G_i(x)$  $(\star)$, with $F$ smooth and each $G_i$ simple.

For $i = 1, \dots, n$:
$z_i^{(\ell+1)} = z_i^{(\ell)} + \operatorname{Prox}_{n\gamma G_i}\big(2x^{(\ell)} - z_i^{(\ell)} - \gamma\nabla F(x^{(\ell)})\big) - x^{(\ell)}$
$x^{(\ell+1)} = \frac{1}{n}\sum_{i=1}^n z_i^{(\ell+1)}$

Theorem: let $\nabla F$ be $L$-Lipschitz.
If $\gamma < 2/L$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

$n = 1$: forward-backward. $F = 0$: Douglas-Rachford.
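A sketch of the generalized forward-backward iteration for user-supplied $\nabla F$ and prox operators (each `proxes[i](x, t)` is assumed to return $\operatorname{Prox}_{t G_i}(x)$):

```python
import numpy as np

def gfb(grad_f, proxes, x0, gamma, n_iter=500):
    """min F + sum_i G_i via generalized forward-backward."""
    n = len(proxes)
    x = x0.copy()
    z = [x0.copy() for _ in range(n)]
    for _ in range(n_iter):
        g = grad_f(x)
        for i in range(n):
            # auxiliary-variable update with prox step n * gamma
            z[i] = z[i] + proxes[i](2 * x - z[i] - gamma * g, n * gamma) - x
        x = sum(z) / n                 # average of the auxiliary variables
    return x
```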
64-67. GFB Fixed Point

$x^\star \in \operatorname{argmin}_x F(x) + \sum_i G_i(x) \iff 0 \in \nabla F(x^\star) + \sum_i \partial G_i(x^\star)$
$\iff \exists (y_i)_i,\ y_i \in \partial G_i(x^\star),\ \nabla F(x^\star) + \sum_i y_i = 0$
$\iff \exists (z_i)_{i=1}^n,\ \forall i,\ x^\star - z_i - \gamma\nabla F(x^\star) \in n\gamma\,\partial G_i(x^\star)$
and $x^\star = \frac{1}{n}\sum_i z_i$  (use $z_i = x^\star - \gamma\nabla F(x^\star) - n\gamma\, y_i$)
$\iff \forall i,\ \big(2x^\star - z_i - \gamma\nabla F(x^\star)\big) - x^\star \in n\gamma\,\partial G_i(x^\star)$
$\iff \forall i,\ x^\star = \operatorname{Prox}_{n\gamma G_i}\big(2x^\star - z_i - \gamma\nabla F(x^\star)\big)$
$\iff \forall i,\ z_i = z_i + \operatorname{Prox}_{n\gamma G_i}\big(2x^\star - z_i - \gamma\nabla F(x^\star)\big) - x^\star$

Hence a fixed-point equation on $(x^\star, z_1, \dots, z_n)$.
68-70. Block Regularization

$\ell^1$-$\ell^2$ block sparsity: $G(x) = \sum_{b \in B} \|x[b]\|$, with $\|x[b]\|^2 = \sum_{m \in b} x_m^2$.

Non-overlapping decomposition: $B = B_1 \cup \dots \cup B_n$,
$G(x) = \sum_{i=1}^n G_i(x)$, where $G_i(x) = \sum_{b \in B_i} \|x[b]\|$.

Each $G_i$ is simple:
$\forall b \in B_i,\ \forall m \in b,\quad \operatorname{Prox}_{\gamma G_i}(x)_m = \max\left(0,\ 1 - \frac{\gamma}{\|x[b]\|}\right) x_m$.

Figure: image $f = \Psi x$, its coefficients $x$, and two non-overlapping block decompositions $B_1$, $B_2$ ($N = 256$).
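A sketch of the blockwise prox for non-overlapping blocks of equal size, laid out contiguously (the layout and block size are illustrative assumptions):

```python
import numpy as np

def prox_block_l1(x, gamma, block):
    """Prox of gamma * sum_b ||x[b]|| for contiguous blocks of size `block`:
    block soft thresholding x[b] <- max(0, 1 - gamma/||x[b]||) x[b]."""
    xb = x.reshape(-1, block)
    norms = np.linalg.norm(xb, axis=1, keepdims=True)
    scale = np.maximum(0, 1 - gamma / np.maximum(norms, 1e-12))
    return (scale * xb).reshape(x.shape)
```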
71. Numerical Experiments: Deconvolution + Inpainting

Problems of the form $\min_x \frac{1}{2}\|y - \Phi x\|^2 + \lambda \sum_k \|x\|_{1,2}^{(B_k)}$, solved with the EFB, PR and CP algorithms ($N = 256$, noise level $0.025$, translation-invariant wavelets).

Deconvolution ($\lambda_2 = 1.30\mathrm{e}{-3}$): SNR $22.49$ dB at iteration 50; timings $t_{\mathrm{EFB}} = 161$ s, $t_{\mathrm{PR}} = 173$ s, $t_{\mathrm{CP}} = 190$ s.
Deconvolution + inpainting ($\lambda_4 = 1.00\mathrm{e}{-3}$, degradation $0.4$): SNR $21.80$ dB at iteration 50; timings $t_{\mathrm{EFB}} = 283$ s, $t_{\mathrm{PR}} = 298$ s, $t_{\mathrm{CP}} = 368$ s.

Figures: decay of $\log_{10}(E(x^{(\ell)}) - E(x^\star))$ along the iterations for each algorithm, together with $x_0$, the observations $y = \Phi x_0 + w$, and the recovered $x^\star$.
75. Legendre-Fenchel Duality

Legendre-Fenchel transform:
$G^*(u) = \sup_{x \in \operatorname{dom}(G)} \langle u, x \rangle - G(x)$
Figure: $-G^*(u)$ is the intercept of the tangent to the graph of $G$ with slope $u$.

Example: quadratic functional
$G(x) = \frac{1}{2}\langle Ax, x \rangle + \langle x, b \rangle \implies G^*(u) = \frac{1}{2}\langle u - b,\ A^{-1}(u - b) \rangle$

Moreau's identity:
$\operatorname{Prox}_{\gamma G^*}(x) = x - \gamma \operatorname{Prox}_{G/\gamma}(x/\gamma)$
($G$ simple $\iff$ $G^*$ simple)
76-78. Indicator and Homogeneous

Positively 1-homogeneous functional: $G(\lambda x) = |\lambda|\, G(x)$.
Example: any norm $G(x) = \|x\|$.

Duality: $G^* = \iota_{\{G^\circ \leq 1\}}$, where $G^\circ(y) = \sup_{G(x) \leq 1} \langle x, y \rangle$.

$\ell^p$ norms: $G(x) = \|x\|_p \implies G^\circ(x) = \|x\|_q$, with $\frac{1}{p} + \frac{1}{q} = 1$, $1 \leq p, q \leq +\infty$.

Example: proximal operator of the $\ell^\infty$ norm,
$\operatorname{Prox}_{\lambda\|\cdot\|_\infty} = \operatorname{Id} - \operatorname{Proj}_{\|\cdot\|_1 \leq \lambda}$,
$\operatorname{Proj}_{\|\cdot\|_1 \leq \lambda}(x)_i = \max\left(0,\ 1 - \frac{\mu}{|x_i|}\right) x_i$
for a well-chosen $\mu = \mu(x, \lambda)$.
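A sketch of the resulting prox of the $\ell^\infty$ norm: project onto the $\ell^1$ ball of radius $\lambda$ (the threshold $\mu$ is found by sorting) and subtract:

```python
import numpy as np

def proj_l1_ball(x, radius):
    """Euclidean projection onto {z : ||z||_1 <= radius} (sort-based threshold)."""
    if np.sum(np.abs(x)) <= radius:
        return x.copy()
    a = np.sort(np.abs(x))[::-1]                       # sorted magnitudes
    cssv = np.cumsum(a)
    k = np.nonzero(a * np.arange(1, len(a) + 1) > cssv - radius)[0][-1]
    mu = (cssv[k] - radius) / (k + 1)                  # the "well-chosen" threshold
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0)

def prox_linf(x, lam):
    """Moreau identity: Prox_{lam ||.||_inf}(x) = x - Proj_{||.||_1 <= lam}(x)."""
    return x - proj_l1_ball(x, lam)
```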
80-82. Primal-Dual Formulation

Fenchel-Rockafellar duality: $A : \mathcal{H} \to \mathcal{L}$ linear,
$\min_{x \in \mathcal{H}} G_1(x) + G_2(Ax) = \min_x G_1(x) + \sup_{u \in \mathcal{L}} \langle Ax, u \rangle - G_2^*(u)$

Strong duality: $0 \in \operatorname{ri}(\operatorname{dom}(G_2)) - A\,\operatorname{ri}(\operatorname{dom}(G_1))$
$(\min \leftrightarrow \max)$:
$= \max_u -G_2^*(u) + \min_x G_1(x) + \langle x, A^* u \rangle$
$= \max_u -G_2^*(u) - G_1^*(-A^* u)$

Recovering $x^\star$ from some $u^\star$:
$x^\star = \operatorname{argmin}_x G_1(x) + \langle x, A^* u^\star \rangle$
$\iff -A^* u^\star \in \partial G_1(x^\star)$
$\iff x^\star \in (\partial G_1)^{-1}(-A^* u^\star) = \partial G_1^*(-A^* u^\star)$
83-85. Forward-Backward on the Dual

If $G_1$ is strongly convex ($\nabla^2 G_1 \succeq c\,\operatorname{Id}$):
$G_1(tx + (1-t)y) \leq t\,G_1(x) + (1-t)\,G_1(y) - \frac{c}{2}\, t(1-t)\|x - y\|^2$

Then $x^\star$ is uniquely defined, $x^\star = \nabla G_1^*(-A^* u^\star)$, and $G_1^*$ is of class $C^1$.

FB on the dual:
$\min_{x \in \mathcal{H}} G_1(x) + G_2(Ax) \iff \min_{u \in \mathcal{L}} G_1^*(-A^* u) + G_2^*(u)$
(smooth + simple)

$u^{(\ell+1)} = \operatorname{Prox}_{\tau G_2^*}\big(u^{(\ell)} + \tau A\, \nabla G_1^*(-A^* u^{(\ell)})\big)$
86-87. Example: TV Denoising

$\min_{f \in \mathbb{R}^N} \frac{1}{2}\|f - y\|^2 + \lambda\|\nabla f\|_1 \iff \min_{\|u\|_\infty \leq \lambda} \frac{1}{2}\|y + \operatorname{div}(u)\|^2$,
where $\|u\|_1 = \sum_i \|u_i\|$ and $\|u\|_\infty = \max_i \|u_i\|$.

Dual solution $u^\star$, primal solution $f^\star = y + \operatorname{div}(u^\star)$.  [Chambolle 2004]

FB (aka projected gradient descent):
$u^{(\ell+1)} = \operatorname{Proj}_{\|\cdot\|_\infty \leq \lambda}\big(u^{(\ell)} + \tau \nabla(y + \operatorname{div}(u^{(\ell)}))\big)$,
$v = \operatorname{Proj}_{\|\cdot\|_\infty \leq \lambda}(u): \quad v_i = \frac{u_i}{\max(\|u_i\|/\lambda,\ 1)}$.

Convergence if $\tau < \frac{2}{\|\operatorname{div} \circ \nabla\|} = \frac{1}{4}$.
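A sketch of this projected gradient scheme, reusing `grad` and `div` from the earlier TV sketch (step $\tau = 0.24 < 1/4$):

```python
import numpy as np

def tv_denoise_dual(y, lam, n_iter=200, tau=0.24):
    """min_f 0.5||f - y||^2 + lam ||grad f||_1 via FB on the dual [Chambolle 2004]."""
    u = np.zeros(y.shape + (2,))
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))        # gradient step on the dual
        norms = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True) / lam, 1)
        u = u / norms                         # Proj onto {||u_i|| <= lam}
    return y + div(u)                         # primal solution f* = y + div(u*)
```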
88-90. Primal-Dual Algorithm

$\min_{x \in \mathcal{H}} G_1(x) + G_2(Ax) \iff \min_x \max_z G_1(x) - G_2^*(z) + \langle Ax, z \rangle$

$z^{(\ell+1)} = \operatorname{Prox}_{\sigma G_2^*}\big(z^{(\ell)} + \sigma A \tilde{x}^{(\ell)}\big)$
$x^{(\ell+1)} = \operatorname{Prox}_{\tau G_1}\big(x^{(\ell)} - \tau A^* z^{(\ell+1)}\big)$
$\tilde{x}^{(\ell+1)} = x^{(\ell+1)} + \theta\big(x^{(\ell+1)} - x^{(\ell)}\big)$

$\theta = 0$: Arrow-Hurwicz algorithm.
$\theta = 1$: convergence speed on the duality gap.

Theorem [Chambolle-Pock 2011]:
if $0 \leq \theta \leq 1$ and $\sigma\tau\|A\|^2 < 1$, then
$x^{(\ell)} \to x^\star$, a minimizer of $G_1 + G_2 \circ A$.
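A generic sketch of the iteration for user-supplied proxes and linear maps ($\theta = 1$; argument names are illustrative):

```python
import numpy as np

def chambolle_pock(prox_tau_g1, prox_sigma_g2_conj, A, A_adj, x0,
                   tau, sigma, theta=1.0, n_iter=500):
    """min G1(x) + G2(A x); requires sigma * tau * ||A||^2 < 1."""
    x = x0.copy()
    x_bar = x0.copy()
    z = np.zeros_like(A(x0))
    for _ in range(n_iter):
        z = prox_sigma_g2_conj(z + sigma * A(x_bar))   # dual ascent step
        x_new = prox_tau_g1(x - tau * A_adj(z))        # primal descent step
        x_bar = x_new + theta * (x_new - x)            # extrapolation
        x = x_new
    return x
```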
91-93. Conclusion

Inverse problems in imaging:
Large scale: $N \sim 10^6$.
Non-smooth (sparsity, TV, ...).
(Sometimes) convex.
Highly structured (separability, $\ell^p$ norms, ...).

Proximal splitting:
Unravels the structure of problems.
Parallelizable.
Decomposition $G = \sum_k G_k$.

Open problems:
Less structured problems without smoothness.
Non-convex optimization.