7–9. Sparse Regularizations

Synthesis regularization: Coefficients α ↦ Image x = Dα
Analysis regularization: Image x ↦ Correlations α = D*x

Synthesis:  min_{α ∈ R^Q} 1/2 ||y − ΦDα||₂² + λ||α||₁
Analysis:   min_{x ∈ R^N} 1/2 ||y − Φx||₂² + λ||D*x||₁

[Diagram: synthesis builds the image from a few atoms, x = Dα with α sparse; analysis measures the correlations α = D*x and favors (D*x)_i = 0 on most indices, (D*x)_i ≠ 0 on few.]

Unless D is orthogonal, the two regularizations produce different results.
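A minimal numerical sketch of this difference, on a toy denoising problem (Φ = Id) solved with cvxpy; the random dictionary D, the sizes, and λ are illustrative choices, not values from the slides:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, Q, lam = 8, 12, 0.5                    # illustrative sizes and lambda
D = rng.standard_normal((N, Q))           # redundant dictionary (Q > N)
y = rng.standard_normal(N)                # observation, Phi = Id

# Synthesis: min_alpha 1/2 ||y - D alpha||^2 + lam ||alpha||_1,  x = D alpha
alpha = cp.Variable(Q)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - D @ alpha)
                       + lam * cp.norm1(alpha))).solve()
x_syn = D @ alpha.value

# Analysis: min_x 1/2 ||y - x||^2 + lam ||D^* x||_1
x = cp.Variable(N)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - x)
                       + lam * cp.norm1(D.T @ x))).solve()
x_ana = x.value

# Nonzero gap in general: D is redundant, hence not orthogonal
print(np.linalg.norm(x_syn - x_ana))
```

For an orthogonal D the two outputs coincide, matching the remark above.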
10–13. Variations and Stability

Observations: y = Φx₀ + w

Recovery:
x_λ(y) ∈ argmin_{x ∈ R^N} 1/2 ||Φx − y||₂² + λ||D*x||₁        (P_λ(y))
x_{0+}(y) ∈ argmin_{Φx = y} ||D*x||₁   (no noise)               (P₀(y))

Questions:
– Behavior of x_λ(y) with respect to y and λ.
  → Application: risk estimation (SURE, GCV, etc.)
– Criteria to ensure ||x_λ(y) − x₀|| ≤ C||w|| (with "reasonable" C).

Synthesis case (D = Id): works of Fuchs and Tropp.
Analysis case: [Nam et al. 2011] for w = 0.
15–17. How to choose the value of the parameter λ?

Risk-based selection of λ:

– Average risk: R(λ) = E_w( ||x_λ(y) − x₀||² ), a measure of the expected quality of x_λ(y) with respect to x₀.
– The optimal (theoretical) λ minimizes the risk; plug-in estimator: λ*(y) = argmin_λ R(λ), x*(y) = x_{λ*(y)}(y).

The risk is unknown since it depends on x₀. But:
– E_w is not accessible → use one observation.
– x₀ is not accessible → needs risk estimators.

Can we estimate the risk solely from x_λ(y)?
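In the simplest setting Φ = Id, D = Id, where x_λ(y) is soft thresholding, Stein's unbiased risk estimate gives exactly such an observable surrogate for R(λ). A sketch; the signal x₀, σ, and the λ grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 200, 0.5
x0 = np.repeat([0.0, 2.0, 0.0, -1.5], N // 4)    # unknown in practice
y = x0 + sigma * rng.standard_normal(N)

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

lams = np.linspace(0.01, 2.0, 50)
# squared error for this one realization (needs x0, a proxy for R(lambda))
err = [np.sum((soft(y, t) - x0) ** 2) for t in lams]
# SURE(lambda) = ||x_lam - y||^2 + 2 sigma^2 #{|y_i| > lam} - N sigma^2,
# computable from y alone (divergence of soft thresholding = sparsity)
sure = [np.sum((soft(y, t) - y) ** 2)
        + 2 * sigma**2 * np.sum(np.abs(y) > t) - N * sigma**2 for t in lams]
print(lams[np.argmin(sure)], lams[np.argmin(err)])   # typically close picks
```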
28–29. Variations and Stability

Observations: y = Φx₀ + w

Recovery:
x_λ(y) ∈ argmin_{x ∈ R^N} 1/2 ||Φx − y||₂² + λ||D*x||₁        (P_λ(y))

Remark: x_λ(y) is not always unique, but μ_λ(y) = Φx_λ(y) is always unique.

Questions:
– When is y ↦ μ_λ(y) differentiable?
– Formula for ∂μ_λ(y).
30–33. TV-1D Polytope

TV-1D ball (cyclic finite differences, N = 4):

B = {x : ||D*x||₁ ≤ 1},   D* = ( −1  1  0  0
                                  0 −1  1  0
                                  0  0 −1  1
                                  1  0  0 −1 )

Displayed in {x : ⟨x, 1⟩ = 0} ≃ R³.

x_{0+}(y) ∈ argmin_{y = Φx} ||D*x||₁

[Figure: the polytope B inside the zero-mean hyperplane, together with the affine space {x : Φx = y}; the solution sits where this affine space meets the smallest scaled copy of B. © Maël]
34–37. Union of Subspaces Model

x_λ(y) ∈ argmin_{x ∈ R^N} 1/2 ||Φx − y||₂² + λ||D*x||₁        (P_λ(y))

Support of the solution:
I = {i : (D*x_λ(y))_i ≠ 0},   J = I^c

1-D total variation: D*x = (x_i − x_{i−1})_i
[Figure: a piecewise-constant signal x; the support I marks its jumps.]

Sub-space model: G_J = Ker(D*_J) = Im(D_J)^⊥

Local well-posedness: Ker(Φ) ∩ G_J = {0}        (H_J)

Lemma: There exists a solution x* such that (H_J) holds.
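A sketch of these objects in code: extract the support I and cosupport J from a candidate solution, build G_J = Ker(D*_J), and test (H_J). The 1-D difference operator and the random Φ are illustrative instances:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
N, P = 10, 6
Ds = np.eye(N, k=1)[:N-1] - np.eye(N)[:N-1]   # 1-D finite differences D^*
Phi = rng.standard_normal((P, N))
x = np.repeat([1.0, 3.0], N // 2)             # one jump -> sparse D^* x

corr = Ds @ x
I = np.flatnonzero(np.abs(corr) > 1e-10)      # support of D^* x
J = np.setdiff1d(np.arange(Ds.shape[0]), I)   # cosupport
U = null_space(Ds[J])                         # basis of G_J = Ker(D^*_J)

# (H_J): Ker(Phi) inter G_J = {0}  <=>  Phi restricted to G_J is injective
HJ = np.linalg.matrix_rank(Phi @ U) == U.shape[1]
print(I, U.shape[1], HJ)                      # jump location, dim G_J, (H_J)
```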
38–41. Local Sign Stability

x_λ(y) ∈ argmin_x 1/2 ||Φx − y||₂² + λ||D*x||₁        (P_λ(y))

Lemma: sign(D*x_λ(y)) is constant around (y, λ) ∉ H.
To be understood: there exists a solution with the same sign.

Linearized problem: with s_I = sign(D*_I x_λ(y)),
x̂_λ̄(ȳ) = argmin_{x ∈ G_J} 1/2 ||Φx − ȳ||₂² + λ̄ ⟨D*_I x, s_I⟩
        = A^{[J]} ( Φ*ȳ − λ̄ D_I s_I )
where A^{[J]} z = argmin_{x ∈ G_J} 1/2 ||Φx||₂² − ⟨x, z⟩.

Theorem: If (y, λ) ∉ H, then for (ȳ, λ̄) close to (y, λ), x̂_λ̄(ȳ) is a solution of P_λ̄(ȳ).

[Figure: faces of the TV polytope labeled by dim(G_J).]
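A sketch of the linearized solution, following the definitions above: with U a basis of G_J, the quadratic program defining A^{[J]} solves in closed form as A^{[J]} = U (Uᵀ Φᵀ Φ U)⁻¹ Uᵀ, invertible under (H_J). Helper names are hypothetical:

```python
import numpy as np
from scipy.linalg import null_space

def A_J(Phi, Ds, J):
    """A^[J] as an N x N matrix: A z = argmin_{x in G_J} 1/2||Phi x||^2 - <x, z>."""
    U = null_space(Ds[J])                    # basis of G_J = Ker(D^*_J)
    M = U.T @ Phi.T @ Phi @ U                # invertible under (H_J)
    return U @ np.linalg.solve(M, U.T)

def linearized_solution(Phi, Ds, y, lam, I, sI):
    """xhat_lam(y) = A^[J] (Phi^* y - lam D_I s_I), valid near (y, lam)."""
    J = np.setdiff1d(np.arange(Ds.shape[0]), I)
    return A_J(Phi, Ds, J) @ (Phi.T @ y - lam * Ds[I].T @ sI)
```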
42. Local Affine Maps

Local parameterization: x̂_λ̄(ȳ) = A^{[J]} Φ*ȳ − λ̄ A^{[J]} D_I s_I

Under a uniqueness assumption:
y ↦ x_λ(y)  and  λ ↦ x_λ(y)  are piecewise affine functions.

[Figure: the path λ ↦ x_λ(y), affine between breaking points; each breaking point is a change of support of D*x_λ(y), from x_{0+} at λ = 0 down to x_λ = 0 for λ large.]
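A quick numerical check (a sketch, assuming uniqueness) that λ ↦ x_λ(y) is piecewise affine: solve a tiny analysis problem on a λ grid with cvxpy and look at second differences, which should vanish away from support changes:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
N, P = 6, 4
Ds = np.eye(N, k=1)[:N-1] - np.eye(N)[:N-1]   # 1-D finite differences
Phi = rng.standard_normal((P, N))
y = rng.standard_normal(P)

def solve(lam):
    x = cp.Variable(N)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Phi @ x - y)
                           + lam * cp.norm1(Ds @ x))).solve()
    return x.value

lams = np.linspace(0.05, 1.0, 40)
X = np.array([solve(t) for t in lams])
d2 = np.abs(np.diff(X, n=2, axis=0)).max(axis=1)
print(np.round(d2, 6))   # ~0 except near breakpoints (support changes)
```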
43–45. Application to GSURE

For y ∉ H, one has locally: μ_λ(y) = Φ A^{[J]} Φ* y + cst.

Corollary: Let I = supp(D*x_λ(y)) be such that (H_J) holds. Then
df_λ(y) = tr(∂μ_λ(y)) = dim(Φ G_J),
df_λ(y) = ||x_λ(y)||₀ for D = Id (synthesis),
gdf_λ(y) = tr(Π A^{[J]})
are unbiased estimators of df and gdf.

Trick: tr(A) = E_z(⟨Az, z⟩), z ∼ N(0, Id_P).

Proposition: gdf_λ(y) = E_z(⟨ν(z), Φ⁺z⟩), z ∼ N(0, Id_P),
where ν(z) solves the linear system

( Φ*Φ   D_J ) ( ν(z) )   ( Φ*z )
( D*_J   0  ) (  ν̃   ) = (  0  )

In practice: gdf_λ(y) ≈ (1/K) Σ_{k=1}^K ⟨ν(z_k), Φ⁺z_k⟩, z_k ∼ N(0, Id_P).
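A sketch of this Monte Carlo estimate on small dense matrices (the slides solve the linear system with a conjugate gradient solver at scale; here a least-squares solve copes with a possibly rank-deficient D_J):

```python
import numpy as np

def gdf_estimate(Phi, Ds, J, K=100, rng=np.random.default_rng(4)):
    """Monte Carlo gdf: average of <nu(z_k), Phi^+ z_k> over K draws."""
    P, N = Phi.shape
    DJ = Ds[J].T                              # D_J, shape N x |J|
    # saddle-point matrix [[Phi^T Phi, D_J], [D_J^T, 0]]
    S = np.block([[Phi.T @ Phi, DJ],
                  [DJ.T, np.zeros((DJ.shape[1], DJ.shape[1]))]])
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(DJ.shape[1])])
        nu = np.linalg.lstsq(S, rhs, rcond=None)[0][:N]   # nu(z) block
        acc += nu @ (Phi_pinv @ z)
    return acc / K
```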
46. Compressed-sensing using multi-scale wavelet thresholding

Φ ∈ R^{P×N}: realization of a random vector, P = N/4.
D: translation-invariant redundant wavelet tight frame.

[Figure: quadratic loss as a function of the regularization parameter λ, with the projection risk, GSURE, and true risk curves nearly coinciding; panels show y, x_ML, and x_λ(y) at the optimal λ*.]
47. Super-resolution using (anisotropic) Total-Variation

Φ: vertical sub-sampling.
Finite-differences gradient: D* = [∂₁, ∂₂].

In practice, the empirical mean over the z_k replaces the expectation (law of large numbers), and each ν(z_k) is computed by solving the linear system with a conjugate gradient solver.

[Figure: quadratic loss as a function of λ, with the projection risk, GSURE, and true risk curves; panels show the observations y and x_λ(y) at the optimal λ*.]
48. Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
50–51. Identifiability Criterion

Identifiability criterion of a sign (we suppose (H_J) holds):
IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞        (convex → computable)
where Ω = D_J⁺ (Id − Φ*Φ A^{[J]}) D_I.

Discrete 1-D derivative: D*x = (x_i − x_{i−1})_i,   Φ = Id (denoising).
IC(s) = ||ω_J||_∞ with ω_J = Ω s_I and s_I = sign(D*_I x).

[Figure: a piecewise-constant x and its dual vector; ω_J stays strictly inside [−1, +1], hence IC(sign(D*x)) < 1.]
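Since IC(s) is convex, it can be computed as a small linear program: parametrize Ker D_J by a basis B and turn the sup-norm into an auxiliary variable t with |Ω s_I − Bv| ≤ t. A sketch (function names are hypothetical):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

def IC(Phi, Ds, I, sI):
    """IC(s) = min over u in Ker D_J of ||Omega s_I - u||_inf, via an LP."""
    N = Phi.shape[1]
    J = np.setdiff1d(np.arange(Ds.shape[0]), I)
    U = null_space(Ds[J])
    A = U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)   # A^[J] under (H_J)
    DJ, DI = Ds[J].T, Ds[I].T
    Omega = np.linalg.pinv(DJ) @ (np.eye(N) - Phi.T @ Phi @ A) @ DI
    w = Omega @ sI
    B = null_space(DJ)                  # Ker D_J; often trivial
    if B.size == 0:
        return np.abs(w).max()
    k = B.shape[1]
    # variables [v, t]: minimize t subject to  B v - t <= w,  -B v - t <= -w
    c = np.r_[np.zeros(k), 1.0]
    A_ub = np.block([[ B, -np.ones((len(w), 1))],
                     [-B, -np.ones((len(w), 1))]])
    b_ub = np.r_[w, -w]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * k + [(0, None)])
    return res.fun
```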
52–55. Robustness to Small Noise

IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞,   Ω = D_J⁺ (Id − Φ*Φ A^{[J]}) D_I

Theorem: Suppose IC(sign(D*x₀)) < 1 and let T = min_{i ∈ I} |(D*x₀)_i|.
If ||w||/T is small enough and λ ∼ ||w||, then
x₀ + A^{[J]} Φ*w − λ A^{[J]} D_I s_I
is the unique solution of P_λ(y).

Theorem: If IC(sign(D*x₀)) < 1, then x₀ is the unique solution of P₀(Φx₀).

When D = Id: results of J.J. Fuchs.
56–57. Robustness to Bounded Noise

Robustness criterion:
RC(I) = max_{||p_I||_∞ ≤ 1} min_{u ∈ Ker(D_J)} ||Ω p_I − u||_∞ = max_{||p_I||_∞ ≤ 1} IC(p)

Theorem: If RC(I) < 1 for I = supp(D*x₀), then setting
λ = τ c_J ||w||₂ / (1 − RC(I))   with τ > 1,
P_λ(y) has a unique solution x_λ ∈ G_J, and
||x₀ − x_λ||₂ ≤ C_J ||w||₂.
Constants: c_J = ||D_J⁺ (Φ*Φ A^{[J]} − Id)||_{2,∞},
C_J = ||A^{[J]}||_{2,2} ( ||Φ||_{2,2} + τ c_J ||D_I||_{2,∞} / (1 − RC(I)) ).

When D = Id: results of Tropp (ERC).
59–60. Source Condition

Noiseless necessary and sufficient condition:
x₀ ∈ argmin_{Φx = y} ||D*x||₁  ⟺  ∃α ∈ ∂||·||₁(D*x₀) with Dα ∈ Im(Φ*)        (SC_α)

[Figure: the level set {x : ||D*x||₁ ≤ c} touching the affine space {x : Φx = y} at x₀, with normal direction Dα.]

Theorem: If (H_J), (SC_α) and ||α_J||_∞ < 1, then
||D*(x_λ − x₀)|| = O(||w||)        [Grasmair, Inverse Problems, 2011]

Proposition: Let s = sign(D*x₀) and define α by
α_I = s_I,   α_J = Ω s_I − u*,
where u* attains IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞.
Then IC(s) < 1 ⟹ (SC_α) holds and ||α_J||_∞ = IC(s).
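The proposition rests on the standard characterization of the ℓ₁ subdifferential at D*x₀, a routine fact stated here for completeness:

```latex
% alpha is a valid certificate iff it matches the sign pattern on the
% support I and is bounded by 1 in sup-norm on the cosupport J:
\alpha \in \partial \|\cdot\|_1 (D^\ast x_0)
\;\Longleftrightarrow\;
\alpha_I = \operatorname{sign}(D^\ast_I x_0)
\quad\text{and}\quad
\|\alpha_J\|_\infty \le 1 .
```

So the constructed α, with α_I = s_I and ||α_J||_∞ = IC(s) < 1, lies strictly inside the subdifferential, which is exactly the strict condition ||α_J||_∞ < 1 required by the theorem.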
61. Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
62–64. Example: Total Variation in 1-D

Discrete 1-D derivative: D*x = (x_i − x_{i−1})_i.   Denoising: Φ = Id.
{G_J : dim G_J = k} ↔ signals with k − 1 steps.

IC(s) = ||ω_J||_∞,   ω_J = Ω s_I,   s_I = sign(D*_I x)

[Figures: two staircase signals with their dual vectors. Left: ω_J stays strictly inside [−1, +1], so IC(sign(D*x)) < 1 and small-noise robustness holds. Right: ω_J touches ±1, so IC(sign(D*x)) = 1.]
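A usage example of the IC routine sketched earlier, on this 1-D TV denoising setting; the staircase below has two jumps of opposite signs, a configuration for which one typically expects IC < 1:

```python
import numpy as np
# assumes the IC(Phi, Ds, I, sI) function from the linear-program sketch above

N = 12
Phi = np.eye(N)                                  # denoising
Ds = np.eye(N, k=1)[:N-1] - np.eye(N)[:N-1]      # 1-D finite differences
x = np.repeat([0.0, 1.0, -1.0], N // 3)          # two jumps, opposite signs
corr = Ds @ x
I = np.flatnonzero(np.abs(corr) > 1e-10)
print(IC(Phi, Ds, I, np.sign(corr[I])))          # expected: < 1 here
```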
67–68. Example: Total Variation in 2-D

Directional derivatives:
D₁*x = (x_{i,j} − x_{i−1,j})_{i,j} ∈ R^N
D₂*x = (x_{i,j} − x_{i,j−1})_{i,j} ∈ R^N

Gradient: D*x = (D₁*x, D₂*x) ∈ R^{N×2}

Dual vector: ω_I = sign(D*_I x), ω_J = Ω s_I.
[Figures: two images x with their supports I (edge sets), one with IC(s) < 1 and one with IC(s) > 1.]
69–70. Example: Fused Lasso

Total variation and ℓ₁ hybrid:
D*x = ( (x_i − x_{i−1})_i , (ε x_i)_i )
{G_J : dim G_J = k} ↔ sums of k interval indicators.

Compressed sensing: Gaussian Φ ∈ R^{Q×N}.

[Figure: probability P(η, ε, N) of the event IC < 1 as a function of the number of measurements Q, for a signal x₀ made of boxcars of width 2ηN.]
71–72. Example: Invariant Haar Analysis

Haar wavelets:
ψ⁽ʲ⁾_i = 2^{−τ(j+1)} · ( +1 if 0 ≤ i < 2^j;  −1 if −2^j ≤ i < 0;  0 otherwise )

Translation-invariant Haar analysis:
||D*x||₁ = Σ_j ||x ⋆ ψ⁽ʲ⁾||₁,
which can be written as the sum over scales of the TV semi-norms of filtered versions of the signal; as such, it can be understood as a sort of multiscale total variation regularization. Apart from a multiplicative factor, one recovers total variation when J_max = 0.

De-blurring: Φx = φ ⋆ x with φ(t) ∝ e^{−t²/(2σ²)}, a circular convolution with a Gaussian kernel of standard deviation σ (noiseless setting, N = 256). The blurred signal x_η is a centered boxcar with support of size 2ηN:
x_η = 1_{{⌊N/2 − ηN⌋, …, ⌊N/2 + ηN⌋}},   η ∈ (0, 1/2].

[Figure: evolution of IC(sign(D*x₀)) as a function of σ for three dictionaries (τ = 1, τ = 1/2, and total variation), with η = 0.2 fixed.]
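A sketch of the translation-invariant Haar analysis norm, computing the correlations ⟨x, ψ⁽ʲ⁾(· − k)⟩ by circular cross-correlation via the FFT; the default τ = 1 and the test signal are illustrative choices:

```python
import numpy as np

def haar_ti_analysis_norm(x, Jmax, tau=1.0):
    """||D^* x||_1 for the shift-invariant Haar dictionary (cyclic shifts)."""
    N = len(x)
    total = 0.0
    for j in range(Jmax + 1):
        psi = np.zeros(N)
        psi[:2**j] = 1.0                   # +1 on 0 <= i < 2^j
        psi[-2**j:] = -1.0                 # -1 on -2^j <= i < 0 (wrapped)
        psi /= 2.0 ** (tau * (j + 1))
        # circular correlations <x, psi(. - k)> for all shifts k, via FFT
        coeffs = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(psi))).real
        total += np.abs(coeffs).sum()
    return total

x = np.repeat([0.0, 1.0], 128)             # boxcar-like test signal, N = 256
print(haar_ti_analysis_norm(x, Jmax=3))    # Jmax = 0 reduces to scaled TV
```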
74–76. Conclusion

SURE risk estimation:
Computing risk estimates ⟺ sensitivity analysis.
Open problem: fast algorithms to optimize λ.

Analysis vs. synthesis regularization:
Analysis support is less stable.
Open problem: stability without support stability.