Robust Sparse Analysis Recovery

Gabriel Peyré

Joint work with:
Samuel Vaiter, Charles Dossal, Jalal Fadili

www.numerical-tours.com
Overview


• Synthesis vs. Analysis Regularization

• Risk Estimation

• Local Behavior of Sparse Regularization

• Robustness to Noise

• Numerical Illustrations
Inverse Problems

Recovering x_0 ∈ R^N from noisy observations
    y = Φ x_0 + w ∈ R^P

Examples: inpainting, super-resolution, compressed sensing.

[Figure: observations y and the image x_0 to recover.]

Regularized inversion:
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ J(x)
    (data fidelity + regularity)
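
A minimal Python sketch of this observation model, with an assumed toy setup (random Gaussian Φ and a piecewise-constant x_0; all names and sizes are illustrative, not those of the slides):

import numpy as np

# Minimal sketch (toy compressed-sensing setup, all names illustrative):
# generate observations y = Phi x0 + w for a piecewise-constant signal x0.
rng = np.random.default_rng(0)
N, P, sigma = 64, 16, 0.05
Phi = rng.standard_normal((P, N)) / np.sqrt(P)   # random measurement operator
x0 = np.repeat([0.0, 1.0, -0.5, 0.0], N // 4)    # piecewise-constant signal
w = sigma * rng.standard_normal(P)               # Gaussian noise
y = Phi @ x0 + w                                 # noisy observations
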
Sparse Regularizations

Synthesis regularization:
    min_{α ∈ R^Q} (1/2)||y − ΦDα||² + λ||α||_1
    (coefficients α → image x = Dα)

Analysis regularization:
    min_{x ∈ R^N} (1/2)||y − Φx||² + λ||D*x||_1
    (image x → correlations α = D*x)

[Figure: a sparse coefficient vector α synthesizing x = Dα, versus the sparse correlations D*x of an image x.]

Unless D is orthogonal, the two regularizations produce different results.
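
As a quick illustration, here is a minimal Python sketch that evaluates the two objectives on random data (Φ, D, y and lam below are illustrative placeholders, not the operators used later in the slides):

import numpy as np

# Minimal sketch (illustrative toy data): evaluate the synthesis and analysis
# objectives side by side.
rng = np.random.default_rng(0)
N, P, Q = 64, 32, 128
Phi = rng.standard_normal((P, N))     # measurement operator
D = rng.standard_normal((N, Q))       # dictionary (columns = atoms)
y = rng.standard_normal(P)            # observations
lam = 0.1

def synthesis_objective(alpha):
    # (1/2)||y - Phi D alpha||^2 + lam ||alpha||_1, with image x = D alpha
    return 0.5 * np.sum((y - Phi @ (D @ alpha)) ** 2) + lam * np.abs(alpha).sum()

def analysis_objective(x):
    # (1/2)||y - Phi x||^2 + lam ||D^* x||_1
    return 0.5 * np.sum((y - Phi @ x) ** 2) + lam * np.abs(D.T @ x).sum()
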
Variations and Stability

Observations: y = Φ x_0 + w

Recovery:
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||D*x||_1    (P_λ(y))
    x_{0+}(y) ∈ argmin_{Φx = y} ||D*x||_1    (no noise)    (P_0(y))

Questions:
  – Behavior of x_λ(y) with respect to y and λ.
        → Application: risk estimation (SURE, GCV, etc.)
  – Criteria to ensure ||x_λ(y) − x_0|| ≤ C||w|| (with a "reasonable" C).

Synthesis case (D = Id): works of Fuchs and Tropp.
Analysis case: [Nam et al. 2011] for w = 0.
Overview


• Inverse Problems

• Stein Unbiased Risk Estimators

• Theory: SURE for Sparse Regularization

• Practice: SURE for Iterative Algorithms
Risk Minimization

How to choose the value of the parameter λ?

Risk associated to λ: measure of the expected quality of x_λ(y) with respect to x_0,
    R(λ) = E_w(||x_λ(y) − x_0||²).

The optimal (theoretical) λ minimizes the risk:
    λ*(y) = argmin_λ R(λ),    plug-in estimator: x_{λ*(y)}(y).

The risk is unknown since it depends on x_0. But:
  – E_w is not accessible → use one observation;
  – x_0 is not accessible → needs risk estimators.
Can we estimate the risk solely from x_λ(y)?
Prediction Risk Estimation

Prediction: μ_λ(y) = Φ x_λ(y)

Sensitivity analysis: if μ_λ is weakly differentiable,
    μ_λ(y + δ) = μ_λ(y) + ∂μ_λ(y)·δ + O(||δ||²).

Stein Unbiased Risk Estimator:
    SURE_λ(y) = ||y − μ_λ(y)||² − σ²P + 2σ² df_λ(y)
    df_λ(y) = tr(∂μ_λ(y)) = div(μ_λ)(y)

Theorem [Stein, 1981]: E_w(SURE_λ(y)) = E_w(||Φx_0 − μ_λ(y)||²).

Other estimators: GCV, BIC, AIC, ...
SURE: requires σ (not always available); unbiased, with good practical performance.
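
In the simplest case (denoising, Φ = Id and D = Id), x_λ(y) is soft-thresholding and df_λ(y) is its number of non-zero entries, so SURE has a closed form. A minimal Python sketch with an assumed toy signal:

import numpy as np

# Minimal sketch (assumed denoising setup Phi = Id, D = Id): evaluate
# SURE_lambda(y) for soft-thresholding and pick lambda on a grid.
def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, lam, sigma):
    x = soft_threshold(y, lam)
    df = np.count_nonzero(x)                       # degrees of freedom
    return np.sum((y - x) ** 2) - sigma ** 2 * y.size + 2 * sigma ** 2 * df

rng = np.random.default_rng(0)
sigma = 0.5
x0 = np.concatenate([np.zeros(90), 3.0 * rng.standard_normal(10)])
y = x0 + sigma * rng.standard_normal(x0.size)
lambdas = np.linspace(0.01, 3.0, 50)
lam_sure = lambdas[np.argmin([sure(y, l, sigma) for l in lambdas])]
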
Generalized SURE

Problem: ||Φx_0 − Φx_λ(y)|| is a poor indicator of ||x_0 − x_λ(y)||.
Generalized SURE: take into account the risk on ker(Φ)^⊥.

    GSURE_λ(y) = ||x̂(y) − x_λ(y)||² − σ² tr((ΦΦ*)^+) + 2σ² gdf_λ(y)

Generalized df:  gdf_λ(y) = tr((ΦΦ*)^+ ∂μ_λ(y))
    Proj_{ker(Φ)^⊥} = Π = Φ*(ΦΦ*)^+ Φ
    ML estimator: x̂(y) = Φ*(ΦΦ*)^+ y.

Theorem [Eldar 09, Pesquet et al. 09, Vonesch et al. 08]:
    E_w(GSURE_λ(y)) = E_w(||Π(x_0 − x_λ(y))||²)

→ How to compute ∂μ_λ(y)?
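
The formula above translates directly into a small helper. A minimal Python sketch, assuming the solution x_λ(y), the noise level σ and an estimate of gdf_λ(y) are supplied (all names are illustrative):

import numpy as np

# Minimal sketch (names illustrative): evaluate GSURE_lambda(y) from the
# formula above, given x_lambda, sigma and an estimate of gdf.
def gsure(y, x_lam, Phi, sigma, gdf):
    pinv_PhiPhiT = np.linalg.pinv(Phi @ Phi.T)
    x_ml = Phi.T @ (pinv_PhiPhiT @ y)      # ML estimate Phi^* (Phi Phi^*)^+ y
    return (np.sum((x_ml - x_lam) ** 2)
            - sigma ** 2 * np.trace(pinv_PhiPhiT)
            + 2 * sigma ** 2 * gdf)
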
Overview



• Synthesis vs. Analysis Regularization

• Local Behavior of Sparse Regularization

• SURE Unbiased Risk Estimator

• Numerical Illustrations
Variations and Stability

Observations: y = Φ x_0 + w

Recovery:
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||D*x||_1    (P_λ(y))

Remark: x_λ(y) is not always unique, but μ_λ(y) = Φ x_λ(y) is always unique.

Questions:
  – When is y → μ_λ(y) differentiable?
  – Formula for ∂μ_λ(y).
TV-1D Polytope

TV-1D ball: B = {x : ||D*x||_1 ≤ 1}, with the cyclic finite-difference operator

            [ −1   1   0   0 ]
    D*  =   [  0  −1   1   0 ]
            [  0   0  −1   1 ]
            [  1   0   0  −1 ]

displayed in {x : ⟨x, 1⟩ = 0} ≅ R³.

    x_{0+}(y) ∈ argmin_{y = Φx} ||D*x||_1

[Figure: the polytope B and the affine space {x : y = Φx}. Copyright © Maël]
Union of Subspaces Model

    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||D*x||_1    (P_λ(y))

Support of the solution:
    I = {i : (D*x_λ(y))_i ≠ 0},    J = I^c

1-D total variation: D*x = (x_i − x_{i−1})_i
[Figure: a piecewise-constant signal x with the support I of its non-zero differences.]

Sub-space model: G_J = ker(D_J*) = Im(D_J)^⊥

Local well-posedness: ker(Φ) ∩ G_J = {0}    (H_J)

Lemma: There exists a solution x_λ such that (H_J) holds.
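
For the 1-D total-variation case, the support and the model subspace G_J are easy to compute explicitly. A minimal Python sketch (toy signal, names illustrative):

import numpy as np
from scipy.linalg import null_space

# Minimal sketch (1-D total variation, toy signal): compute the support I of
# the non-zero differences, its complement J, and an orthonormal basis of the
# model subspace G_J = ker(D_J^*).
N = 8
x = np.array([0., 0., 0., 2., 2., 2., -1., -1.])
Dt = np.diff(np.eye(N), axis=0)             # D^*: (N-1) x N finite differences
I = np.flatnonzero(np.abs(Dt @ x) > 1e-10)  # support of D^* x
J = np.setdiff1d(np.arange(N - 1), I)       # complement J = I^c
U = null_space(Dt[J, :])                    # basis of G_J = ker(D_J^*)
# dim(G_J) equals the number of constant pieces of x (here 3)
print(I, U.shape[1])
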
Local Sign Stability

    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||Φx − y||² + λ||D*x||_1    (P_λ(y))

Lemma: sign(D*x_λ(y)) is constant around (y, λ) ∉ H.
To be understood: there exists a solution with the same sign.

Linearized problem: with s_I = sign(D_I* x_λ(y)),
    x̂_λ̄(ȳ) = argmin_{x ∈ G_J} (1/2)||Φx − ȳ||² + λ̄ ⟨D_I* x, s_I⟩
            = A^[J] (Φ*ȳ − λ̄ D_I s_I)
where
    A^[J] z = argmin_{x ∈ G_J} (1/2)||Φx||² − ⟨x, z⟩.

Theorem: If (y, λ) ∉ H, then for (ȳ, λ̄) close to (y, λ), x̂_λ̄(ȳ) is a solution of P_λ̄(ȳ).

[Figure: the TV-1D polytope with its faces labeled by dim(G_J).]
Local Affine Maps

Local parameterization: x̂_λ̄(ȳ) = A^[J] Φ*ȳ − λ̄ A^[J] D_I s_I

Under a uniqueness assumption,
    y ↦ x_λ(y)  and  λ ↦ x_λ(y)
are piecewise affine functions.

[Figure: the path λ ↦ x_λ(y), from x_{0+} at λ = 0 down to x_{λ_k} = 0; breaking points correspond to a change of support of D*x_λ(y).]
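
Under (H_J), the operator A^[J] and the local affine map have a simple closed form once a basis U of G_J is available. A minimal Python sketch (Φ, U, D_I and s_I are assumed given; all names are illustrative):

import numpy as np

# Minimal sketch (names illustrative): form A^[J] from a basis U of
# G_J = ker(D_J^*), then the local affine map y -> xhat_lambda(y).
def A_J(U, Phi):
    # A^[J] z = argmin_{x in G_J} (1/2)||Phi x||^2 - <x, z>
    M = U.T @ Phi.T @ Phi @ U     # invertible under (H_J): ker(Phi) ∩ G_J = {0}
    return U @ np.linalg.solve(M, U.T)

def xhat(y, lam, Phi, U, D_I, s_I):
    # local affine parameterization of the solution of P_lambda(y)
    A = A_J(U, Phi)
    return A @ (Phi.T @ y - lam * D_I @ s_I)
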
Application to GSURE

For y ∉ H, one has locally:  μ_λ(y) = Φ A^[J] Φ* y + cst.

Corollary: Let I = supp(D*x_λ(y)) be such that (H_J) holds. Then
    df_λ(y) = tr(∂μ_λ(y)) = tr(Φ A^[J] Φ*) = dim(G_J)
    df_λ(y) = ||x_λ(y)||_0    for D = Id (synthesis)
    gdf_λ(y) = tr(Π A^[J])
are unbiased estimators of df and gdf.

Trick: tr(A) = E_z(⟨Az, z⟩), z ~ N(0, Id_P).

Proposition: gdf_λ(y) = E_z(⟨ν(z), Φ^+ z⟩), z ~ N(0, Id_P),
where ν(z) solves
    [ Φ*Φ   D_J ] [ ν(z) ]   [ Φ*z ]
    [ D_J*   0  ] [  ν̃   ] = [  0  ]

In practice: gdf_λ(y) ≈ (1/K) Σ_{k=1}^K ⟨ν(z_k), Φ^+ z_k⟩,  z_k ~ N(0, Id_P).
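
A minimal Python sketch of this Monte-Carlo estimator on a small dense problem (for large problems the linear system would be solved with a conjugate gradient solver, as done in the experiments below); Phi and D_J are assumed given and all names are illustrative:

import numpy as np

# Minimal sketch (small dense setup): estimate gdf_lambda(y) = E_z <nu(z), Phi^+ z>
# with nu(z) obtained from the linear system above.
def gdf_estimate(Phi, D_J, K=100, seed=0):
    rng = np.random.default_rng(seed)
    P, N = Phi.shape
    m = D_J.shape[1]
    KKT = np.block([[Phi.T @ Phi, D_J],
                    [D_J.T, np.zeros((m, m))]])
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(m)])
        sol = np.linalg.lstsq(KKT, rhs, rcond=None)[0]   # robust to a singular block
        acc += sol[:N] @ (Phi_pinv @ z)                  # <nu(z), Phi^+ z>
    return acc / K
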
CS with Wavelets

Compressed sensing using multi-scale wavelet thresholding:
    Φ ∈ R^{P×N}: realization of a random vector, P = N/4.
    D: translation-invariant wavelet redundant tight frame.

[Figure: the observations, the ML estimate x_ML = Φ^+ y and x_λ(y) at the optimal λ; quadratic loss as a function of the regularization parameter λ, comparing the projection risk, GSURE and the true risk.]
Anisotropic Total-Variation

Super-resolution using (anisotropic) total variation:
    Φ: vertical sub-sampling.
    Finite-differences gradient: D = [∂_1, ∂_2].

In practice, invoking the law of large numbers, the expectation defining gdf is replaced by an empirical mean over the z_k, and each ν(z_k) is computed by solving the linear system above with a conjugate gradient solver.

[Figure: the observations y and x_λ(y) at the optimal λ; quadratic loss as a function of the regularization parameter λ, comparing the projection risk, GSURE and the true risk.]
Overview


• Synthesis vs. Analysis Regularization

• Local Behavior of Sparse Regularization

• Robustness to Noise

• Numerical Illustrations
Identifiability Criterion

Identifiability criterion of a sign (we suppose (H_J) holds):
    IC(s) = min_{u ∈ Ker(D_J)} ||Ω s_I − u||_∞    (convex → computable)
where Ω = D_J^+ (Id − Φ*Φ A^[J]) D_I.

Discrete 1-D derivative, D*x = (x_i − x_{i−1})_i, with Φ = Id (denoising):
    IC(s) = ||Ω_J||_∞,   Ω_J = Ω s_I,   s_I = sign(D_I* x).

[Figure: a piecewise-constant signal x and the entries of Ω_J, which stay strictly between −1 and +1, so IC(sign(D*x)) < 1.]
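
A minimal Python sketch evaluating this criterion in the 1-D TV denoising case (Φ = Id); here ker(D_J) is trivial, so IC reduces to the sup-norm of Ω s_I. The signal and tolerances are illustrative, and the formula follows the reconstruction above:

import numpy as np

# Minimal sketch (1-D TV denoising, Phi = Id, toy signal): evaluate
# IC(sign(D^* x0)); with ker(D_J) = {0}, IC = ||Omega s_I||_inf.
N = 12
x0 = np.concatenate([np.zeros(4), 2.0 * np.ones(5), -np.ones(3)])
D = np.diff(np.eye(N), axis=0).T              # columns D_i, so D^T x = differences of x
s = np.sign(D.T @ x0)
I = np.flatnonzero(s)
J = np.setdiff1d(np.arange(D.shape[1]), I)
D_I, D_J = D[:, I], D[:, J]
# Phi = Id: A^[J] is the orthogonal projector onto G_J = ker(D_J^*)
A_J = np.eye(N) - D_J @ np.linalg.pinv(D_J)
Omega = np.linalg.pinv(D_J) @ (np.eye(N) - A_J) @ D_I
IC = np.abs(Omega @ s[I]).max()               # IC < 1 => small-noise robustness
print(IC)
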
Robustness to Small Noise

    IC(s) = min_{u ∈ Ker(D_J)} ||Ω s_I − u||_∞,    Ω = D_J^+ (Id − Φ*Φ A^[J]) D_I

Theorem: If IC(sign(D*x_0)) < 1, let T = min_{i ∈ I} |(D*x_0)_i|.
    If ||w||/T is small enough and λ ~ ||w||, then
        x_0 + A^[J] Φ*w − λ A^[J] D_I s_I
    is the unique solution of P_λ(y).

Theorem: If IC(sign(D*x_0)) < 1, then
    x_0 is the unique solution of P_0(Φ x_0).

When D = Id, these recover results of J.-J. Fuchs.
Robustness to Bounded Noise

Robustness criterion:
    RC(I) = max_{||p_I||_∞ ≤ 1}  min_{u ∈ ker(D_J)} ||Ω p_I − u||_∞  = max_{||p_I||_∞ ≤ 1} IC(p)

Theorem: If RC(I) < 1 for I = supp(D*x_0), then setting
    λ = θ ||w||_2 c_J / (1 − RC(I))    with θ > 1,
P_λ(y) has a unique solution x_λ ∈ G_J and
    ||x_0 − x_λ||_2 ≤ C_J ||w||_2.

Constants:
    c_J = ||D_J^+ (Φ*Φ A^[J] − Id)||_{2,∞},
    C_J = ||A^[J]||_{2,2} ||Φ||_{2,2} + (c_J / (1 − RC(I))) ||D_I||_{2,∞}.

When D = Id, this recovers results of Tropp (ERC).
Source Condition

Noiseless necessary and sufficient condition:
    x_0 ∈ argmin_{Φx = Φx_0} ||D*x||_1
    ⟺ ∃ α ∈ ∂||D*x_0||_1 such that Dα ∈ Im(Φ*)    (SC_α)

[Figure: the level set {x : ||D*x||_1 ≤ c} touching the affine space {x : Φx = y} at x_0, with the certificate Dα.]

Theorem [Grasmair, Inverse Problems, 2011]:
    If (H_J), (SC_α) and ||α_J||_∞ < 1, then ||D*(x_λ(y) − x_0)|| = O(||w||).

Proposition: Let s = sign(D*x_0) and define α by
    α_I = s_I,    α_J = Ω s_I − u,
where u attains IC(s) = min_{u ∈ Ker(D_J)} ||Ω s_I − u||_∞.
Then IC(s) < 1 ⟹ (SC_α) holds, with ||α_J||_∞ = IC(s).
Overview



• Synthesis vs. Analysis Regularization

• Local Behavior of Sparse Regularization

• Robustness to Noise

• Numerical Illustrations
Example: Total Variation in 1-D

Discrete 1-D derivative: D*x = (x_i − x_{i−1})_i.
Denoising: Φ = Id.

Model subspaces: {G_J : dim(G_J) = k} = signals with k − 1 steps.

    IC(s) = ||Ω_J||_∞,   Ω_J = Ω sign(D_I* x),   s_I = sign(D_I* x)

[Figure: two piecewise-constant signals x with the entries of Ω_J plotted against the ±1 bounds. Left: IC(sign(D*x)) < 1, giving small-noise robustness. Right: IC(sign(D*x)) = 1.]
Staircasing Example

Case IC = 1:    y = Φ x_0 + w

[Figure: two noise realizations y_i and the corresponding solutions x_λ(y). Top: the support is recovered. Bottom: the support is not recovered (staircasing).]
Example: Total Variation in 2-D

Directional derivatives:
    D_1* x = (x_{i,j} − x_{i−1,j})_{i,j} ∈ R^N
    D_2* x = (x_{i,j} − x_{i,j−1})_{i,j} ∈ R^N
Gradient:
    D* x = (D_1* x, D_2* x) ∈ R^{N×2}

Dual vector:
    α_I = sign(D_I* x),    α_J = Ω sign(D_I* x)

[Figure: two images x with their supports I; left IC(s) < 1, right IC(s) > 1.]
Example: Fused Lasso

Total variation and ℓ_1 hybrid:
    D*x = ((x_i − x_{i−1})_i , (ε x_i)_i)

Compressed sensing: Φ Gaussian ∈ R^{Q×N}.

Model subspaces: {G_J : dim(G_J) = k} = sums of k interval indicators.

[Figure: probability P(η, ε, Q/N) of the event IC < 1, as a function of ε and the measurement ratio Q/N, for a signal x_0 made of boxcars of width η.]
Example: Invariant Haar Analysis

Haar wavelets:
    ψ_i^(j) = 2^{−τ(j+1)} × ( +1 if 0 ≤ i < 2^j,  −1 if −2^j ≤ i < 0,  0 otherwise ).

Haar TI analysis:
    ||D*x||_1 = Σ_j ||x ⋆ ψ^(j)||_1,

i.e. the sum over scales of the TV semi-norms of filtered versions of the signal. As such, it can be understood as a sort of multiscale total variation regularization; apart from a multiplicative factor, one recovers total variation when J_max = 0.

De-blurring: Φx = φ ⋆ x, with φ(t) ∝ e^{−t²/(2σ²)}.
We consider a noiseless convolution setting (N = 256) where Φ is a circular convolution with a Gaussian kernel of standard deviation σ, and study the impact of σ on the identifiability criterion IC. The blurred signal x_η is a centered boxcar with a support of size 2ηN:
    x_η = 1_{⌊N/2 − ηN⌋, ..., ⌊N/2 + ηN⌋},    η ∈ (0, 1/2].

[Figure: IC(sign(D_H* x_0)) as a function of σ for three dictionaries (τ = 1, τ = 1/2 and total variation), with η = 0.2 fixed.]
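
A minimal Python sketch of this translation-invariant Haar analysis norm, computing Σ_j ||x ⋆ ψ^(j)||_1 by circular convolution (the filter normalization follows the definition above; everything else is illustrative):

import numpy as np

# Minimal sketch (assumes x.size >= 2**(Jmax+1)): evaluate the translation-
# invariant Haar analysis norm ||D^* x||_1 = sum_j ||x * psi^(j)||_1.
def haar_filter(j, tau=1.0):
    # psi^(j): -1 on [-2^j, 0), +1 on [0, 2^j), scaled by 2^{-tau (j+1)}
    return np.concatenate([-np.ones(2 ** j), np.ones(2 ** j)]) / 2 ** (tau * (j + 1))

def haar_ti_analysis_norm(x, Jmax, tau=1.0):
    total = 0.0
    for j in range(Jmax + 1):
        h = haar_filter(j, tau)
        # circular convolution of x with the Haar filter at scale j
        coeffs = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, x.size)))
        total += np.abs(coeffs).sum()
    return total
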
Conclusion

SURE risk estimation:
    computing risk estimates ⟺ sensitivity analysis.
Open problem: fast algorithms to optimize λ.

Analysis vs. synthesis regularization:
    the analysis support is less stable.
Open problem: stability without support stability.
Contenu connexe

Tendances

Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationrubyyc
 
Lesson 16: Derivatives of Logarithmic and Exponential Functions
Lesson 16: Derivatives of Logarithmic and Exponential FunctionsLesson 16: Derivatives of Logarithmic and Exponential Functions
Lesson 16: Derivatives of Logarithmic and Exponential FunctionsMatthew Leingang
 
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...Alex (Oleksiy) Varfolomiyev
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Leonid Zhukov
 
Mesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionMesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionGabriel Peyré
 
Introduction to Numerical Methods for Differential Equations
Introduction to Numerical Methods for Differential EquationsIntroduction to Numerical Methods for Differential Equations
Introduction to Numerical Methods for Differential Equationsmatthew_henderson
 
Derivative Free Optimization
Derivative Free OptimizationDerivative Free Optimization
Derivative Free OptimizationOlivier Teytaud
 
1 hofstad
1 hofstad1 hofstad
1 hofstadYandex
 
Application of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresApplication of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresAlexander Decker
 
11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scoresAlexander Decker
 
Numerical Methods - Oridnary Differential Equations - 1
Numerical Methods - Oridnary Differential Equations - 1Numerical Methods - Oridnary Differential Equations - 1
Numerical Methods - Oridnary Differential Equations - 1Dr. Nirav Vyas
 
Gaussseidelsor
GaussseidelsorGaussseidelsor
Gaussseidelsoruis
 
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...SSA KPI
 
2003 Ames.Models
2003 Ames.Models2003 Ames.Models
2003 Ames.Modelspinchung
 

Tendances (20)

Curve fitting
Curve fittingCurve fitting
Curve fitting
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorization
 
Sect2 4
Sect2 4Sect2 4
Sect2 4
 
Lesson 16: Derivatives of Logarithmic and Exponential Functions
Lesson 16: Derivatives of Logarithmic and Exponential FunctionsLesson 16: Derivatives of Logarithmic and Exponential Functions
Lesson 16: Derivatives of Logarithmic and Exponential Functions
 
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
 
Digfilt
DigfiltDigfilt
Digfilt
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
 
確率伝播
確率伝播確率伝播
確率伝播
 
Mesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionMesh Processing Course : Multiresolution
Mesh Processing Course : Multiresolution
 
Digital fiiter
Digital fiiterDigital fiiter
Digital fiiter
 
Introduction to Numerical Methods for Differential Equations
Introduction to Numerical Methods for Differential EquationsIntroduction to Numerical Methods for Differential Equations
Introduction to Numerical Methods for Differential Equations
 
Derivative Free Optimization
Derivative Free OptimizationDerivative Free Optimization
Derivative Free Optimization
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
Application of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresApplication of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scores
 
11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores
 
03 finding roots
03 finding roots03 finding roots
03 finding roots
 
Numerical Methods - Oridnary Differential Equations - 1
Numerical Methods - Oridnary Differential Equations - 1Numerical Methods - Oridnary Differential Equations - 1
Numerical Methods - Oridnary Differential Equations - 1
 
Gaussseidelsor
GaussseidelsorGaussseidelsor
Gaussseidelsor
 
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
 
2003 Ames.Models
2003 Ames.Models2003 Ames.Models
2003 Ames.Models
 

En vedette

D03403018026
D03403018026D03403018026
D03403018026theijes
 
Low Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse ProblemsLow Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse ProblemsGabriel Peyré
 
Mesh Processing Course : Active Contours
Mesh Processing Course : Active ContoursMesh Processing Course : Active Contours
Mesh Processing Course : Active ContoursGabriel Peyré
 
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...Gabriel Peyré
 
Signal Processing Course : Fourier
Signal Processing Course : FourierSignal Processing Course : Fourier
Signal Processing Course : FourierGabriel Peyré
 
Signal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsSignal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsGabriel Peyré
 
Signal Processing Course : Approximation
Signal Processing Course : ApproximationSignal Processing Course : Approximation
Signal Processing Course : ApproximationGabriel Peyré
 
Signal Processing Course : Inverse Problems Regularization
Signal Processing Course : Inverse Problems RegularizationSignal Processing Course : Inverse Problems Regularization
Signal Processing Course : Inverse Problems RegularizationGabriel Peyré
 

En vedette (8)

D03403018026
D03403018026D03403018026
D03403018026
 
Low Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse ProblemsLow Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse Problems
 
Mesh Processing Course : Active Contours
Mesh Processing Course : Active ContoursMesh Processing Course : Active Contours
Mesh Processing Course : Active Contours
 
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
 
Signal Processing Course : Fourier
Signal Processing Course : FourierSignal Processing Course : Fourier
Signal Processing Course : Fourier
 
Signal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsSignal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse Problems
 
Signal Processing Course : Approximation
Signal Processing Course : ApproximationSignal Processing Course : Approximation
Signal Processing Course : Approximation
 
Signal Processing Course : Inverse Problems Regularization
Signal Processing Course : Inverse Problems RegularizationSignal Processing Course : Inverse Problems Regularization
Signal Processing Course : Inverse Problems Regularization
 

Similaire à Robust Sparse Analysis Recovery

Normal density and discreminant analysis
Normal density and discreminant analysisNormal density and discreminant analysis
Normal density and discreminant analysisVARUN KUMAR
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualStéphane Canu
 
Lecture 2: linear SVM in the Dual
Lecture 2: linear SVM in the DualLecture 2: linear SVM in the Dual
Lecture 2: linear SVM in the DualStéphane Canu
 
Estimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and ClimateEstimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and Climatemodons
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slidesCVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slideszukun
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster AnalysisSSA KPI
 
Monte-Carlo method for Two-Stage SLP
Monte-Carlo method for Two-Stage SLPMonte-Carlo method for Two-Stage SLP
Monte-Carlo method for Two-Stage SLPSSA KPI
 
Algebric Functions.pdf
Algebric Functions.pdfAlgebric Functions.pdf
Algebric Functions.pdfMamadArip
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodTasuku Soma
 
Density theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsDensity theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsVjekoslavKovac1
 
Basic deep learning & Deep learning application to medicine
Basic deep learning & Deep learning application to medicineBasic deep learning & Deep learning application to medicine
Basic deep learning & Deep learning application to medicineHongyoon Choi
 
數學測試
數學測試數學測試
數學測試s9007912
 
Lecture 1: linear SVM in the primal
Lecture 1: linear SVM in the primalLecture 1: linear SVM in the primal
Lecture 1: linear SVM in the primalStéphane Canu
 
02 2d systems matrix
02 2d systems matrix02 2d systems matrix
02 2d systems matrixRumah Belajar
 

Similar to Robust Sparse Analysis Recovery (20)

Normal density and discreminant analysis
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the Dual
Estimation and Prediction of Complex Systems: Progress in Weather and Climate
Symmetrical2
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
MLHEP 2015: Introductory Lecture #1
Maximum likelihood estimation of regularisation parameters in inverse problem...
Cluster Analysis
5.n nmodels i
Monte-Carlo method for Two-Stage SLP
Slides univ-van-amsterdam
Algebric Functions.pdf
Nonconvex Compressed Sensing with the Sum-of-Squares Method
ch3.ppt
Density theorems for anisotropic point configurations
Basic deep learning & Deep learning application to medicine
數學測試
Lecture 1: linear SVM in the primal
02 2d systems matrix

More from Gabriel Peyré (17)

Proximal Splitting and Optimal Transport
Learning Sparse Representation
Adaptive Signal and Image Processing
Mesh Processing Course : Mesh Parameterization
Mesh Processing Course : Introduction
Mesh Processing Course : Geodesics
Mesh Processing Course : Geodesic Sampling
Mesh Processing Course : Differential Calculus
Signal Processing Course : Theory for Sparse Recovery
Signal Processing Course : Presentation of the Course
Signal Processing Course : Orthogonal Bases
Signal Processing Course : Denoising
Signal Processing Course : Compressed Sensing
Signal Processing Course : Wavelets
Sparsity and Compressed Sensing
Optimal Transport in Imaging Sciences
An Introduction to Optimal Transport

Robust Sparse Analysis Recovery

  • 1. Robust Sparse Analysis Recovery Joint work with: Gabriel Peyré Samuel Vaiter Charles Dossal Jalal Fadili www.numerical-tours.com
  • 2. Overview • Synthesis vs. Analysis Regularization • Risk Estimation • Local Behavior of Sparse Regularization • Robustness to Noise • Numerical Illustrations
  • 3. Inverse Problems Recovering x0 RN from noisy observations y = x0 + w RP
  • 4. Inverse Problems Recovering x0 RN from noisy observations y = x0 + w RP Examples: Inpainting, super-resolution, compressed-sensing x0
  • 5. Inverse Problems Recovering x0 RN from noisy observations y = x0 + w RP Examples: Inpainting, super-resolution, compressed-sensing x0 Regularized inversion: 1 x (y) 2 argmin ||y x||2 + J(x) x2RN 2 Data fidelity Regularity
  • 6. Sparse Regularizations Synthesis regularization Coe cients Image x = 1 min ||y ⇥ ||2 + ⇥|| ||1 RQ 2 2
  • 7. Sparse Regularizations Synthesis regularization Analysis regularization D Coe cients Image x = Image x Correlations 1 1 =D x min ||y ⇥ ||2 + ⇥|| ||1 min ||y x||2 + ||D x||1 RQ 2 2 x RN 2 2
  • 8. Sparse Regularizations Synthesis regularization Analysis regularization D Coe cients Image x = Image x Correlations 1 1 =D x min ||y ⇥ ||2 + ⇥|| ||1 min ||y x||2 + ||D x||1 RQ 2 2 x RN 2 2 =x D⇤ x = 6= 0
  • 9. Sparse Regularizations Synthesis regularization Analysis regularization D Coe cients Image x = Image x Correlations 1 1 =D x min ||y ⇥ ||2 + ⇥|| ||1 min ||y x||2 + ||D x||1 RQ 2 2 x RN 2 2 =x D⇤ x = 6= 0 Unless D = is orthogonal, produces di⇥erent results.
  • 10. Variations and Stability Observations: y = x0 + w Recovery: 1 x (y) 2 argmin || x y||2 + ||D⇤ x||1 (P (y)) 0+ x2RN 2 x0+ (y) 2 argmin ||D⇤ x||1 (no noise) (P0 (y)) x=y
  • 11. Variations and Stability Observations: y = x0 + w Recovery: 1 x (y) 2 argmin || x y||2 + ||D⇤ x||1 (P (y)) 0+ x2RN 2 x0+ (y) 2 argmin ||D⇤ x||1 (no noise) (P0 (y)) x=y Questions: – Behavior of x (y) with respect to y and . ! Application: risk estimation (SURE, GCV, etc.)
  • 12. Variations and Stability Observations: y = x0 + w Recovery: 1 x (y) 2 argmin || x y||2 + ||D⇤ x||1 (P (y)) 0+ x2RN 2 x0+ (y) 2 argmin ||D⇤ x||1 (no noise) (P0 (y)) x=y Questions: – Behavior of x (y) with respect to y and . ! Application: risk estimation (SURE, GCV, etc.) – Criteria to ensure ||x (y) x0 || 6 C||w|| (with “reasonable” C)
  • 13. Variations and Stability Observations: y = x0 + w Recovery: 1 x (y) 2 argmin || x y||2 + ||D⇤ x||1 (P (y)) 0+ x2RN 2 x0+ (y) 2 argmin ||D⇤ x||1 (no noise) (P0 (y)) x=y Questions: – Behavior of x (y) with respect to y and . ! Application: risk estimation (SURE, GCV, etc.) – Criteria to ensure ||x (y) x0 || 6 C||w|| (with “reasonable” C) Synthesis case (D = Id): works of Fuchs and Tropp. Analysis case: [Nam et al. 2011] for w = 0.
  • 14. Overview • Inverse Problems • Stein Unbiased Risk Estimators • Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  • 15. How to choose the value of the parameter λ? Risk minimization. Risk-based selection of λ: the average risk R(λ) = E_w(||x_λ(y) − x_0||²) measures the expected quality of x_λ(y) with respect to x_0. Plug-in estimator: λ*(y) = argmin_λ R(λ); the optimal (theoretical) λ minimizes the risk. The risk is unknown since it depends on x_0. Can we estimate it solely from x_λ(y)?
  • 16. (cont.) But: E_w is not accessible → use a single observation.
  • 17. (cont.) But: x_0 is not accessible → need risk estimators.
  • 18. Prediction Risk Estimation. Prediction: µ_λ(y) = Φ x_λ(y). Sensitivity analysis: if µ_λ is weakly differentiable, then µ_λ(y + δ) = µ_λ(y) + ∂µ_λ(y)·δ + O(||δ||²).
  • 19. (cont.) Stein Unbiased Risk Estimator: SURE_λ(y) = ||y − µ_λ(y)||² − σ²P + 2σ² df_λ(y), where df_λ(y) = tr(∂µ_λ(y)) = div(µ_λ)(y).
  • 20. (cont.) Theorem [Stein, 1981]: E_w(SURE_λ(y)) = E_w(||Φ x_0 − µ_λ(y)||²).
  • 21. (cont.) Other estimators: GCV, BIC, AIC, ... SURE requires σ (not always available), is unbiased, and has good practical performance.
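To make the SURE formula concrete, here is a minimal numerical sketch for the simplest case Φ = Id, D = Id (soft-thresholding denoising), where df_λ(y) is just the number of coefficients above the threshold. The function names, test signal and parameters are illustrative, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(y, lam):
    # mu_lambda(y) for denoising with Phi = Id, D = Id (synthesis = analysis here)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, lam, sigma):
    # SURE_lambda(y) = ||y - mu(y)||^2 - sigma^2 * P + 2 sigma^2 * df(y),
    # with df(y) = tr(d mu / d y) = #{i : |y_i| > lam} for soft-thresholding.
    mu = soft_threshold(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

# Sparse ground truth and noisy observations
P, sigma = 400, 0.5
x0 = np.zeros(P); x0[:20] = 5.0
lams = np.linspace(0.1, 2.0, 30)

# Average SURE and true risk over noise realizations
n_trials = 200
sure_avg = np.zeros_like(lams); risk_avg = np.zeros_like(lams)
for _ in range(n_trials):
    y = x0 + sigma * rng.standard_normal(P)
    for k, lam in enumerate(lams):
        mu = soft_threshold(y, lam)
        sure_avg[k] += sure(y, lam, sigma) / n_trials
        risk_avg[k] += np.sum((mu - x0) ** 2) / n_trials

print("lambda minimizing SURE :", lams[np.argmin(sure_avg)])
print("lambda minimizing risk :", lams[np.argmin(risk_avg)])
```

Averaged over noise realizations, the λ selected by SURE should track the λ minimizing the true (inaccessible) risk.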
  • 22. Generalized SURE. Problem: ||Φ x_0 − Φ x_λ(y)|| is a poor indicator of ||x_0 − x_λ(y)||. Generalized SURE: take into account the risk on ker(Φ)^⊥.
  • 23. (cont.) GSURE_λ(y) = ||x̂(y) − x_λ(y)||² − σ² tr((ΦΦ*)⁺) + 2σ² gdf_λ(y), with generalized df gdf_λ(y) = tr((ΦΦ*)⁺ ∂µ_λ(y)), projector Proj_{ker(Φ)^⊥} = Π = Φ*(ΦΦ*)⁺Φ, and ML estimator x̂(y) = Φ*(ΦΦ*)⁺ y.
  • 24. (cont.) Theorem [Eldar 09, Pesquet et al. 09, Vonesch et al. 08]: E_w(GSURE_λ(y)) = E_w(||Π(x_0 − x_λ(y))||²).
  • 25. (cont.) → How to compute ∂µ_λ(y)?
  • 26. Overview • Synthesis vs. Analysis Regularization • Local Behavior of Sparse Regularization • SURE Unbiased Risk Estimator • Numerical Illustrations
  • 27. Variations and Stability. Observations: y = Φ x_0 + w. Recovery: x_λ(y) ∈ argmin_x ½||Φx − y||² + λ||D*x||₁ (P_λ(y)).
  • 28. (cont.) Remark: x_λ(y) is not always unique, but µ_λ(y) = Φ x_λ(y) is always unique.
  • 29. (cont.) Questions: when is y ↦ µ_λ(y) differentiable? Formula for ∂µ_λ(y)?
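For readers who want to reproduce x_λ(y) numerically on small dense problems, a minimal ADMM sketch is given below. The splitting z = D*x, the function names and the parameters are illustrative choices, not the algorithm used by the authors.

```python
import numpy as np

def analysis_lasso_admm(Phi, D, y, lam, rho=1.0, n_iter=500):
    """Minimize 0.5*||Phi x - y||^2 + lam*||D^T x||_1 by ADMM (small dense case).

    Splitting: z = D^T x; illustrative solver only, not the one from the slides.
    """
    N = Phi.shape[1]
    x = np.zeros(N)
    z = np.zeros(D.shape[1])
    u = np.zeros(D.shape[1])
    # Pre-factor the x-update system (Phi^T Phi + rho D D^T) x = Phi^T y + rho D (z - u)
    A_inv = np.linalg.inv(Phi.T @ Phi + rho * D @ D.T)
    Pty = Phi.T @ y
    for _ in range(n_iter):
        x = A_inv @ (Pty + rho * D @ (z - u))
        Dx = D.T @ x
        z = np.sign(Dx + u) * np.maximum(np.abs(Dx + u) - lam / rho, 0.0)
        u = u + Dx - z
    return x

# Example: 1-D total variation denoising (Phi = Id, D^T = finite differences)
N = 100
Phi = np.eye(N)
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0     # (D^T x)_i = x_{i+1} - x_i
y = np.concatenate([np.zeros(50), np.ones(50)]) \
    + 0.1 * np.random.default_rng(1).standard_normal(N)
x_rec = analysis_lasso_admm(Phi, D, y, lam=0.5)
```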
  • 30. TV-1D Polytope. TV-1D ball: B = {x : ||D*x||₁ ≤ 1}, displayed in {x : ⟨x, 1⟩ = 0} ≃ R³ for x ∈ R⁴, where D* is the circular finite-difference matrix (rows (−1, 1, 0, 0), (0, −1, 1, 0), (0, 0, −1, 1), (1, 0, 0, −1)). x*(y) ∈ argmin_{Φx = y} ||D*x||₁. [Figure: the ball B drawn as a polytope, © Mael.]
  • 31.–33. (cont.) [Figures: the affine constraint set {x : Φx = y} intersecting the polytope B for several positions of y.]
  • 34. Union of Subspaces Model. x_λ(y) ∈ argmin_x ½||Φx − y||² + λ||D*x||₁ (P_λ(y)). Support of the solution: I = {i : (D*x_λ(y))_i ≠ 0}, J = I^c.
  • 35. (cont.) 1-D total variation: D*x = (x_i − x_{i−1})_i. [Figure: a piecewise-constant signal x with its support I marked at the jumps.]
  • 36. (cont.) Sub-space model: G_J = ker(D_J*) = Im(D_J)^⊥.
  • 37. (cont.) Local well-posedness: ker(Φ) ∩ G_J = {0} (H_J). Lemma: there exists a solution x* such that (H_J) holds.
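The objects I, J, G_J and the condition (H_J) can be computed directly for small dense problems; the sketch below (numpy/scipy, helper names illustrative) does exactly that.

```python
import numpy as np
from scipy.linalg import null_space

def support_and_cospace(D, x, tol=1e-8):
    """Support I of the analysis coefficients D^T x and a basis U of G_J = ker(D_J^T)."""
    corr = D.T @ x
    I = np.flatnonzero(np.abs(corr) > tol)        # support of D^T x
    J = np.setdiff1d(np.arange(D.shape[1]), I)    # co-support
    U = null_space(D[:, J].T) if J.size else np.eye(D.shape[0])
    return I, J, U

def condition_HJ(Phi, U, tol=1e-8):
    """(H_J): ker(Phi) ∩ G_J = {0}  ⇔  Phi injective on G_J  ⇔  rank(Phi U) = dim G_J."""
    return np.linalg.matrix_rank(Phi @ U, tol=tol) == U.shape[1]
```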
  • 38. Local Sign Stability. x_λ(y) ∈ argmin_x ½||Φx − y||² + λ||D*x||₁ (P_λ(y)). Lemma: sign(D*x_λ(y)) is constant around (y, λ) ∉ H (to be understood: there exists a solution with the same sign). [Figure: regions of the (y, λ) domain labeled by dim(G_J).]
  • 39. (cont.) Linearized problem: with s_I = sign(D_I* x_λ(y)), x̂_λ̄(ȳ) = argmin_{x ∈ G_J} ½||Φx − ȳ||² + λ̄ ⟨D_I*x, s_I⟩.
  • 40. (cont.) Explicitly, x̂_λ̄(ȳ) = A^[J](Φ*ȳ − λ̄ D_I s_I), where A^[J] z = argmin_{x ∈ G_J} ½||Φx||² − ⟨x, z⟩.
  • 41. (cont.) Theorem: if (y, λ) ∉ H, then for (ȳ, λ̄) close to (y, λ), x̂_λ̄(ȳ) is a solution of P_λ̄(ȳ).
  • 42. Local Affine Maps. Local parameterization: x̂_λ̄(ȳ) = A^[J] Φ* ȳ − λ̄ A^[J] D_I s_I. Under a uniqueness assumption, y ↦ x_λ(y) and λ ↦ x_λ(y) are piecewise affine functions, with breaking points at the changes of support of D*x_λ(y). [Figure: the piecewise-affine path λ ↦ x_λ(y), starting from x_{0+} as λ → 0⁺.]
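Under (H_J), the operator A^[J] defined on the previous slides has the closed form U(U*Φ*ΦU)^{-1}U* for any basis U of the cospace G_J, which gives a direct way to evaluate the local affine parameterization. The sketch below reuses the basis U from the cospace helper above; names are illustrative.

```python
import numpy as np

def A_J(Phi, U):
    # A^[J] z = argmin_{x in G_J} 0.5*||Phi x||^2 - <x, z>
    # Closed form under (H_J): A^[J] = U (U^T Phi^T Phi U)^{-1} U^T
    M = U.T @ Phi.T @ Phi @ U
    return U @ np.linalg.inv(M) @ U.T

def linearized_solution(Phi, D, U, y, lam, I, s_I):
    # x_hat(y, lam) = A^[J] (Phi^T y - lam * D_I s_I): affine in y and in lam
    A = A_J(Phi, U)
    return A @ (Phi.T @ y - lam * D[:, I] @ s_I)
```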
  • 43. Application to GSURE. For y ∉ H_λ, one has locally µ_λ(y) = Φ A^[J] Φ* y + const. Corollary: let I = supp(D*x_λ(y)) be such that (H_J) holds; then df_λ(y) = tr(∂µ_λ(y)) = tr(Φ A^[J] Φ*) = dim(G_J) (which reduces to ||x_λ(y)||₀ for D = Id, i.e. synthesis) and gdf_λ(y) = tr(Π A^[J]) are unbiased estimators of df and gdf.
  • 44. (cont.) Trick: tr(A) = E_z(⟨Az, z⟩) for z ~ N(0, Id_P).
  • 45. (cont.) Proposition: gdf_λ(y) = E_z(⟨ν(z), Φ⁺ z⟩), z ~ N(0, Id_P), where ν(z) solves the linear system [Φ*Φ, D_J; D_J*, 0] [ν; ν̃] = [Φ*z; 0]. In practice, gdf_λ(y) ≈ (1/K) Σ_{k=1..K} ⟨ν(z_k), Φ⁺ z_k⟩ with z_k ~ N(0, Id_P).
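The proposition above translates into a few lines of code: draw z ~ N(0, Id_P), solve the KKT system for ν(z), and average ⟨ν(z), Φ⁺z⟩. The sketch below uses a dense least-squares solve for clarity (the next slides mention a conjugate gradient solver instead); the names are illustrative.

```python
import numpy as np

def gdf_monte_carlo(Phi, D, J, n_samples=50, rng=None):
    """Monte-Carlo estimate of gdf = tr(Pi A^[J]) via the trace trick tr(A) = E<Az, z>.

    For each z ~ N(0, Id_P), nu(z) = A^[J] Phi^T z is obtained by solving the KKT system
        [Phi^T Phi   D_J] [nu      ]   [Phi^T z]
        [D_J^T       0  ] [nu_tilde] = [0      ]
    and gdf is estimated by the empirical mean of <nu(z), Phi^+ z>.
    """
    rng = rng or np.random.default_rng(0)
    P, N = Phi.shape
    DJ = D[:, J]
    m = DJ.shape[1]
    # KKT matrix of the equality-constrained quadratic program defining A^[J] Phi^T z
    K_mat = np.block([[Phi.T @ Phi, DJ],
                      [DJ.T, np.zeros((m, m))]])
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(n_samples):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(m)])
        nu = np.linalg.lstsq(K_mat, rhs, rcond=None)[0][:N]   # robust to a rank-deficient D_J
        acc += nu @ (Phi_pinv @ z)
    return acc / n_samples
```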
  • 46. CS with Wavelets. Compressed sensing using multi-scale wavelet thresholding: Φ ∈ R^{P×N} is the realization of a random vector, P = N/4; D is a translation-invariant redundant wavelet tight frame. [Figures: x_ML, x*(y, λ*) at the optimal λ*, and quadratic loss / projection risk / GSURE / true risk as functions of the regularization parameter λ.]
  • 47. Anisotropic Total-Variation. Super-resolution using (anisotropic) total variation: Φ is a vertical sub-sampling operator and D = [∂₁, ∂₂] is the finite-difference gradient. In practice the expectation is replaced by an empirical mean (law of large numbers), and ν(z) is computed by solving the linear system with a conjugate gradient solver. [Figures: observations y, x*(y, λ*) at the optimal λ*, and quadratic loss / projection risk / GSURE / true risk as functions of λ.]
  • 48. Overview • Synthesis vs. Analysis Regularization • Local Behavior of Sparse Regularization • Robustness to Noise • Numerical Illustrations
  • 49. Identifiability Criterion. Identifiability criterion of a sign (we suppose (H_J) holds): IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞ (convex → computable), where Ω = D_J⁺ (Id − Φ*Φ A^[J]) D_I.
  • 50. (cont.) Example: discrete 1-D derivative D*x = (x_i − x_{i−1})_i, Φ = Id (denoising).
  • 51. (cont.) Here Ker D_J = {0}, so IC(s) = ||ω_J||_∞ with ω_J = Ω s_I and s_I = sign(D_I*x). [Figure: a signal x whose dual vector ω_J stays strictly between −1 and +1, so IC(sign(D*x)) < 1.]
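IC(s) is a small convex problem: minimizing an ℓ∞ norm over the subspace Ker D_J is a linear program. The sketch below builds Ω as D_J⁺(Id − Φ*ΦA^[J])D_I, which is this note's reconstruction of the slide's (partly garbled) formula, so treat that expression, as well as the helper names, as assumptions.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

def identifiability_criterion(Phi, D, I, s_I, U):
    """IC(s) = min_{u in Ker(D_J)} ||Omega s_I - u||_inf, solved as a small LP.

    Omega is taken as D_J^+ (Id - Phi^T Phi A^[J]) D_I with A^[J] = U (U^T Phi^T Phi U)^{-1} U^T;
    this reconstruction of the slide's formula is an assumption of the sketch.
    """
    N, Q = D.shape
    J = np.setdiff1d(np.arange(Q), I)
    DI, DJ = D[:, I], D[:, J]
    A = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T            # A^[J]
    Omega = np.linalg.pinv(DJ) @ (np.eye(N) - Phi.T @ Phi @ A) @ DI
    w = Omega @ s_I
    V = null_space(DJ)                                            # basis of Ker(D_J)
    if V.size == 0:                                               # Ker(D_J) = {0}: IC = ||Omega s_I||_inf
        return np.max(np.abs(w))
    m = V.shape[1]
    # LP: minimize t subject to -t <= (w - V c)_i <= t for all i
    c_obj = np.concatenate([np.zeros(m), [1.0]])
    A_ub = np.block([[-V, -np.ones((len(w), 1))],
                     [ V, -np.ones((len(w), 1))]])
    b_ub = np.concatenate([-w, w])
    res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * m + [(0, None)])
    return res.fun
```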
  • 52. Robustness to Small Noise. Recall IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞, with Ω = D_J⁺ (Id − Φ*Φ A^[J]) D_I.
  • 53. (cont.) Theorem: if IC(sign(D*x_0)) < 1 and T = min_{i ∈ I} |(D*x_0)_i|, then for ||w||/T small enough and λ ~ ||w||, x_0 + A^[J]Φ*w − λ A^[J] D_I s_I is the unique solution of P_λ(y).
  • 54. (cont.) Theorem: if IC(sign(D*x_0)) < 1, then x_0 is the unique solution of P_0(Φ x_0).
  • 55. (cont.) When D = Id, these recover the results of J.J. Fuchs.
  • 56. Robustness to Bounded Noise. Robustness criterion: RC(I) = max_{||p_I||_∞ ≤ 1} min_{u ∈ Ker D_J} ||Ω p_I − u||_∞ = max_{||p_I||_∞ ≤ 1} IC(p).
  • 57. (cont.) Theorem: if RC(I) < 1 for I = supp(D*x_0), then choosing λ = c_J ρ ||w||₂ with ρ > 1/(1 − RC(I)), P_λ(y) has a unique solution x* ∈ G_J and ||x_0 − x*||₂ ≤ C_J ||w||₂, where the constants c_J and C_J are explicit in terms of D_J⁺, A^[J], Φ, D_I and 1/(1 − RC(I)). When D = Id, this recovers Tropp's result (ERC).
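Since p ↦ IC(p) is convex, its maximum over the ℓ∞ ball defining RC(I) is attained at a sign vertex, so for small supports RC(I) can be computed exactly by enumerating sign patterns. The sketch below reuses the identifiability_criterion helper above and is exponential in |I|, hence only illustrative.

```python
import numpy as np
from itertools import product

def robustness_criterion(Phi, D, I, U):
    """RC(I) = max over ||p_I||_inf <= 1 of IC(p).

    The objective is convex in p_I, so the maximum over the unit infinity-ball
    is attained at a sign vertex; brute-force enumeration is exact for small |I|.
    """
    best = 0.0
    for signs in product([-1.0, 1.0], repeat=len(I)):
        best = max(best, identifiability_criterion(Phi, D, I, np.array(signs), U))
    return best
```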
  • 58. Source Condition. Noiseless necessary and sufficient condition: x_0 ∈ argmin_{Φx = Φx_0} ||D*x||₁ ⟺ ∃α ∈ ∂||D*x_0||₁ with Dα ∈ Im(Φ*) (SC_α). [Figure: the level set {x : ||D*x||₁ ≤ c} tangent at x_0 to the affine set {x : Φx = y}, with normal direction Dα.]
  • 59. (cont.) Theorem [Grasmair, Inverse Problems, 2011]: if (H_J), (SC_α) and ||α_J||_∞ < 1 hold, then ||D*(x* − x_0)|| = O(||w||).
  • 60. (cont.) Proposition: let s = sign(D*x_0) and define α by α_J = Ω s_I − u* and α_I = s_I, where u* attains IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞. Then IC(s) < 1 implies (SC_α), with ||α_J||_∞ = IC(s).
  • 61. Overview • Synthesis vs. Analysis Regularization • Local Behavior of Sparse Regularization • Robustness to Noise • Numerical Illustrations
  • 62. Example: Total Variation in 1-D. Discrete 1-D derivative: D*x = (x_i − x_{i−1})_i; denoising, Φ = Id. Piecewise-constant signals with k − 1 steps correspond to the model {G_J : dim G_J = k}.
  • 63. (cont.) With s_I = sign(D_I*x), IC(s) = ||ω_J||_∞ where ω_J = Ω s_I. [Figure: a signal x whose dual vector stays strictly between −1 and +1: IC(sign(D*x)) < 1, hence small-noise robustness.]
  • 64. (cont.) [Figure: a signal whose dual vector saturates at ±1: IC(sign(D*x)) = 1.]
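As a usage example, the generic sketches above can be applied to the slide's 1-D TV denoising setting; the helpers support_and_cospace and identifiability_criterion are assumed to be defined as earlier, and for a single interior jump the printed value should fall below 1, i.e. the robust regime of the slide.

```python
import numpy as np

# 1-D TV: columns of D are the discrete derivative atoms, (D^T x)_i = x_{i+1} - x_i
N = 30
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0
Phi = np.eye(N)                                               # denoising

x_step = np.concatenate([np.zeros(N // 2), np.ones(N - N // 2)])   # one jump
I, J, U = support_and_cospace(D, x_step)
s_I = np.sign(D[:, I].T @ x_step)
print("IC(step) =", identifiability_criterion(Phi, D, I, s_I, U))
```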
  • 65. Staircasing Example. Case IC = 1: y = Φ x_0 + w. [Figure: noisy samples y_i and the solution x_λ(y); the support is recovered.]
  • 66. (cont.) [Figure: another noise realization for which the support is not recovered.]
  • 67. Example: Total Variation in 2-D. Directional derivatives: D₁*x = (x_{i,j} − x_{i−1,j})_{i,j} ∈ R^N and D₂*x = (x_{i,j} − x_{i,j−1})_{i,j} ∈ R^N; gradient D*x = (D₁*x, D₂*x) ∈ R^{N×2}.
  • 68. (cont.) [Figures: two images x with their supports I and dual vectors ω_J = Ω sign(D_I*x), one with IC(s) < 1 and one with IC(s) > 1.]
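For completeness, the 2-D anisotropic TV analysis dictionary of the slide can be assembled explicitly for small images; the dense construction below (illustrative only) stacks vertical and horizontal difference atoms so that D^T x contains both directional derivatives.

```python
import numpy as np

def tv2d_dictionary(n):
    """Analysis dictionary for 2-D anisotropic TV on an n x n image (x flattened row-wise).

    Columns are vertical and horizontal finite-difference atoms, so that
    D^T x stacks (x_{i,j} - x_{i-1,j}) and (x_{i,j} - x_{i,j-1}).
    """
    N = n * n
    idx = lambda i, j: i * n + j
    cols = []
    for i in range(1, n):
        for j in range(n):
            c = np.zeros(N); c[idx(i, j)], c[idx(i - 1, j)] = 1.0, -1.0
            cols.append(c)          # vertical differences D_1
    for i in range(n):
        for j in range(1, n):
            c = np.zeros(N); c[idx(i, j)], c[idx(i, j - 1)] = 1.0, -1.0
            cols.append(c)          # horizontal differences D_2
    return np.stack(cols, axis=1)   # shape (N, 2*n*(n-1))
```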
  • 69. Example: Fused Lasso. Total variation and ℓ₁ hybrid: D*x = (x_i − x_{i−1})_i ∪ (ε x_i)_i; x_0 is a sum of k interval indicators, corresponding to the model {G_J : dim G_J = k}. Compressed sensing: Gaussian Φ ∈ R^{Q×N}.
  • 70. (cont.) [Figure: a signal x_0 made of boxcars of half-width ηN, and the probability P(η, ε, N) of the event IC < 1 as a function of Q, ε and N.]
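The fused-lasso analysis operator is just the 1-D difference atoms stacked with ε-scaled identity atoms; a small construction follows (the value of ε and the problem sizes are illustrative, not taken from the slides).

```python
import numpy as np

def fused_lasso_dictionary(N, eps):
    """Analysis dictionary whose correlations D^T x stack (x_{i+1} - x_i)_i and (eps * x_i)_i."""
    D_tv = np.zeros((N, N - 1))
    for i in range(N - 1):
        D_tv[i, i], D_tv[i + 1, i] = -1.0, 1.0
    return np.hstack([D_tv, eps * np.eye(N)])    # shape (N, 2N - 1)

# Compressed-sensing setup from the slide: Gaussian Phi in R^{Q x N}
rng = np.random.default_rng(0)
N, Q = 128, 48
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)
D = fused_lasso_dictionary(N, eps=0.5)
```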
  • 71. Example: Invariant Haar Analysis. Haar wavelets: ψ^(j)_i = 2^{−τ(j+1)} · (+1 if 0 ≤ i < 2^j, −1 if −2^j ≤ i < 0, 0 otherwise). Translation-invariant Haar analysis: ||D*x||₁ = Σ_j ||x ⋆ ψ^(j)||₁, i.e. a sum over scales of TV semi-norms of filtered versions x^(j) of the signal.
  • 72. (cont.) The TI Haar analysis norm is thus a multiscale total-variation regularization; apart from a multiplicative factor, one recovers the total variation when J_max = 0. Noiseless convolution setting (N = 256): Φ is a circular convolution with a Gaussian kernel, Φx = φ ⋆ x with φ(t) ∝ e^{−t²/(2σ²)}. The blurred signal x_η is a centered boxcar of support size 2ηN, x_η = 1_{{⌊N/2 − ηN⌋, ..., ⌊N/2 + ηN⌋}}, η ∈ (0, 1/2]. [Figure: evolution of IC(sign(D_H* x_0)) as a function of σ for three dictionaries (τ = 1, τ = 1/2, and total variation), with η = 0.2 fixed.]
  • 73. Conclusion. SURE risk estimation: computing risk estimates ⟺ sensitivity analysis.
  • 74. (cont.) Open problem: fast algorithms to optimize λ from SURE.
  • 75. (cont.) Analysis vs. synthesis regularization: the analysis support is less stable.
  • 76. (cont.) Open problem: stability without support stability.