Proximal Splitting
and Optimal Transport
       Gabriel Peyré
    www.numerical-tours.com
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Measure Preserving Maps

Distributions $\mu_0, \mu_1$ on $\mathbb{R}^k$.

Mass preserving map $T : \mathbb{R}^k \to \mathbb{R}^k$:
    $\mu_1 = T_\sharp \mu_0$ where $(T_\sharp \mu_0)(A) = \mu_0(T^{-1}(A))$

[Figure: a point $x$ distributed according to $\mu_0$ is mapped to $T(x)$ distributed according to $\mu_1$.]

Distributions with densities $\mu_i = \rho_i(x)\,dx$:
    $T_\sharp \mu_0 = \mu_1 \iff \rho_1(T(x)) \, |\det \nabla T(x)| = \rho_0(x)$
Optimal Transport

$L^p$ optimal transport:
    $W_p(\mu_0, \mu_1)^p = \min_{T_\sharp \mu_0 = \mu_1} \int \|T(x) - x\|^p \, \mu_0(dx)$

Regularity condition: $\mu_0$ or $\mu_1$ does not give mass to "small sets".

Theorem ($p > 1$): there exists a unique optimal $T$.

Theorem ($p = 2$): $T$ is defined as $T = \nabla \varphi$ with $\varphi$ convex.
    $T$ is monotone: $\langle T(x) - T(x'), \, x - x' \rangle \geq 0$
Wasserstein Distance

Couplings: $\pi \in \Pi(\mu, \nu)$:
    $\forall A \subset \mathbb{R}^d, \ \pi(A \times \mathbb{R}^d) = \mu(A)$
    $\forall B \subset \mathbb{R}^d, \ \pi(\mathbb{R}^d \times B) = \nu(B)$

[Figure: a coupling $\pi$ over $(x, y)$ with marginals $\mu$ and $\nu$.]

Transportation cost:
    $W_p(\mu, \nu)^p = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y) \, d\pi(x, y)$
Optimal Transport

Let $p > 1$ and $\mu$ not giving mass to small sets.

Unique $\pi \in \Pi(\mu, \nu)$ s.t. $W_p(\mu, \nu)^p = \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y) \, d\pi(x, y)$

Optimal transport map $T : \mathbb{R}^d \to \mathbb{R}^d$:
    $\pi$ is supported on the graph $\{(x, T(x))\}$ of $T$.

$p = 2$: $T = \nabla \varphi$ unique solution of
    $\varphi$ convex l.s.c. and $(\nabla \varphi)_\sharp \mu = \nu$
1-D Continuous Wasserstein

Distributions $\mu, \nu$ on $\mathbb{R}$.

Cumulative functions: $C_\mu(t) = \int_{-\infty}^t d\mu(x)$

For all $p > 1$: $T = C_\nu^{-1} \circ C_\mu$
    $T$ is non-decreasing ("change of contrast")

Explicit formulas:
    $W_p(\mu, \nu)^p = \int_0^1 |C_\mu^{-1} - C_\nu^{-1}|^p$
    $W_1(\mu, \nu) = \int_{\mathbb{R}} |C_\mu - C_\nu| = \|(\mu - \nu) \star H\|_1$   ($H$ the Heaviside step)
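This quantile formula translates directly into code. A minimal NumPy sketch, assuming equally weighted empirical samples of $\mu$ and $\nu$ (the function name is illustrative):

import numpy as np

def transport_1d(mu_samples, nu_samples):
    """Monotone transport map T = C_nu^{-1} o C_mu, evaluated on samples of mu.

    The quantile functions are approximated by empirical order statistics:
    sorting both sample sets and matching ranks realizes C_nu^{-1}(C_mu(x)).
    """
    order = np.argsort(mu_samples)
    T = np.empty_like(mu_samples)
    # the i-th smallest sample of mu is sent to the i-th smallest sample of nu
    T[order] = np.sort(nu_samples)
    return T

# usage: W_p^p between two empirical measures with equal weights
x = np.random.randn(1000)          # samples of mu
y = 2 * np.random.rand(1000) + 3   # samples of nu
p = 2
Wp_p = np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)  # int_0^1 |C_mu^{-1} - C_nu^{-1}|^p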
Grayscale Histogram Transfer

Input images: $f_i : [0,1]^2 \to [0,1]$, $i = 0, 1$.

Gray-value distributions: $\mu_i$ defined on $[0,1]$,
    $\mu_i([a, b]) = \int_{[0,1]^2} 1_{\{a \leq f_i \leq b\}}(x) \, dx$

Optimal transport: $T = C_{\mu_1}^{-1} \circ C_{\mu_0}$.

[Figure: $f_0 \mapsto C_{\mu_0}(f_0) \mapsto T(f_0) = C_{\mu_1}^{-1}(C_{\mu_0}(f_0))$, equalizing the histogram $\mu_0$ of $f_0$ onto the histogram $\mu_1$ of $f_1$.]
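A sketch of this transfer on discrete images, assuming both images have the same number of pixels (NumPy, rank matching as above):

import numpy as np

def histogram_transfer(f0, f1):
    """Applies T = C_{mu1}^{-1} o C_{mu0} to the gray values of f0.

    Implemented by rank matching: the k-th darkest pixel of f0 receives
    the k-th darkest gray value of f1.
    """
    shape = f0.shape
    v0, v1 = f0.ravel(), f1.ravel()
    order = np.argsort(v0)
    out = np.empty_like(v0)
    out[order] = np.sort(v1)
    return out.reshape(shape)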
Application to Color Transfer

Input color images: $f_i \in \mathbb{R}^{N \times 3}$.

Color distributions: $\mu_i = \frac{1}{N} \sum_x \delta_{f_i(x)}$

Optimal assignment: $\min_{\sigma \in \Sigma_N} \|f_0 - f_1 \circ \sigma\|$

Transport: $T : f_0(x) \in \mathbb{R}^3 \mapsto f_1(\sigma(x)) \in \mathbb{R}^3$

Equalization: $\tilde{f}_0 = T(f_0)$, so that $\mu_{\tilde{f}_0} = \mu_1$

[Figure: source image $f_0$ (distribution $\mu_0$), style image $f_1$ (distribution $\mu_1$), and the source image after color transfer $T(f_0)$, obtained by a sliced Wasserstein projection of the source color statistics onto the style color statistics; from J. Rabin, Wasserstein Regularization.]
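A sketch of the exact assignment step using SciPy's Hungarian solver (the function name is illustrative; this is only tractable for small N, which is why the figure above relies on a sliced Wasserstein approximation):

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def color_transfer(f0, f1):
    """Optimal assignment between the color clouds of f0 and f1.

    f0, f1: (N, 3) arrays of pixel colors (same N). Solves
    min_sigma sum_x ||f0[x] - f1[sigma(x)]||^2, then replaces each
    color f0[x] by its assigned color f1[sigma(x)].
    """
    cost = cdist(f0, f1, metric="sqeuclidean")  # (N, N) pairwise costs
    rows, cols = linear_sum_assignment(cost)
    out = np.empty_like(f0)
    out[rows] = f1[cols]
    return out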
Image Registration

Mass preservation constraint:
    $c(dv) = \mu_0(v + dv) \, \det(\nabla(v + dv)) - \mu_1 = 0$

A correction $dv$ is obtained by solving
    $dv \approx c_v^\top (c_v c_v^\top)^{-1} c(v)$   (Nocedal and Wright, 1999)

The system $c_v c_v^\top$ can be thought of as an elliptic system of equations. It is solved using preconditioned conjugate gradient with an incomplete Cholesky preconditioner, or with a parallelizable four-color Gauss-Seidel multigrid relaxation scheme on the GPU.

[ur Rehman et al., 2009]
[Fig. 6: OMT results viewed on an axial slice. The top row shows corresponding slices from pre-op (left) and post-op (right) MRI data. The deformation is clearly visible in the anterior part of the brain.]
Convex Formulation (Benamou-Brenier)

Find $\rho : \mathbb{R}^d \times [0,1] \to \mathbb{R}_+$ and $m : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$ solving:

    $W(\mu_0, \mu_1)^2 = \min_{x = (m, \rho)} J(x) + \iota_C(x)$

$J(x) = \int_{s \in \mathbb{R}^d} \int_{t=0}^1 j(x(s, t)) \, dt \, ds$

$j(\tilde{m}, \tilde{\rho}) = \begin{cases} \frac{\|\tilde{m}\|^2}{\tilde{\rho}} & \text{if } \tilde{\rho} > 0, \\ 0 & \text{if } \tilde{\rho} = 0 \text{ and } \tilde{m} = 0, \\ +\infty & \text{otherwise,} \end{cases}$   for $(\tilde{m}, \tilde{\rho}) \in \mathbb{R}^d \times \mathbb{R}$.

$C = \{x = (m, \rho) \ : \ \mathrm{div}(x) = 0, \ B(\rho) = (\rho_0, \rho_1)\}$
    $B(\rho) = (\rho(0, \cdot), \rho(1, \cdot))$
Numerical Examples

[Figure 7: synthetic 2D examples on a Euclidean domain; the interpolation from $\rho_0$ to $\rho_1$ displayed along $t$.]
Discrete Formulation

Centered grid formulation ($d = 1$):
    $\min_{x \in \mathbb{R}^{G_c \times 2}} J(x) + \iota_C(x)$
    $J(x) = \sum_{i \in G_c} j(x_i)$

Staggered grid formulation:
    $\min_{x \in \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2}} J(I(x)) + \iota_C(x)$

Interpolation operator:
    $I = (I^1, I^2) : \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2} \to \mathbb{R}^{G_c \times 2}$
    $2 \, I^1(m)_{i,j} = m_{i + \frac{1}{2}, j} + m_{i - \frac{1}{2}, j}$

$\to$ Projection on $\mathrm{div}(x) = 0$ using FFTs.

[Figures: centered grid $G_c$ and staggered grids $G_{st}^1, G_{st}^2$ in the $(t, s)$ plane.]
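A one-dimensional sketch of this midpoint interpolation (NumPy; staggered values live on half-integer grid points):

import numpy as np

def interp_staggered_to_centered(m_st):
    """Averages staggered samples m_{i-1/2}, m_{i+1/2} onto centered points:
    2 I(m)_i = m_{i+1/2} + m_{i-1/2}, so a staggered array of length n+1
    yields a centered array of length n."""
    return 0.5 * (m_st[:-1] + m_st[1:])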
SOCP Formulation

$\min_{x \in \mathbb{R}^{G_c \times d}} J(x) + \iota_C(x)$, $\quad J(x) = \sum_{i \in G_c} j(x_i)$

$\iff \min_{x \in \mathbb{R}^{G_c \times d}, \, r \in \mathbb{R}^{G_c}} \sum_i r_i$   s.t. $\forall i \in G_c, \ (m_i, \rho_i, r_i) \in K$

(Rotated) Lorentz cone: $K = \{(\tilde{m}, \tilde{\rho}, \tilde{r}) \in \mathbb{R}^{d+2} \ : \ \|\tilde{m}\|^2 \leq \tilde{\rho} \tilde{r}\}$

Second order cone program:
$\to$ Use interior point methods (e.g. MOSEK software).
    Linear convergence with iteration #.
    Poor scaling with dimension $|G_c|$.
Efficient for medium scale problems ($N \sim 10^4$).
Example: Regularization

Inverse problem: measurements $y = \Phi x_0 + w$

[Figure: original $x_0$, measurements $y$, regularized recovery $x^\star$.]

Regularized inversion:
    $x^\star \in \mathrm{argmin}_{x \in \mathbb{R}^N} \frac{1}{2} \|y - \Phi x\|^2 + \lambda R(x)$
    (data fidelity + regularity)

Total Variation: $R(x) = \sum_i \|(\nabla x)_i\|$

$\ell^1$ sparsity: $R(x) = \sum_i |x_i|$
    Images are sparse in wavelet bases.
    [Figure: image $f = \Psi x$ and its wavelet coefficients $x = \Psi^* f$.]
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Convex Optimization

Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$
    $\mathcal{H}$: Hilbert space. Here: $\mathcal{H} = \mathbb{R}^N$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Class of functions:
    Convex: $G(tx + (1-t)y) \leq t G(x) + (1-t) G(y)$, $\forall t \in [0, 1]$
    Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$
    Proper: $\{x \in \mathcal{H} \ : \ G(x) \neq +\infty\} \neq \emptyset$

Indicator ($C$ closed and convex):
    $\iota_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise.} \end{cases}$
Sub-differential

Sub-differential:
    $\partial G(x) = \{u \in \mathcal{H} \ : \ \forall z, \ G(z) \geq G(x) + \langle u, z - x \rangle\}$
    Example: $G(x) = |x|$, $\partial G(0) = [-1, 1]$.

Smooth functions: if $F$ is $C^1$, $\partial F(x) = \{\nabla F(x)\}$

First-order conditions:
    $x^\star \in \mathrm{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$

Monotone operator: $U(x) = \partial G(x)$:
    $\forall (u, v) \in U(x) \times U(y), \quad \langle y - x, \, v - u \rangle \geq 0$
Prox and Subdifferential

$\mathrm{Prox}_{\gamma G}(x) = \mathrm{argmin}_z \frac{1}{2} \|x - z\|^2 + \gamma G(z)$

Resolvent of $\partial G$:
    $z = \mathrm{Prox}_{\gamma G}(x) \iff 0 \in z - x + \gamma \partial G(z)$
    $\iff x \in (\mathrm{Id} + \gamma \partial G)(z) \iff z = (\mathrm{Id} + \gamma \partial G)^{-1}(x)$

Inverse of a set-valued mapping:
    $x \in U(y) \iff y \in U^{-1}(x)$
    $\mathrm{Prox}_{\gamma G} = (\mathrm{Id} + \gamma \partial G)^{-1}$ is a single-valued mapping.

Fixed point: $x^\star \in \mathrm{argmin}_x G(x)$
    $\iff 0 \in \gamma \partial G(x^\star) \iff x^\star \in (\mathrm{Id} + \gamma \partial G)(x^\star)$
    $\iff x^\star = (\mathrm{Id} + \gamma \partial G)^{-1}(x^\star) = \mathrm{Prox}_{\gamma G}(x^\star)$
Proximal Calculus

Separability: $G(x) = G_1(x_1) + \ldots + G_n(x_n)$
    $\mathrm{Prox}_{\gamma G}(x) = (\mathrm{Prox}_{\gamma G_1}(x_1), \ldots, \mathrm{Prox}_{\gamma G_n}(x_n))$

Quadratic functionals: $G(x) = \frac{1}{2} \|\Phi x - y\|^2$
    $\mathrm{Prox}_{\gamma G}(x) = (\mathrm{Id} + \gamma \Phi^* \Phi)^{-1}(x + \gamma \Phi^* y)$

Composition by tight frame ($A A^* = \mathrm{Id}$):
    $\mathrm{Prox}_{\gamma G \circ A} = A^* \circ \mathrm{Prox}_{\gamma G} \circ A + \mathrm{Id} - A^* A$

Indicators: $G(x) = \iota_C(x)$
    $\mathrm{Prox}_{\gamma G}(x) = \mathrm{Proj}_C(x) = \mathrm{argmin}_{z \in C} \|x - z\|$
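A sketch of the quadratic-prox formula above (NumPy; a dense solve for illustration, whereas in practice one would reuse a factorization or an FFT diagonalization):

import numpy as np

def prox_quadratic(x, Phi, y, gamma):
    """Prox of G(x) = 0.5 * ||Phi x - y||^2:
    solves (Id + gamma Phi^T Phi) z = x + gamma Phi^T y."""
    n = Phi.shape[1]
    A = np.eye(n) + gamma * Phi.T @ Phi
    return np.linalg.solve(A, x + gamma * Phi.T @ y)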
Prox of Sparse Regularizers

$\mathrm{Prox}_{\gamma G}(x) = \mathrm{argmin}_z \frac{1}{2} \|x - z\|^2 + \gamma G(z)$

$G(x) = \|x\|_1 = \sum_i |x_i|$:
    $\mathrm{Prox}_{\gamma G}(x)_i = \max\left(0, 1 - \frac{\gamma}{|x_i|}\right) x_i$   (soft thresholding)

$G(x) = \|x\|_0 = |\{i \ : \ x_i \neq 0\}|$:
    $\mathrm{Prox}_{\gamma G}(x)_i = \begin{cases} x_i & \text{if } |x_i| \geq \sqrt{2\gamma}, \\ 0 & \text{otherwise} \end{cases}$   (hard thresholding)

$G(x) = \sum_i \log(1 + |x_i|^2)$:
    computed as a 3rd order polynomial root.

[Figure: graphs of $G$ for $\log(1 + x^2)$, $|x|$ and $\|x\|_0$, and of the corresponding $\mathrm{Prox}_G$.]
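Minimal NumPy sketches of the two thresholding operators (the small constant guards the division at $x_i = 0$):

import numpy as np

def soft_thresh(x, gamma):
    """Prox of gamma * ||.||_1 (soft thresholding)."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-16)) * x

def hard_thresh(x, gamma):
    """Prox of gamma * ||.||_0 (hard thresholding)."""
    return np.where(np.abs(x) >= np.sqrt(2.0 * gamma), x, 0.0)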
Legendre-Fenchel Duality

Legendre-Fenchel transform:
    $G^*(u) = \sup_{x \in \mathrm{dom}(G)} \langle u, x \rangle - G(x)$

[Figure: $-G^*(u)$ is the intercept of the tangent to the graph of $G$ with slope $u$.]

Example: quadratic functional
    $G(x) = \frac{1}{2} \langle Ax, x \rangle + \langle x, b \rangle$
    $G^*(u) = \frac{1}{2} \langle u - b, A^{-1}(u - b) \rangle$

Moreau's identity:
    $\mathrm{Prox}_{\gamma G^*}(x) = x - \gamma \, \mathrm{Prox}_{G/\gamma}(x/\gamma)$
    $G$ simple $\iff$ $G^*$ simple
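For instance, Moreau's identity turns the soft thresholding above into the prox of the conjugate of the $\ell^1$ norm, i.e. the projection on the $\ell^\infty$ ball. A sketch:

import numpy as np

def prox_conjugate(x, gamma, prox):
    """Moreau's identity: Prox_{gamma G*}(x) = x - gamma * Prox_{G/gamma}(x/gamma).
    `prox(z, t)` must compute Prox_{t G}(z)."""
    return x - gamma * prox(x / gamma, 1.0 / gamma)

# usage with the soft_thresh sketch above (G = ||.||_1):
# Prox_{gamma G*} = projection on the l^inf ball of radius gamma
# proj = prox_conjugate(x, gamma, soft_thresh)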
Indicator and Homogeneous Functionals

Positively 1-homogeneous functional: $G(\lambda x) = |\lambda| G(x)$
    Example: norm $G(x) = \|x\|$

Duality: $G^*(x) = \iota_{\{G^\circ(\cdot) \leq 1\}}(x)$ where $G^\circ(y) = \max_{G(x) \leq 1} \langle x, y \rangle$

$\ell^p$ norms: $G(x) = \|x\|_p$, $\quad G^\circ(x) = \|x\|_q$, $\quad \frac{1}{p} + \frac{1}{q} = 1$, $\quad 1 \leq p, q \leq +\infty$

Example: proximal operator of the $\ell^\infty$ norm
    $\mathrm{Prox}_{\gamma \|\cdot\|_\infty} = \mathrm{Id} - \mathrm{Proj}_{\|\cdot\|_1 \leq \gamma}$
    $\mathrm{Proj}_{\|\cdot\|_1 \leq \gamma}(x)_i = \max\left(0, 1 - \frac{\lambda^\star}{|x_i|}\right) x_i$
    for a well-chosen $\lambda^\star = \lambda^\star(x, \gamma)$
Prox of the J Functional

$J(m, \rho) = \sum_i j(m_i, \rho_i)$, $\quad j(\tilde{m}, \tilde{\rho}) = \frac{\|\tilde{m}\|^2}{\tilde{\rho}}$ for $\tilde{\rho} > 0$

$\mathrm{Prox}_{\gamma J}(m, \rho) = (\mathrm{Prox}_{\gamma j}(m_i, \rho_i))_i$

$j^* = \iota_C$ where $C = \{(a, b) \in \mathbb{R}^2 \times \mathbb{R} \ : \ 2\|a\|^2 + b \leq 0\}$

$\mathrm{Prox}_{\gamma j}(\tilde{x}) = \tilde{x} - \gamma \, \mathrm{Proj}_C(\tilde{x}/\gamma)$ where $\tilde{x} = (\tilde{m}, \tilde{\rho})$

Proposition: $\mathrm{Prox}_{\gamma j}(\tilde{m}, \tilde{\rho}) = \begin{cases} (m^\star, \rho^\star) & \text{if } \rho^\star > 0, \\ (0, 0) & \text{otherwise,} \end{cases}$

where $m^\star = \frac{\rho^\star \tilde{m}}{\rho^\star + 2\gamma}$ and $\rho^\star$ is the largest root of

    $X^3 + (4\gamma - \tilde{\rho}) X^2 + 4\gamma(\gamma - \tilde{\rho}) X - \gamma \|\tilde{m}\|^2 - 4\gamma^2 \tilde{\rho} = 0$
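A sketch of this proposition using numpy.roots (the helper name is illustrative; the cubic coefficients follow the polynomial above):

import numpy as np

def prox_j(m, rho, gamma):
    """Prox of gamma * j at (m, rho), via the largest real root of the cubic."""
    coeffs = [1.0,
              4.0 * gamma - rho,
              4.0 * gamma * (gamma - rho),
              -gamma * np.dot(m, m) - 4.0 * gamma**2 * rho]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-12].real
    rho_star = real.max() if real.size else 0.0
    if rho_star > 0:
        return rho_star * m / (rho_star + 2.0 * gamma), rho_star
    return np.zeros_like(m), 0.0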
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Gradient and Proximal Descents

Gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau \nabla G(x^{(\ell)})$   [explicit]
    $G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz.
    Theorem: if $0 < \tau < 2/L$, $x^{(\ell)} \to x^\star$ a solution.

Sub-gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau_\ell v^{(\ell)}$, $\quad v^{(\ell)} \in \partial G(x^{(\ell)})$
    Theorem: if $\tau_\ell \sim 1/\ell$, $x^{(\ell)} \to x^\star$ a solution.
    Problem: slow.

Proximal-point algorithm: $x^{(\ell+1)} = \mathrm{Prox}_{\tau_\ell G}(x^{(\ell)})$   [implicit]   [Rockafellar, 70]
    Theorem: if $\tau_\ell \geq c > 0$, $x^{(\ell)} \to x^\star$ a solution.
    Problem: $\mathrm{Prox}_{\tau G}$ hard to compute.
Proximal Splitting Methods

Solve $\min_{x \in \mathcal{H}} E(x)$

Problem: $\mathrm{Prox}_{\gamma E}$ is not available.

Splitting: $E(x) = F(x) + \sum_i G_i(x)$ with $F$ smooth and each $G_i$ simple.

Iterative algorithms using only $\nabla F(x)$ and $\mathrm{Prox}_{\gamma G_i}(x)$:
    Forward-Backward: solves $F + G$
    Douglas-Rachford: solves $\sum_i G_i$
    Primal-Dual: solves $\sum_i G_i \circ A_i$
    Generalized FB: solves $F + \sum_i G_i$
Smooth + Simple Splitting

Inverse problem: measurements $y = K f_0 + w$
    $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \ll N$

Model: $f_0 = \Psi x_0$ sparse in dictionary $\Psi$.

Sparse recovery: $f = \Psi x$ where $x$ solves
    $\min_{x \in \mathbb{R}^N} F(x) + G(x)$   (smooth + simple)

Data fidelity: $F(x) = \frac{1}{2} \|y - \Phi x\|^2$, $\quad \Phi = K \Psi$
Regularization: $G(x) = \lambda \|x\|_1 = \lambda \sum_i |x_i|$
Forward-Backward

Fixed point equation:
    $x^\star \in \mathrm{argmin}_x F(x) + G(x) \iff 0 \in \nabla F(x^\star) + \partial G(x^\star)$
    $\iff (x^\star - \tau \nabla F(x^\star)) \in x^\star + \tau \partial G(x^\star)$
    $\iff x^\star = \mathrm{Prox}_{\tau G}(x^\star - \tau \nabla F(x^\star))$

Forward-backward: $x^{(\ell+1)} = \mathrm{Prox}_{\tau G}\left(x^{(\ell)} - \tau \nabla F(x^{(\ell)})\right)$

Projected gradient descent: $G = \iota_C$.

Theorem: let $\nabla F$ be $L$-Lipschitz. If $\tau < 2/L$, $x^{(\ell)} \to x^\star$ a solution of $(\star)$.
    [Passty 79, Gabay 83]
Example: L1 Regularization

$\min_x \frac{1}{2} \|\Phi x - y\|^2 + \lambda \|x\|_1 \iff \min_x F(x) + G(x)$

$F(x) = \frac{1}{2} \|\Phi x - y\|^2$
    $\nabla F(x) = \Phi^*(\Phi x - y)$, $\quad L = \|\Phi^* \Phi\|$

$G(x) = \lambda \|x\|_1$
    $\mathrm{Prox}_{\tau G}(x)_i = \max\left(0, 1 - \frac{\tau \lambda}{|x_i|}\right) x_i$

Forward-backward $\iff$ iterative soft thresholding
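A compact, self-contained sketch of this iteration (NumPy; the function names are illustrative):

import numpy as np

def soft(x, t):
    # prox of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Forward-backward (iterative soft thresholding) for
    min_x 0.5 * ||Phi x - y||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2       # ||Phi* Phi||, Lipschitz constant of the gradient
    tau = 1.0 / L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = Phi.T @ (Phi @ x - y)         # explicit gradient step on F
        x = soft(x - tau * g, tau * lam)  # implicit prox step on G
    return x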
Convergence Speed

$\min_x E(x) = F(x) + G(x)$
    $\nabla F$ is $L$-Lipschitz, $G$ is simple.

Theorem: if $L > 0$, the FB iterates $x^{(\ell)}$ satisfy
    $E(x^{(\ell)}) - E(x^\star) \leq C/\ell$

The constant $C$ degrades as $L$ increases.
Multi-step Accelerations

Beck-Teboulle accelerated FB (FISTA): $t^{(0)} = 1$

    $x^{(\ell+1)} = \mathrm{Prox}_{G/L}\left(y^{(\ell)} - \frac{1}{L} \nabla F(y^{(\ell)})\right)$
    $t^{(\ell+1)} = \frac{1 + \sqrt{1 + 4 (t^{(\ell)})^2}}{2}$
    $y^{(\ell+1)} = x^{(\ell+1)} + \frac{t^{(\ell)} - 1}{t^{(\ell+1)}} \left(x^{(\ell+1)} - x^{(\ell)}\right)$

(see also Nesterov's method)

Theorem: if $L > 0$, $\quad E(x^{(\ell)}) - E(x^\star) \leq \frac{C}{\ell^2}$

Complexity theory: optimal in a worst-case sense.
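The corresponding accelerated loop, modifying the ista sketch above (NumPy):

import numpy as np

def fista(Phi, y, lam, n_iter=500):
    """Beck-Teboulle accelerated forward-backward for
    min_x 0.5 * ||Phi x - y||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    z = x.copy()       # the extrapolated point y^(l)
    t = 1.0
    for _ in range(n_iter):
        g = Phi.T @ (Phi @ z - y)
        w = z - g / L
        x_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # prox step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)              # momentum extrapolation
        x, t = x_new, t_new
    return x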
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Douglas Rachford Scheme

$\min_x G_1(x) + G_2(x)$   $(\star)$   with $G_1, G_2$ simple.

Douglas-Rachford iterations:
    $z^{(\ell+1)} = \left(1 - \frac{\alpha}{2}\right) z^{(\ell)} + \frac{\alpha}{2} \, \mathrm{RProx}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z^{(\ell)})\right)$
    $x^{(\ell+1)} = \mathrm{Prox}_{\gamma G_1}(z^{(\ell+1)})$

Reflexive prox: $\mathrm{RProx}_{\gamma G}(x) = 2 \, \mathrm{Prox}_{\gamma G}(x) - x$

Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, $x^{(\ell)} \to x^\star$ a solution of $(\star)$.
    [Lions, Mercier, 79]
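A generic sketch of these iterations (NumPy; prox1 and prox2 are callables computing $\mathrm{Prox}_{\gamma G_1}$ and $\mathrm{Prox}_{\gamma G_2}$):

import numpy as np

def douglas_rachford(prox1, prox2, z0, gamma=1.0, alpha=1.0, n_iter=200):
    """Douglas-Rachford iterations for min G1 + G2, both simple.
    prox_i(x, gamma) computes Prox_{gamma G_i}(x)."""
    rprox = lambda prox, x: 2.0 * prox(x, gamma) - x   # reflexive prox
    z = z0.copy()
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(prox2, rprox(prox1, z))
    return prox1(z, gamma)   # x = Prox_{gamma G1}(z)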
DR Fixed Point Equation

$\min_x G_1(x) + G_2(x) \iff 0 \in \partial(G_1 + G_2)(x)$

$\iff \exists z, \ z - x \in \gamma \partial G_1(x)$ and $x - z \in \gamma \partial G_2(x)$

$\iff x = \mathrm{Prox}_{\gamma G_1}(z)$ and $(2x - z) \in x + \gamma \partial G_2(x)$

$\iff x = \mathrm{Prox}_{\gamma G_2}(2x - z) = \mathrm{Prox}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z)\right)$

$\iff z = 2 \, \mathrm{Prox}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z)\right) - (2x - z)$

$\iff z = 2 \, \mathrm{Prox}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z)\right) - \mathrm{RProx}_{\gamma G_1}(z)$

$\iff z = \mathrm{RProx}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z)\right)$

$\iff z = \left(1 - \frac{\alpha}{2}\right) z + \frac{\alpha}{2} \, \mathrm{RProx}_{\gamma G_2}\left(\mathrm{RProx}_{\gamma G_1}(z)\right)$
Example: Optimal Transport on Centered Grid

$\min_{x \in \mathbb{R}^{G_c \times 2}} J(x) + \iota_C(x)$

$C = \{x = (m, \rho) \ : \ Ax = b\}$ with $b = (0, \rho_0, \rho_1)$ and $A(x) = (\mathrm{div}(x), \rho_{|I_0}, \rho_{|I_1})$

[Figure: centered grid $G_c$ in the $(t, s)$ plane, with boundary slices $I_0$ and $I_1$.]

$\mathrm{Prox}_{\gamma J}$: cubic root (closed form).

$\mathrm{Prox}_{\iota_C} = \mathrm{Proj}_C = (\mathrm{Id} - A^* \Delta^{-1} A) + A^* \Delta^{-1} b$
    where $\Delta^{-1} = (A A^*)^{-1}$: solving a Poisson equation with boundary conditions.

Proposition: DR ($\alpha = 1$) is ALG2 of [Benamou, Brenier 2000].

$\to$ Advantage: relaxation parameter $\alpha \in \,]0, 2[$.
Example: Constrained L1

$\min_{\Phi x = y} \|x\|_1 \iff \min_x G_1(x) + G_2(x)$

$G_1(x) = \iota_C(x)$, $\quad C = \{x \ : \ \Phi x = y\}$
    $\mathrm{Prox}_{\gamma G_1}(x) = \mathrm{Proj}_C(x) = x + \Phi^* (\Phi \Phi^*)^{-1} (y - \Phi x)$

$G_2(x) = \|x\|_1$, $\quad \mathrm{Prox}_{\gamma G_2}(x) = \left(\max\left(0, 1 - \frac{\gamma}{|x_i|}\right) x_i\right)_i$

$\to$ efficient if $\Phi \Phi^*$ is easy to invert.

Example: compressed sensing
    $\Phi \in \mathbb{R}^{100 \times 400}$ Gaussian matrix
    $y = \Phi x_0$ with $\|x_0\|_0 = 17$
[Figure: $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ vs. iterations, for $\gamma = 0.01, 1, 10$.]
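Plugging these two proximal operators into the douglas_rachford sketch above (NumPy; the sparse signal is illustrative):

import numpy as np

np.random.seed(0)
Phi = np.random.randn(100, 400)               # Gaussian measurement matrix
x0 = np.zeros(400); x0[:17] = 1.0             # a 17-sparse signal, for illustration
y = Phi @ x0

pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)     # Phi^* (Phi Phi^*)^{-1}
prox_G1 = lambda x, g: x + pinv @ (y - Phi @ x)                      # Proj_{Phi x = y}
prox_G2 = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0.0)   # soft threshold

x = douglas_rachford(prox_G1, prox_G2, np.zeros(400), gamma=1.0, n_iter=500)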
Auxiliary Variables with DR

$\min_x G_1(x) + G_2(A(x))$ with $A : \mathcal{H} \to \mathcal{E}$ linear, $G_1, G_2$ simple.

$\iff \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$
    $G(x, y) = G_1(x) + G_2(y)$
    $C = \{(x, y) \in \mathcal{H} \times \mathcal{E} \ : \ Ax = y\}$

$\mathrm{Prox}_{\gamma G}(x, y) = (\mathrm{Prox}_{\gamma G_1}(x), \mathrm{Prox}_{\gamma G_2}(y))$

$\mathrm{Prox}_{\iota_C}(x, y) = (x - A^* \tilde{y}, \, y + \tilde{y}) = (\tilde{x}, A \tilde{x})$
    where $\tilde{y} = (\mathrm{Id} + A A^*)^{-1}(Ax - y)$ and $\tilde{x} = (\mathrm{Id} + A^* A)^{-1}(A^* y + x)$

$\to$ efficient if $\mathrm{Id} + A A^*$ or $\mathrm{Id} + A^* A$ is easy to invert.
Example: TV Regularization

$\min_f \frac{1}{2} \|Kf - y\|^2 + \lambda \|\nabla f\|_1$, $\quad \|u\|_1 = \sum_i \|u_i\|$

$\iff \min_{(f, u)} G_1(u) + G_2(f) + \iota_C(f, u)$

$G_1(u) = \lambda \|u\|_1$, $\quad \mathrm{Prox}_{\gamma G_1}(u)_i = \max\left(0, 1 - \frac{\gamma \lambda}{\|u_i\|}\right) u_i$

$G_2(f) = \frac{1}{2} \|Kf - y\|^2$, $\quad \mathrm{Prox}_{\gamma G_2}(f) = (\mathrm{Id} + \gamma K^* K)^{-1}(f + \gamma K^* y)$

$C = \{(f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} \ : \ u = \nabla f\}$
    $\mathrm{Prox}_{\iota_C}(f, u) = (\tilde{f}, \nabla \tilde{f})$ where $\tilde{f}$ solves
    $(\mathrm{Id} - \Delta) \tilde{f} = f - \mathrm{div}(u)$
    $\to O(N \log N)$ operations using FFT.

[Figure: original $f_0$, degraded observations $y = K f_0 + w$, TV recovery $f^\star$, and the energy decay with iterations.]
Alternating Direction Method of Multipliers

$\min_x F(x) + G(A(x))$   $(\star)$   $\iff \min_{x, \, y = Ax} F(x) + G(y)$
    $A : \mathbb{R}^N \to \mathbb{R}^P$ injective.

Lagrangian: $\min_{x, y} \max_u L(x, y, u) = F(x) + G(y) + \langle u, y - Ax \rangle$

Augmented: $\min_{x, y} \max_u L_\gamma(x, y, u) = L(x, y, u) + \frac{\gamma}{2} \|y - Ax\|^2$

ADMM:
    $x^{(\ell+1)} = \mathrm{argmin}_x \, L_\gamma(x, y^{(\ell)}, u^{(\ell)})$
    $y^{(\ell+1)} = \mathrm{argmin}_y \, L_\gamma(x^{(\ell+1)}, y, u^{(\ell)})$
    $u^{(\ell+1)} = u^{(\ell)} + \gamma \left(y^{(\ell+1)} - A x^{(\ell+1)}\right)$

Theorem: if $\gamma > 0$, $x^{(\ell)} \to x^\star$ a solution of $(\star)$.
    [Gabay, Mercier, Glowinski, Marrocco, 76]
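A concrete sketch of these three steps on a lasso-type instance ($A = \mathrm{Id}$, $F$ quadratic, $G = \lambda\|\cdot\|_1$; NumPy, the x-step argmin is solved in closed form):

import numpy as np

def admm_lasso(Phi, b, lam, gamma=1.0, n_iter=300):
    """ADMM for min_x 0.5 * ||Phi x - b||^2 + lam * ||x||_1,
    split as F(x) + G(y) with the constraint y = x (A = Id)."""
    n = Phi.shape[1]
    M = np.linalg.inv(Phi.T @ Phi + gamma * np.eye(n))  # reused by every x-step
    Ptb = Phi.T @ b
    y = np.zeros(n); u = np.zeros(n)
    for _ in range(n_iter):
        x = M @ (Ptb + u + gamma * y)          # x-step: quadratic argmin of L_gamma
        w = x - u / gamma
        y = np.sign(w) * np.maximum(np.abs(w) - lam / gamma, 0.0)  # y-step: Prox_{G/gamma}
        u = u + gamma * (y - x)                # dual ascent step
    return x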
ADMM with Proximal Operators

Proximal mapping for the metric of $A$ ($A$ injective):
    $\mathrm{Prox}^A_F(z) = \mathrm{argmin}_x \frac{1}{2} \|Ax - z\|^2 + F(x)$

Proposition: $\mathrm{Prox}^A_{\gamma F} = \gamma \, A^+ \circ \left(\mathrm{Id} - \mathrm{Prox}_{(F^* \circ A^*)/\gamma}\right)(\cdot / \gamma)$

ADMM:
    $x^{(\ell+1)} = \mathrm{Prox}^A_{F/\gamma}\left(y^{(\ell)} - u^{(\ell)}\right)$
    $y^{(\ell+1)} = \mathrm{Prox}_{G/\gamma}\left(A x^{(\ell+1)} + u^{(\ell)}\right)$
    $u^{(\ell+1)} = u^{(\ell)} + \left(y^{(\ell+1)} - A x^{(\ell+1)}\right)$

$\to$ If $G \circ A$ is simple: use DR.
$\to$ If $F^* \circ A^*$ is simple: use ADMM.
ADMM vs. DR
Fenchel-Rockafellar duality:
    min_x F(x) + G(A(x))    ↔    min_u F*(−A* u) + G*(u)
Important: no bijection between u and x.

  Proposition:   DR applied to F* ∘ (−A*) + G* is ADMM.

                                   [Eckstein, Bertsekas, 92]
DR iterations (when α = 1):
    z^(ℓ+1) = (1/2) z^(ℓ) + (1/2) RProx_{γ(F*∘(−A*))} ∘ RProx_{γG*} (z^(ℓ))

The iterates of ADMM are recovered using:
    y^(ℓ) = (1/γ)(z^(ℓ) − u^(ℓ))
    u^(ℓ) = Prox_{γG*}(z^(ℓ))
    x^(ℓ+1) = Prox^A_{F/γ}(y^(ℓ) − u^(ℓ))
More than 2 Functionals

       min_x G₁(x) + ... + G_k(x)          each Gᵢ is simple

       min_x G(x₁, ..., x_k) + ι_C(x₁, ..., x_k)

   G(x₁, ..., x_k) = G₁(x₁) + ... + G_k(x_k)
   C = { (x₁, ..., x_k) ∈ H^k : x₁ = ... = x_k }

G and ι_C are simple:

   Prox_{γG}(x₁, ..., x_k) = (Prox_{γGᵢ}(xᵢ))ᵢ
   Prox_{γι_C}(x₁, ..., x_k) = (x̃, ..., x̃)    where x̃ = (1/k) Σᵢ xᵢ
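Both proxes are cheap, so Douglas-Rachford applies on the product space; a minimal sketch (each Prox_{γGᵢ} is assumed supplied as a Python function, and the relaxation is fixed to α = 1):

    import numpy as np

    def dr_sum(proxes, x0, gamma=1.0, n_iter=300):
        """Minimize G_1(x) + ... + G_k(x); proxes[i](x, gamma) = Prox_{gamma G_i}(x)."""
        k = len(proxes)
        z = np.tile(x0, (k, 1))                 # one auxiliary copy per G_i
        for _ in range(n_iter):
            xbar = z.mean(axis=0)               # Prox_{gamma iota_C}: averaging
            r = 2 * xbar - z                    # reflected prox of iota_C
            rG = np.stack([2 * proxes[i](r[i], gamma) - r[i] for i in range(k)])
            z = 0.5 * z + 0.5 * rG              # DR step, alpha = 1
        return z.mean(axis=0)

    # Toy usage with two separable l1 terms centered at a and b (illustrative):
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0)
    a, b = np.array([1.0, -2.0]), np.array([3.0, 0.0])
    p1 = lambda x, g: a + soft(x - a, g)        # Prox of ||x - a||_1
    p2 = lambda x, g: b + soft(x - b, g)        # Prox of ||x - b||_1
    x = dr_sum([p1, p2], np.zeros(2))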
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
GFB Splitting

           min_{x∈R^N} F(x) + Σᵢ₌₁ⁿ Gᵢ(x)   (⋆)

   F smooth, each Gᵢ, i = 1, ..., n, simple.

    zᵢ^(ℓ+1) = zᵢ^(ℓ) + Prox_{nγGᵢ}( 2x^(ℓ) − zᵢ^(ℓ) − γ∇F(x^(ℓ)) ) − x^(ℓ)
    x^(ℓ+1) = (1/n) Σᵢ₌₁ⁿ zᵢ^(ℓ+1)

   Theorem: Let ∇F be L-Lipschitz.
     If γ < 2/L,  x^(ℓ) → x⋆ a solution of (⋆).

                              [Raguet, Fadili, Peyré 2012]
   n = 1  ⟹  Forward-backward.
   F = 0  ⟹  Douglas-Rachford.
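A direct transcription of this iteration (a sketch; grad_F and the list of prox functions are assumptions supplied by the user):

    import numpy as np

    def gfb(grad_F, proxes, x0, gamma, n_iter=300):
        """min_x F(x) + sum_i G_i(x); proxes[i](x, tau) computes Prox_{tau G_i}(x)."""
        n = len(proxes)
        z = np.tile(x0, (n, 1))                  # one auxiliary variable per G_i
        x = x0.copy()
        for _ in range(n_iter):
            g = gamma * grad_F(x)                # shared forward (gradient) step
            for i in range(n):                   # parallelizable over i
                z[i] += proxes[i](2 * x - z[i] - g, n * gamma) - x
            x = z.mean(axis=0)
        return x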
GFB Fix Point
x⋆ ∈ argmin_{x∈R^N} F(x) + Σᵢ Gᵢ(x)   ⟺   0 ∈ ∇F(x⋆) + Σᵢ ∂Gᵢ(x⋆)
                                      ⟺   ∃ yᵢ ∈ ∂Gᵢ(x⋆),  ∇F(x⋆) + Σᵢ yᵢ = 0

  ⟺  ∃ (zᵢ)ᵢ₌₁ⁿ,  ∀ i,  (1/n)( x⋆ − zᵢ − γ∇F(x⋆) ) ∈ γ∂Gᵢ(x⋆)
      and x⋆ = (1/n) Σᵢ zᵢ        (use zᵢ = x⋆ − γ∇F(x⋆) − nγyᵢ)

  ⟺  (2x⋆ − zᵢ − γ∇F(x⋆)) − x⋆ ∈ nγ∂Gᵢ(x⋆)
  ⟺  x⋆ = Prox_{nγGᵢ}( 2x⋆ − zᵢ − γ∇F(x⋆) )
  ⟺  zᵢ = zᵢ + Prox_{nγGᵢ}( 2x⋆ − zᵢ − γ∇F(x⋆) ) − x⋆

  ⟹  Fix point equation on (x⋆, z₁, ..., zₙ).
Block Regularization
ℓ¹ − ℓ² block sparsity:   G(x) = Σ_{b∈B} ||x[b]||,    where ||x[b]||² = Σ_{m∈b} x_m²

(figure: ||x||₁ = Σᵢ |xᵢ| vs. the block norms Σ_{b∈B₁} (Σ_{i∈b} xᵢ²)^{1/2} + Σ_{b∈B₂} (Σ_{i∈b} xᵢ²)^{1/2};
 image f = Φx, coefficients x, overlapping block families B₁, B₂)

Non-overlapping decomposition: B = B₁ ∪ ... ∪ Bₙ

    G(x) = Σᵢ₌₁ⁿ Gᵢ(x),      Gᵢ(x) = Σ_{b∈Bᵢ} ||x[b]||

Each Gᵢ is simple:

    ∀ m ∈ b ∈ Bᵢ,   Prox_{γGᵢ}(x)_m = max( 0, 1 − γ/||x[b]|| ) x_m
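This prox is a per-block soft-thresholding; a short sketch, assuming blocks is a list of index arrays describing the non-overlapping family Bᵢ:

    import numpy as np

    def prox_block_l1l2(x, gamma, blocks):
        """Prox_{gamma G_i} for G_i(x) = sum_{b in B_i} ||x[b]||."""
        y = x.copy()
        for b in blocks:
            nb = np.linalg.norm(x[b])
            # shrink the whole block towards 0, kill it if its norm is below gamma
            y[b] = 0.0 if nb == 0 else max(0.0, 1.0 - gamma / nb) * x[b]
        return y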
Numerical Experiments

Deconvolution:   min_x (1/2)||Y − K Φx||² + λ Σ_{k=1}^{4} ||x||_{1,2}^{B_k}
  (translation-invariant wavelets Φ; noise: 0.025; convol.: 2; λ_{ℓ1/ℓ2}: 1.30e−03)
  tEFB: 161s; tPR: 173s; tCP: 190s; it. #50; SNR: 22.49dB

Deconvolution + Inpainting:   min_x (1/2)||Y − P K Φx||² + λ Σ_{k=1}^{16} ||x||_{1,2}^{B_k}
  (noise: 0.025; degrad.: 0.4; convol.: 2; λ_{ℓ1/ℓ2}: 1.00e−03)
  tEFB: 283s; tPR: 298s; tCP: 368s; it. #50; SNR: 21.80dB

(figures: decay of log₁₀(E(x^(ℓ)) − E(x⋆)) vs. iteration #, comparing EFB, PR and CP,
 on observations y = Φx₀ + w)
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Primal-dual Formulation
Fenchel-Rockafellar duality:    A : H → L linear

min_{x∈H} G₁(x) + G₂(A(x)) = min_x G₁(x) + sup_{u∈L} ⟨Ax, u⟩ − G₂*(u)

Strong duality:   0 ∈ ri(dom(G₂)) − A ri(dom(G₁))
(min ↔ max)       = max_u −G₂*(u) + min_x G₁(x) + ⟨x, A*u⟩
                  = max_u −G₂*(u) − G₁*(−A*u)

Recovering x⋆ from some u⋆:
    x⋆ = argmin_x G₁(x) + ⟨x, A*u⋆⟩
    ⟺  −A*u⋆ ∈ ∂G₁(x⋆)
    ⟺  x⋆ ∈ (∂G₁)⁻¹(−A*u⋆) = ∂G₁*(−A*u⋆)
Forward-Backward on the Dual

If G₁ is strongly convex:    ∇²G₁ ≥ c Id
   G₁(tx + (1−t)y) ≤ t G₁(x) + (1−t) G₁(y) − (c/2) t(1−t) ||x − y||²

   → x⋆ is uniquely defined.
   → G₁* is of class C¹.               x⋆ = ∇G₁*(−A*u⋆)

 FB on the dual:    min_{x∈H} G₁(x) + G₂(A(x))
                  = min_{u∈L} G₁*(−A*u) + G₂*(u)
                       (smooth)     (simple)

    u^(ℓ+1) = Prox_{τG₂*}( u^(ℓ) + τ A ∇G₁*(−A*u^(ℓ)) )
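A generic sketch of this dual scheme (the callables grad_G1_conj = ∇G₁* and prox_G2_conj(·, τ) = Prox_{τG₂*}, as well as the matrix A, are assumptions of the example):

    import numpy as np

    def fb_dual(A, grad_G1_conj, prox_G2_conj, u0, tau, n_iter=300):
        u = u0.copy()
        for _ in range(n_iter):
            x = grad_G1_conj(-A.T @ u)               # primal iterate x = grad G1*(-A* u)
            u = prox_G2_conj(u + tau * A @ x, tau)   # forward step on the smooth dual term
        return grad_G1_conj(-A.T @ u), u             # (primal solution, dual solution)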
Example: TV Denoising

    min_{f∈R^N} (1/2)||f − y||² + λ||∇f||₁    ↔    min_{||u||∞ ≤ λ} ||y + div(u)||²

        ||u||₁ = Σᵢ ||uᵢ||          ||u||∞ = maxᵢ ||uᵢ||

Dual solution u⋆   →   primal solution f⋆ = y + div(u⋆)

FB (aka projected gradient descent):               [Chambolle 2004]

    u^(ℓ+1) = Proj_{||·||∞ ≤ λ}( u^(ℓ) + τ ∇(y + div(u^(ℓ))) )

    v = Proj_{||·||∞ ≤ λ}(u)   ⟺   vᵢ = uᵢ / max(||uᵢ||/λ, 1)

Convergence if τ < 2/||div ∘ ∇|| = 1/4
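A small 2-D NumPy sketch of Chambolle's scheme, with a forward-difference gradient and the matching divergence (this discretization is one standard choice, not dictated by the slide); τ = 0.2 satisfies τ < 1/4:

    import numpy as np

    def grad(f):                       # forward differences, Neumann boundary
        gx = np.zeros_like(f); gy = np.zeros_like(f)
        gx[:-1, :] = f[1:, :] - f[:-1, :]
        gy[:, :-1] = f[:, 1:] - f[:, :-1]
        return np.stack((gx, gy))

    def div(u):                        # div = -grad^*, adjoint of the scheme above
        ux, uy = u
        d = np.zeros_like(ux)
        d[0, :] = ux[0, :]; d[1:-1, :] = ux[1:-1, :] - ux[:-2, :]; d[-1, :] = -ux[-2, :]
        d[:, 0] += uy[:, 0]; d[:, 1:-1] += uy[:, 1:-1] - uy[:, :-2]; d[:, -1] += -uy[:, -2]
        return d

    def tv_denoise(y, lam, tau=0.2, n_iter=300):
        u = np.zeros((2,) + y.shape)
        for _ in range(n_iter):
            u = u + tau * grad(y + div(u))                        # gradient step
            n = np.maximum(np.sqrt((u ** 2).sum(axis=0)) / lam, 1.0)
            u = u / n                                             # project on ||u_i|| <= lam
        return y + div(u)                                         # primal f* = y + div(u*)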
Primal-Dual Algorithm

             min_{x∈H} G₁(x) + G₂(A(x))
  ⟺  min_x max_z G₁(x) − G₂*(z) + ⟨A(x), z⟩

    z^(ℓ+1) = Prox_{σG₂*}( z^(ℓ) + σ A(x̃^(ℓ)) )
    x^(ℓ+1) = Prox_{τG₁}( x^(ℓ) − τ A*(z^(ℓ+1)) )
    x̃^(ℓ+1) = x^(ℓ+1) + θ (x^(ℓ+1) − x^(ℓ))

   θ = 0: Arrow-Hurwicz algorithm.
   θ = 1: convergence speed on the duality gap.

   Theorem: [Chambolle-Pock 2011]
     If 0 ≤ θ ≤ 1 and στ||A||² < 1, then
       x^(ℓ) → x⋆ a minimizer of G₁ + G₂ ∘ A.
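A generic sketch of the iteration (prox_G1(·, τ) = Prox_{τG₁} and prox_G2_conj(·, σ) = Prox_{σG₂*} are assumed given as functions, and στ||A||² < 1):

    import numpy as np

    def chambolle_pock(A, prox_G1, prox_G2_conj, x0, sigma, tau, theta=1.0, n_iter=300):
        x = x0.copy(); x_bar = x0.copy()
        z = np.zeros(A.shape[0])
        for _ in range(n_iter):
            z = prox_G2_conj(z + sigma * (A @ x_bar), sigma)   # dual ascent step
            x_new = prox_G1(x - tau * (A.T @ z), tau)          # primal descent step
            x_bar = x_new + theta * (x_new - x)                # extrapolation
            x = x_new
        return x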
Example: Optimal Transport

Staggered grid formulation:

    min_{x ∈ R^{G¹st} × R^{G²st}}  J(I(x)) + ι_C(x)

    I = (I¹, I²) : R^{G¹st} × R^{G²st} → R^{Gc}     (interpolation)

(figure: staggered grids G¹st, G²st mapped by I onto the centered grid Gc, in the (s, t) plane)
Conclusion
Inverse problems in imaging:
     Large scale, N ∼ 10⁶.
     Non-smooth (sparsity, TV, ...).
     (Sometimes) convex.
     Highly structured (separability, ℓᵖ norms, ...).

Proximal splitting:
     Unravel the structure of problems.
     Parallelizable.                  Decomposition G = Σ_k G_k

Open problems:
     Less structured problems without smoothness.
     Non-convex optimization.

Contenu connexe

Tendances

Learning Sparse Representation
Learning Sparse RepresentationLearning Sparse Representation
Learning Sparse RepresentationGabriel Peyré
 
Geodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and GraphicsGeodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and GraphicsGabriel Peyré
 
Mesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionMesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionGabriel Peyré
 
Mesh Processing Course : Mesh Parameterization
Mesh Processing Course : Mesh ParameterizationMesh Processing Course : Mesh Parameterization
Mesh Processing Course : Mesh ParameterizationGabriel Peyré
 
A series of maximum entropy upper bounds of the differential entropy
A series of maximum entropy upper bounds of the differential entropyA series of maximum entropy upper bounds of the differential entropy
A series of maximum entropy upper bounds of the differential entropyFrank Nielsen
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexityFrank Nielsen
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsFrank Nielsen
 
Signal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsSignal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsGabriel Peyré
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualStéphane Canu
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon informationFrank Nielsen
 
Adaptive Signal and Image Processing
Adaptive Signal and Image ProcessingAdaptive Signal and Image Processing
Adaptive Signal and Image ProcessingGabriel Peyré
 
Mesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingMesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingGabriel Peyré
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas EberleBigMC
 
Lecture3 linear svm_with_slack
Lecture3 linear svm_with_slackLecture3 linear svm_with_slack
Lecture3 linear svm_with_slackStéphane Canu
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...HidenoriOgata
 
Mesh Processing Course : Geodesics
Mesh Processing Course : GeodesicsMesh Processing Course : Geodesics
Mesh Processing Course : GeodesicsGabriel Peyré
 
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge Theory
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge TheoryL. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge Theory
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge TheorySEENET-MTP
 
Image Processing 3
Image Processing 3Image Processing 3
Image Processing 3jainatin
 
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Tomoya Murata
 

Tendances (20)

Learning Sparse Representation
Learning Sparse RepresentationLearning Sparse Representation
Learning Sparse Representation
 
Geodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and GraphicsGeodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and Graphics
 
Mesh Processing Course : Multiresolution
Mesh Processing Course : MultiresolutionMesh Processing Course : Multiresolution
Mesh Processing Course : Multiresolution
 
Mesh Processing Course : Mesh Parameterization
Mesh Processing Course : Mesh ParameterizationMesh Processing Course : Mesh Parameterization
Mesh Processing Course : Mesh Parameterization
 
A series of maximum entropy upper bounds of the differential entropy
A series of maximum entropy upper bounds of the differential entropyA series of maximum entropy upper bounds of the differential entropy
A series of maximum entropy upper bounds of the differential entropy
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexity
 
Classification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metricsClassification with mixtures of curved Mahalanobis metrics
Classification with mixtures of curved Mahalanobis metrics
 
Signal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsSignal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse Problems
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dual
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon information
 
Adaptive Signal and Image Processing
Adaptive Signal and Image ProcessingAdaptive Signal and Image Processing
Adaptive Signal and Image Processing
 
Mesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingMesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic Sampling
 
Lecture5 kernel svm
Lecture5 kernel svmLecture5 kernel svm
Lecture5 kernel svm
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas Eberle
 
Lecture3 linear svm_with_slack
Lecture3 linear svm_with_slackLecture3 linear svm_with_slack
Lecture3 linear svm_with_slack
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...
 
Mesh Processing Course : Geodesics
Mesh Processing Course : GeodesicsMesh Processing Course : Geodesics
Mesh Processing Course : Geodesics
 
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge Theory
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge TheoryL. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge Theory
L. Jonke - A Twisted Look on Kappa-Minkowski: U(1) Gauge Theory
 
Image Processing 3
Image Processing 3Image Processing 3
Image Processing 3
 
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
 

Similaire à Optimal Transport and Proximal Splitting

Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009Sean Meyn
 
Gibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceGibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceJeremyHeng10
 
Gibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceGibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceJeremyHeng10
 
Lesson 31: Evaluating Definite Integrals
Lesson 31: Evaluating Definite IntegralsLesson 31: Evaluating Definite Integrals
Lesson 31: Evaluating Definite IntegralsMatthew Leingang
 
Triangle counting handout
Triangle counting handoutTriangle counting handout
Triangle counting handoutcsedays
 
Emat 213 study guide
Emat 213 study guideEmat 213 study guide
Emat 213 study guideakabaka12
 
Reflect tsukuba524
Reflect tsukuba524Reflect tsukuba524
Reflect tsukuba524kazuhase2011
 
Case Study (All)
Case Study (All)Case Study (All)
Case Study (All)gudeyi
 
Advanced Functions Unit 1
Advanced Functions Unit 1Advanced Functions Unit 1
Advanced Functions Unit 1leefong2310
 
"Modern Tracking" Short Course Taught at University of Hawaii
"Modern Tracking" Short Course Taught at University of Hawaii"Modern Tracking" Short Course Taught at University of Hawaii
"Modern Tracking" Short Course Taught at University of HawaiiWilliam J Farrell III
 
Correlative level coding
Correlative level codingCorrelative level coding
Correlative level codingsrkrishna341
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”IOSRJM
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
On the Stick and Rope Problem - Draft 1
On the Stick and Rope Problem - Draft 1On the Stick and Rope Problem - Draft 1
On the Stick and Rope Problem - Draft 1Iwan Pranoto
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 

Similaire à Optimal Transport and Proximal Splitting (20)

Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009
 
Gibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceGibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inference
 
Gibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inferenceGibbs flow transport for Bayesian inference
Gibbs flow transport for Bayesian inference
 
Lesson 31: Evaluating Definite Integrals
Lesson 31: Evaluating Definite IntegralsLesson 31: Evaluating Definite Integrals
Lesson 31: Evaluating Definite Integrals
 
Triangle counting handout
Triangle counting handoutTriangle counting handout
Triangle counting handout
 
Emat 213 study guide
Emat 213 study guideEmat 213 study guide
Emat 213 study guide
 
Reflect tsukuba524
Reflect tsukuba524Reflect tsukuba524
Reflect tsukuba524
 
Mcgill3
Mcgill3Mcgill3
Mcgill3
 
Case Study (All)
Case Study (All)Case Study (All)
Case Study (All)
 
Advanced Functions Unit 1
Advanced Functions Unit 1Advanced Functions Unit 1
Advanced Functions Unit 1
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
"Modern Tracking" Short Course Taught at University of Hawaii
"Modern Tracking" Short Course Taught at University of Hawaii"Modern Tracking" Short Course Taught at University of Hawaii
"Modern Tracking" Short Course Taught at University of Hawaii
 
Correlative level coding
Correlative level codingCorrelative level coding
Correlative level coding
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”
 
SSA slides
SSA slidesSSA slides
SSA slides
 
2.1 Calculus 2.formulas.pdf.pdf
2.1 Calculus 2.formulas.pdf.pdf2.1 Calculus 2.formulas.pdf.pdf
2.1 Calculus 2.formulas.pdf.pdf
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
On the Stick and Rope Problem - Draft 1
On the Stick and Rope Problem - Draft 1On the Stick and Rope Problem - Draft 1
On the Stick and Rope Problem - Draft 1
 
1 - Linear Regression
1 - Linear Regression1 - Linear Regression
1 - Linear Regression
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 

Plus de Gabriel Peyré

Mesh Processing Course : Introduction
Mesh Processing Course : IntroductionMesh Processing Course : Introduction
Mesh Processing Course : IntroductionGabriel Peyré
 
Mesh Processing Course : Differential Calculus
Mesh Processing Course : Differential CalculusMesh Processing Course : Differential Calculus
Mesh Processing Course : Differential CalculusGabriel Peyré
 
Signal Processing Course : Theory for Sparse Recovery
Signal Processing Course : Theory for Sparse RecoverySignal Processing Course : Theory for Sparse Recovery
Signal Processing Course : Theory for Sparse RecoveryGabriel Peyré
 
Signal Processing Course : Presentation of the Course
Signal Processing Course : Presentation of the CourseSignal Processing Course : Presentation of the Course
Signal Processing Course : Presentation of the CourseGabriel Peyré
 
Signal Processing Course : Orthogonal Bases
Signal Processing Course : Orthogonal BasesSignal Processing Course : Orthogonal Bases
Signal Processing Course : Orthogonal BasesGabriel Peyré
 
Signal Processing Course : Fourier
Signal Processing Course : FourierSignal Processing Course : Fourier
Signal Processing Course : FourierGabriel Peyré
 
Signal Processing Course : Denoising
Signal Processing Course : DenoisingSignal Processing Course : Denoising
Signal Processing Course : DenoisingGabriel Peyré
 
Signal Processing Course : Compressed Sensing
Signal Processing Course : Compressed SensingSignal Processing Course : Compressed Sensing
Signal Processing Course : Compressed SensingGabriel Peyré
 
Signal Processing Course : Approximation
Signal Processing Course : ApproximationSignal Processing Course : Approximation
Signal Processing Course : ApproximationGabriel Peyré
 
Signal Processing Course : Wavelets
Signal Processing Course : WaveletsSignal Processing Course : Wavelets
Signal Processing Course : WaveletsGabriel Peyré
 
Sparsity and Compressed Sensing
Sparsity and Compressed SensingSparsity and Compressed Sensing
Sparsity and Compressed SensingGabriel Peyré
 
Optimal Transport in Imaging Sciences
Optimal Transport in Imaging SciencesOptimal Transport in Imaging Sciences
Optimal Transport in Imaging SciencesGabriel Peyré
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal TransportGabriel Peyré
 
A Review of Proximal Methods, with a New One
A Review of Proximal Methods, with a New OneA Review of Proximal Methods, with a New One
A Review of Proximal Methods, with a New OneGabriel Peyré
 

Plus de Gabriel Peyré (14)

Mesh Processing Course : Introduction
Mesh Processing Course : IntroductionMesh Processing Course : Introduction
Mesh Processing Course : Introduction
 
Mesh Processing Course : Differential Calculus
Mesh Processing Course : Differential CalculusMesh Processing Course : Differential Calculus
Mesh Processing Course : Differential Calculus
 
Signal Processing Course : Theory for Sparse Recovery
Signal Processing Course : Theory for Sparse RecoverySignal Processing Course : Theory for Sparse Recovery
Signal Processing Course : Theory for Sparse Recovery
 
Signal Processing Course : Presentation of the Course
Signal Processing Course : Presentation of the CourseSignal Processing Course : Presentation of the Course
Signal Processing Course : Presentation of the Course
 
Signal Processing Course : Orthogonal Bases
Signal Processing Course : Orthogonal BasesSignal Processing Course : Orthogonal Bases
Signal Processing Course : Orthogonal Bases
 
Signal Processing Course : Fourier
Signal Processing Course : FourierSignal Processing Course : Fourier
Signal Processing Course : Fourier
 
Signal Processing Course : Denoising
Signal Processing Course : DenoisingSignal Processing Course : Denoising
Signal Processing Course : Denoising
 
Signal Processing Course : Compressed Sensing
Signal Processing Course : Compressed SensingSignal Processing Course : Compressed Sensing
Signal Processing Course : Compressed Sensing
 
Signal Processing Course : Approximation
Signal Processing Course : ApproximationSignal Processing Course : Approximation
Signal Processing Course : Approximation
 
Signal Processing Course : Wavelets
Signal Processing Course : WaveletsSignal Processing Course : Wavelets
Signal Processing Course : Wavelets
 
Sparsity and Compressed Sensing
Sparsity and Compressed SensingSparsity and Compressed Sensing
Sparsity and Compressed Sensing
 
Optimal Transport in Imaging Sciences
Optimal Transport in Imaging SciencesOptimal Transport in Imaging Sciences
Optimal Transport in Imaging Sciences
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal Transport
 
A Review of Proximal Methods, with a New One
A Review of Proximal Methods, with a New OneA Review of Proximal Methods, with a New One
A Review of Proximal Methods, with a New One
 

Dernier

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Dernier (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Optimal Transport and Proximal Splitting

  • 1. Proximal Splitting and Optimal Transport Gabriel Peyré www.numerical-tours.com
  • 2. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 3. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds ans- that eme rate ance eval t al. µ0 µ1
  • 4. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds Mass preserving map T : Rk Rk . ans- that µ1 = T µ0 where (T µ0 )(A) = µ0 (T (A)) 1 eme rate ance x T (x) eval t al. µ0 µ1
  • 5. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds Mass preserving map T : Rk Rk . ans- that µ1 = T µ0 where (T µ0 )(A) = µ0 (T (A)) 1 eme rate ance x T (x) eval t al. µ0 µ1 Distributions with densities: µi = i (x)dx T µ0 = µ1 1 (T (x))|det ⇥T (x)| = 0 (x)
  • 6. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1
  • 7. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1 Regularity condition: µ0 or µ1 does not give mass to “small sets”. Theorem (p > 1): there exists a unique optimal T . T T µ1 µ0
  • 8. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1 Regularity condition: µ0 or µ1 does not give mass to “small sets”. Theorem (p > 1): there exists a unique optimal T . Theorem (p = 2): T is defined as T = with convex. T T T (x) T (x ) T is monotone: µ1 x T (x) T (x ), x x 0 µ0 x
  • 9-11. Wasserstein Distance. Couplings π ∈ Π(µ, ν): ∀A ⊂ ℝ^d, π(A × ℝ^d) = µ(A); ∀B ⊂ ℝ^d, π(ℝ^d × B) = ν(B). Transportation cost: W_p(µ, ν)^p = min_{π ∈ Π(µ,ν)} ∫_{ℝ^d × ℝ^d} c(x, y) dπ(x, y).
  • 12-13. Optimal Transport (couplings). Let p > 1 and µ not giving mass to small sets. There is a unique π ∈ Π(µ, ν) s.t. W_p(µ, ν)^p = ∫_{ℝ^d × ℝ^d} c(x, y) dπ(x, y); the optimal π is supported on the graph {(x, T(x))} of an optimal transport map T: ℝ^d → ℝ^d. p = 2: T = ∇φ is the unique solution of φ convex l.s.c., (∇φ)♯µ = ν.
  • 14-15. 1-D Continuous Wasserstein. Distributions µ, ν on ℝ. Cumulative functions: Cµ(t) = ∫_{−∞}^t dµ(x). For all p > 1: T = Cν⁻¹ ∘ Cµ; T is non-decreasing (a “change of contrast”). Explicit formulas: W_p(µ, ν)^p = ∫₀¹ |Cµ⁻¹ − Cν⁻¹|^p; W₁(µ, ν) = ∫_ℝ |Cµ − Cν|.
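The 1-D map T = Cν⁻¹ ∘ Cµ and the quantile formula translate directly into code. A minimal NumPy sketch (the empirical-measure setting, the equal-size assumption and all names are ours, not from the slides):

import numpy as np

def ot_map_1d(src, dst):
    # Monotone optimal transport between two 1-D point clouds of equal
    # size: the i-th smallest source sample is sent to the i-th smallest
    # target sample, realizing T = C_nu^{-1} o C_mu for empirical
    # measures (optimal for every p > 1).
    src = np.asarray(src, dtype=float)
    dst = np.sort(np.asarray(dst, dtype=float))
    T = np.empty_like(src)
    T[np.argsort(src)] = dst
    return T

def wasserstein_p_1d(mu_samples, nu_samples, p=2):
    # W_p^p via the quantile formula: int_0^1 |C_mu^{-1} - C_nu^{-1}|^p.
    return np.mean(np.abs(np.sort(mu_samples) - np.sort(nu_samples)) ** p)

x = np.random.randn(1000)               # samples of mu = N(0, 1)
y = 2.0 + 0.5 * np.random.randn(1000)   # samples of nu = N(2, 0.5^2)
print(wasserstein_p_1d(x, y))           # approx (2-0)^2 + (0.5-1)^2 = 4.25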
  • 16-18. Grayscale Histogram Transfer. Input images fi: [0,1]² → [0,1], i = 0, 1. Gray-value distributions µi on [0,1]: µi([a, b]) = ∫_{[0,1]²} 1_{{a ≤ fi(x) ≤ b}} dx. Optimal transport of the histograms: T = Cµ1⁻¹ ∘ Cµ0, and the equalized image is T(f0). [Figure: f0, f1, Cµ0, Cµ1⁻¹ and the transferred image T(f0).]
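On discrete images the transfer T = Cµ1⁻¹ ∘ Cµ0 reduces to rank matching. A short sketch (our hypothetical helper; rank interpolation handles images of different sizes):

import numpy as np

def histogram_transfer(f0, f1):
    # Impose the gray-level histogram of f1 onto f0: the pixel of f0
    # with rank r receives the gray value of rank r in f1, which is
    # exactly T = C_{mu1}^{-1} o C_{mu0} for empirical histograms.
    shape = f0.shape
    f0, f1 = f0.ravel(), np.sort(f1.ravel())
    ranks = np.round(np.linspace(0, f1.size - 1, f0.size)).astype(int)
    out = np.empty_like(f0)
    out[np.argsort(f0)] = f1[ranks]
    return out.reshape(shape)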
  • 19-22. Application to Color Transfer. Input color images fi ∈ ℝ^{N×3}, pixels x ↦ fi(x). Optimal assignment: min over permutations σ ∈ Σ_N of Σᵢ ||f0(xᵢ) − f1(x_{σ(i)})||. Transport: T: f0(x) ∈ ℝ³ ↦ f1(σ(i)) ∈ ℝ³. Equalization: f̃0 = T(f0), so that f̃0 carries the color statistics of f1. [Figures: source image, style image, and the source after color transfer; J. Rabin, Wasserstein regularization.]
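A hedged sketch of the assignment step for small point clouds, using SciPy's Hungarian solver with a squared Euclidean cost (our choice of tool and cost; it is O(N³), which is why sliced or approximate solvers are used on full-size images):

import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer_assignment(f0, f1):
    # Exact optimal assignment between color clouds f0, f1 in R^{N x 3}:
    # min over permutations sigma of sum_i ||f0[i] - f1[sigma(i)]||^2.
    cost = ((f0[:, None, :] - f1[None, :, :]) ** 2).sum(axis=-1)
    _, sigma = linear_sum_assignment(cost)
    return f1[sigma]                     # T(f0[i]) = f1[sigma(i)]

f0 = np.random.rand(200, 3)              # small synthetic color clouds
f1 = np.random.rand(200, 3)
f0_equalized = color_transfer_assignment(f0, f1)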
  • 23. Image Registration [ur Rehman et al., 2009]. Mass-preserving (optimal mass transport) registration of medical images: the optimality system is an elliptic system of equations, solved by preconditioned conjugate gradient with an incomplete Cholesky preconditioner and a parallelizable Gauss-Seidel relaxation implemented on the GPU, with trilinear interpolation for the multigrid transfer operators. [Figure: OMT results viewed on an axial slice; pre-op (left) and post-op (right) MRI, with a clearly visible deformation in the anterior part of the brain.]
  • 24-26. Convex Formulation (Benamou-Brenier). Find ρ: ℝ^d × [0,1] → ℝ₊ and m: ℝ^d × [0,1] → ℝ^d solving W(µ0, µ1)² = min_{x=(m,ρ)} J(x) + ι_C(x), where J(x) = ∫_{ℝ^d} ∫₀¹ j(x(s, t)) dt ds, with j(m̃, ρ̃) = ||m̃||²/ρ̃ if ρ̃ > 0; 0 if ρ̃ = 0 and m̃ = 0; +∞ otherwise. The constraint set is C = {x = (m, ρ) : div(x) = 0, B(ρ) = (ρ0, ρ1)}, with boundary operator B(ρ) = (ρ(0, ·), ρ(1, ·)).
  • 28. Numerical Examples. [Figure 7: synthetic 2-D examples on a Euclidean domain, displacement interpolation between ρ0 and ρ1.]
  • 29-31. Discrete Formulation. Centered grid formulation (d = 1): min_{x ∈ ℝ^{Gc×2}} J(x) + ι_C(x), J(x) = Σ_{i∈Gc} j(xᵢ). Staggered grid formulation: min_{x ∈ ℝ^{G¹st} × ℝ^{G²st}} J(I(x)) + ι_C(x), with interpolation operator I = (I¹, I²): ℝ^{G¹st} × ℝ^{G²st} → ℝ^{Gc×2}, 2 I¹(m)_{i,j} = m_{i+1/2,j} + m_{i−1/2,j}. Projection on div(x) = 0 using FFTs. [Figure: centered grid Gc and staggered grids G¹st, G²st.]
  • 32-33. SOCP Formulation. min_{x∈ℝ^{Gc×d}} J(x) + ι_C(x) ⟺ min_{x, r} Σᵢ rᵢ s.t. ∀i ∈ Gc, (mᵢ, ρᵢ, rᵢ) ∈ K, with the (rotated) Lorentz cone K = {(m̃, ρ̃, r̃) ∈ ℝ^{d+2} : ||m̃||² ≤ ρ̃ r̃}. This is a second-order cone program → use interior point methods (e.g. the MOSEK software): linear convergence with the iteration number, but poor scaling with the dimension |Gc|; efficient for medium-scale problems (N ∼ 10⁴).
  • 34-37. Example: Regularization. Inverse problem: measurements y = Φx0 + w. Regularized inversion: x⋆ ∈ argmin_{x∈ℝ^N} ½||y − Φx||² + λ R(x) (data fidelity + regularity). Total variation: R(x) = Σᵢ ||(∇x)ᵢ||. ℓ¹ sparsity: R(x) = Σᵢ |xᵢ|; images are sparse in wavelet bases: image f = Ψx, coefficients x = Ψ*f.
  • 38. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 39-42. Convex Optimization. Setting: G: H → ℝ ∪ {+∞}, H a Hilbert space; here H = ℝ^N. Problem: min_{x∈H} G(x). Class of functions — convex: G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y), t ∈ [0,1]; lower semi-continuous: lim inf_{x→x0} G(x) ≥ G(x0); proper: {x ∈ H : G(x) ≠ +∞} ≠ ∅. Indicator of a closed convex set C: ι_C(x) = 0 if x ∈ C, +∞ otherwise.
  • 43-46. Sub-differential. ∂G(x) = {u ∈ H : ∀z, G(z) ≥ G(x) + ⟨u, z − x⟩}. Example: G(x) = |x|, ∂G(0) = [−1, 1]. Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}. First-order conditions: x⋆ ∈ argmin_{x∈H} G(x) ⟺ 0 ∈ ∂G(x⋆). Monotone operator: U(x) = ∂G(x) satisfies ⟨y − x, v − u⟩ ≥ 0 for all (u, v) ∈ U(x) × U(y).
  • 47-50. Prox and Subdifferential. Prox_{γG}(x) = argmin_z ½||x − z||² + γ G(z). Resolvent of ∂G: z = Prox_{γG}(x) ⟺ 0 ∈ z − x + γ∂G(z) ⟺ x ∈ (Id + γ∂G)(z) ⟺ z = (Id + γ∂G)⁻¹(x). Inverse of a set-valued mapping: x ∈ U(y) ⟺ y ∈ U⁻¹(x); Prox_{γG} = (Id + γ∂G)⁻¹ is a single-valued mapping. Fixed point: x⋆ ∈ argmin G ⟺ 0 ∈ ∂G(x⋆) ⟺ x⋆ ∈ (Id + γ∂G)(x⋆) ⟺ x⋆ = Prox_{γG}(x⋆).
  • 51-54. Proximal Calculus. Separability: G(x) = G1(x1) + ... + Gn(xn) ⟹ Prox_G(x) = (Prox_{G1}(x1), ..., Prox_{Gn}(xn)). Quadratic functionals: G(x) = ½||Λx − y||² ⟹ Prox_{γG}(x) = (Id + γΛ*Λ)⁻¹(x + γΛ*y). Composition by a tight frame (AA* = Id): Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A*A. Indicators: G = ι_C ⟹ Prox_{γG} = Proj_C, with Proj_C(x) = argmin_{z∈C} ||x − z||.
  • 55-57. Prox of Sparse Regularizers. Prox_{γG}(x) = argmin_z ½||x − z||² + γG(z). G(x) = ||x||₁ = Σᵢ |xᵢ| ⟹ Prox_{γG}(x)ᵢ = max(0, 1 − γ/|xᵢ|) xᵢ (soft thresholding). G(x) = ||x||₀ = #{i : xᵢ ≠ 0} ⟹ Prox_{γG}(x)ᵢ = xᵢ if |xᵢ| ≥ √(2γ), 0 otherwise (hard thresholding). G(x) = Σᵢ log(1 + |xᵢ|²) ⟹ Prox_{γG}(x)ᵢ is the root of a third-order polynomial. [Figure: the three thresholding curves.]
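The two closed-form thresholdings above, as a NumPy sketch (names are ours):

import numpy as np

def prox_l1(x, gamma):
    # Soft thresholding: prox of gamma ||.||_1.
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-16)) * x

def prox_l0(x, gamma):
    # Hard thresholding: prox of gamma ||.||_0 keeps |x_i| >= sqrt(2 gamma).
    return np.where(np.abs(x) >= np.sqrt(2.0 * gamma), x, 0.0)

x = np.linspace(-3.0, 3.0, 7)
print(prox_l1(x, 1.0))   # entries shrunk toward 0 by 1, small ones set to 0
print(prox_l0(x, 1.0))   # entries with |x_i| < sqrt(2) set to 0, others kept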
  • 58-60. Legendre-Fenchel Duality. Legendre-Fenchel transform: G*(u) = sup_{x∈dom(G)} ⟨u, x⟩ − G(x). Example, quadratic functional: G(x) = ½⟨Ax, x⟩ + ⟨x, b⟩ ⟹ G*(u) = ½⟨u − b, A⁻¹(u − b)⟩. Moreau's identity: Prox_{γG*}(x) = x − γ Prox_{G/γ}(x/γ): G simple ⟺ G* simple. [Figure: geometric interpretation, supporting line of slope u.]
  • 61-63. Indicator and Homogeneous Functionals. Positively 1-homogeneous functional: G(λx) = |λ| G(x); example: any norm G(x) = ||x||. Duality: G* = ι_{{G°(·) ≤ 1}} where G°(y) = max_{G(x)≤1} ⟨x, y⟩. ℓᵖ norms: G = ||·||_p, 1 ≤ p ≤ +∞ ⟹ G* = ι_{{||·||_q ≤ 1}} with 1/p + 1/q = 1. Example, prox of the ℓ∞ norm: Prox_{γ||·||∞} = Id − Proj_{{||·||₁ ≤ γ}}, with Proj_{{||·||₁≤γ}}(x)ᵢ = max(0, 1 − λ/|xᵢ|) xᵢ for a well-chosen λ = λ(x, γ).
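The well-chosen λ(x, γ) can be found by sorting. A standard sketch of the projection onto the ℓ¹ ball (our implementation, not from the slides):

import numpy as np

def proj_l1_ball(x, radius=1.0):
    # Euclidean projection onto {z : ||z||_1 <= radius}: a soft
    # thresholding max(0, 1 - lam/|x_i|) x_i whose threshold lam is
    # computed from the sorted magnitudes.
    a = np.abs(x)
    if a.sum() <= radius:
        return x.copy()
    u = np.sort(a)[::-1]
    cssv = np.cumsum(u) - radius
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(u - cssv / k > 0)[0][-1]
    lam = cssv[rho] / (rho + 1.0)
    return np.sign(x) * np.maximum(a - lam, 0.0)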
  • 64-67. Prox of the J Functional. J(m, ρ) = Σᵢ j(mᵢ, ρᵢ) with j(m̃, ρ̃) = ||m̃||²/ρ̃ for ρ̃ > 0, so Prox_{γJ}(m, ρ) = (Prox_{γj}(mᵢ, ρᵢ))ᵢ. Conjugate: j* = ι_C with (for this j) C = {(a, b) ∈ ℝ^d × ℝ : ||a||²/4 + b ≤ 0}, hence by Moreau Prox_{γj}(x̃) = x̃ − γ Proj_C(x̃/γ), x̃ = (m̃, ρ̃). Proposition: Prox_{γj}(m̃, ρ̃) = (m⋆, ρ⋆) if ρ⋆ > 0 and (0, 0) otherwise, where m⋆ = ρ⋆ m̃ / (ρ⋆ + 2γ) and ρ⋆ is the largest real root of X³ + (4γ − ρ̃)X² + 4γ(γ − ρ̃)X − γ||m̃||² − 4γ²ρ̃ = 0.
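A direct numerical sketch of this proposition (our code; numpy.roots stands in for a closed-form cubic solve):

import numpy as np

def prox_j(m, rho, gamma):
    # Prox of gamma*j at one grid point, j(m, rho) = ||m||^2 / rho:
    # rho* is the largest real root of the cubic stated above, then
    # m* = rho* m / (rho* + 2 gamma); returns (0, 0) if rho* <= 0.
    m = np.atleast_1d(np.asarray(m, dtype=float))
    coeffs = [1.0,
              4.0 * gamma - rho,
              4.0 * gamma * (gamma - rho),
              -gamma * np.dot(m, m) - 4.0 * gamma ** 2 * rho]
    roots = np.roots(coeffs)
    rho_star = roots[np.abs(roots.imag) < 1e-8].real.max()
    if rho_star <= 0.0:
        return np.zeros_like(m), 0.0
    return rho_star * m / (rho_star + 2.0 * gamma), rho_star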
  • 68. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 69-71. Gradient and Proximal Descents. Gradient descent: x^(ℓ+1) = x^(ℓ) − τ∇G(x^(ℓ)) [explicit]; if G is C¹ with ∇G L-Lipschitz and 0 < τ < 2/L, then x^(ℓ) → x⋆ a solution. Sub-gradient descent: x^(ℓ+1) = x^(ℓ) − τℓ v^(ℓ), v^(ℓ) ∈ ∂G(x^(ℓ)); if τℓ ∼ 1/ℓ, then x^(ℓ) → x⋆; problem: slow. Proximal-point algorithm [Rockafellar, 70]: x^(ℓ+1) = Prox_{τℓ G}(x^(ℓ)) [implicit]; if τℓ ≥ c > 0, then x^(ℓ) → x⋆; but Prox_{τG} can be hard to compute.
  • 72-74. Proximal Splitting Methods. Solve min_{x∈H} E(x) when Prox_E is not available. Splitting: E(x) = F(x) + Σᵢ Gᵢ(x), F smooth, Gᵢ simple. Iterative algorithms using only ∇F(x) and Prox_{γGᵢ}(x): Forward-Backward solves F + G; Douglas-Rachford solves Σᵢ Gᵢ; Primal-Dual solves Σᵢ Gᵢ ∘ Aᵢ; Generalized FB solves F + Σᵢ Gᵢ.
  • 75. Smooth + Simple Splitting. Inverse problem: measurements y = Kf0 + w, K: ℝ^N → ℝ^P, P ≪ N. Model: f0 = Ψx0 sparse in a dictionary Ψ. Sparse recovery: f = Ψx where x solves min_{x∈ℝ^N} F(x) + λG(x): data fidelity F(x) = ½||y − Φx||² (smooth), Φ = KΨ; regularization G(x) = ||x||₁ = Σᵢ |xᵢ| (simple).
  • 76-79. Forward-Backward. Fixed-point equation: x⋆ ∈ argmin F + G ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆) ⟺ (x⋆ − τ∇F(x⋆)) ∈ x⋆ + τ∂G(x⋆) ⟺ x⋆ = Prox_{τG}(x⋆ − τ∇F(x⋆)). Forward-backward iteration: x^(ℓ+1) = Prox_{τG}(x^(ℓ) − τ∇F(x^(ℓ))). For G = ι_C this is projected gradient descent. Theorem [Passty 79, Gabay 83]: if ∇F is L-Lipschitz and τ < 2/L, then x^(ℓ) → x⋆ a solution of (⋆).
  • 80. Example: ℓ¹ Regularization. min_x ½||Φx − y||² + λ||x||₁ = min_x F(x) + λG(x), with F(x) = ½||Φx − y||², ∇F(x) = Φ*(Φx − y), L = ||Φ*Φ||; G(x) = ||x||₁, Prox_{γG}(x)ᵢ = max(0, 1 − γ/|xᵢ|) xᵢ. Forward-backward ⟺ iterative soft thresholding.
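A minimal ISTA sketch of this forward-backward scheme (all names are ours):

import numpy as np

def ista(Phi, y, lam, n_iter=200):
    # Iterative soft thresholding for min_x 0.5||Phi x - y||^2 + lam ||x||_1.
    L = np.linalg.norm(Phi, 2) ** 2       # Lipschitz constant of grad F
    tau = 1.0 / L                         # any tau < 2/L converges
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - tau * (Phi.T @ (Phi @ x - y))   # forward (gradient) step
        x = np.maximum(0.0, 1.0 - tau * lam / np.maximum(np.abs(z), 1e-16)) * z  # backward (prox) step
    return x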
  • 81. Convergence Speed. min_x E(x) = F(x) + G(x), with ∇F L-Lipschitz and G simple. Theorem: the FB iterates satisfy E(x^(ℓ)) − E(x⋆) ≤ C/ℓ, where the constant C degrades as L grows.
  • 82. Multi-step Accelerations. Beck-Teboulle accelerated FB: t^(0) = 1; x^(ℓ+1) = Prox_{G/L}(y^(ℓ) − (1/L)∇F(y^(ℓ))); t^(ℓ+1) = (1 + √(1 + 4(t^(ℓ))²))/2; y^(ℓ+1) = x^(ℓ+1) + ((t^(ℓ) − 1)/t^(ℓ+1))(x^(ℓ+1) − x^(ℓ)) (see also Nesterov's method). Theorem: E(x^(ℓ)) − E(x⋆) ≤ C/ℓ². Complexity theory: optimal in a worst-case sense.
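The acceleration only changes a few lines relative to the ISTA sketch above (same assumptions and names):

import numpy as np

def fista(Phi, y, lam, n_iter=200):
    # Beck-Teboulle accelerated forward-backward for the same lasso problem.
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1]); v = x.copy(); t = 1.0
    for _ in range(n_iter):
        z = v - (Phi.T @ (Phi @ v - y)) / L
        x_new = np.maximum(0.0, 1.0 - lam / (L * np.maximum(np.abs(z), 1e-16))) * z
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        v = x_new + ((t - 1.0) / t_new) * (x_new - x)   # extrapolation step
        x, t = x_new, t_new
    return x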
  • 83. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 84-85. Douglas-Rachford Scheme. min_x G1(x) + G2(x) (⋆), both simple. Douglas-Rachford iterations: z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γG2}(RProx_{γG1}(z^(ℓ))); x^(ℓ+1) = Prox_{γG1}(z^(ℓ+1)), with the reflected prox RProx_{γG}(x) = 2 Prox_{γG}(x) − x. Theorem [Lions, Mercier, 79]: if 0 < α < 2 and γ > 0, then x^(ℓ) → x⋆ a solution of (⋆).
  • 86-87. DR Fixed-Point Equation. min G1 + G2 ⟺ 0 ∈ ∂(G1 + G2)(x) ⟺ ∃z: z − x ∈ γ∂G1(x) and x − z ∈ γ∂G2(x) ⟺ x = Prox_{γG1}(z) and (2x − z) − x ∈ γ∂G2(x) ⟺ x = Prox_{γG1}(z) and x = Prox_{γG2}(2x − z) = Prox_{γG2}(RProx_{γG1}(z)). Then RProx_{γG2}(RProx_{γG1}(z)) = 2x − (2x − z) = z, i.e. z is a fixed point: z = (1 − α/2) z + (α/2) RProx_{γG2}(RProx_{γG1}(z)).
  • 88-92. Example: Optimal Transport on a Centered Grid. min_{x∈ℝ^{Gc×2}} J(x) + ι_C(x), with C = {x = (m, ρ) : Ax = b}, A(x) = (div(x), ρ|_{t=0}, ρ|_{t=1}), b = (0, ρ0, ρ1). Prox_{γJ}: cubic root (closed form). Prox_{ι_C} = Proj_C = (Id − A*Δ⁻¹A) + A*Δ⁻¹ b with Δ = AA*: solving a Poisson equation with boundary conditions. Proposition: DR(α = 1) is the ALG2 scheme of [Benamou, Brenier 2000]; the advantage of DR is the extra relaxation parameter α ∈ ]0, 2[.
  • 93-94. Example: Constrained ℓ¹. min_{Φx=y} ||x||₁ = min_x G1(x) + G2(x): G1 = ι_C, C = {x : Φx = y}, Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)⁻¹(y − Φx); G2 = ||·||₁, Prox_{γG2}(x)ᵢ = max(0, 1 − γ/|xᵢ|) xᵢ. Efficient if ΦΦ* is easy to invert. Example: compressed sensing, Φ ∈ ℝ^{100×400} a Gaussian matrix, y = Φx0 with ||x0||₀ = 17. [Figure: log₁₀(||x^(ℓ)||₁ − ||x⋆||₁) vs iteration, for γ = 0.01, 1, 10.]
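A DR sketch of this compressed-sensing example (our code, following the iterations above; γ tunes the speed, as the figure's three curves suggest):

import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=500):
    # Douglas-Rachford for min ||x||_1 s.t. Phi x = y, Phi full row rank.
    gram_inv = np.linalg.inv(Phi @ Phi.T)       # small P x P system
    proj_C = lambda x: x + Phi.T @ (gram_inv @ (y - Phi @ x))
    prox_l1 = lambda x: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
    rprox = lambda p, x: 2.0 * p(x) - x
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(prox_l1, rprox(proj_C, z))
    return proj_C(z)                            # x = Prox_{gamma G1}(z)

P, N = 100, 400
Phi = np.random.randn(P, N)
x0 = np.zeros(N); x0[np.random.choice(N, 17, replace=False)] = np.random.randn(17)
x_rec = dr_basis_pursuit(Phi, Phi @ x0)         # recovers a sparse x0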
  • 95-96. Auxiliary Variables with DR. min_x G1(x) + G2(A(x)), A: H → E linear, G1, G2 simple ⟺ min_{z∈H×E} G(z) + ι_C(z), with G(x, y) = G1(x) + G2(y) and C = {(x, y) ∈ H × E : Ax = y}. Prox_{γG}(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y)); Proj_C(x, y) = (x̃, Ax̃) = (x − A*ỹ, y + ỹ), where x̃ = (Id + A*A)⁻¹(A*y + x) and ỹ = (Id + AA*)⁻¹(Ax − y). Efficient if Id + AA* or Id + A*A is easy to invert.
  • 97-99. Example: TV Regularization. min_f ½||Kf − y||² + λ||∇f||₁, with ||u||₁ = Σᵢ ||uᵢ||. Splitting on (f, u = ∇f): G1(u) = λ||u||₁, Prox_{γG1}(u)ᵢ = max(0, 1 − γλ/||uᵢ||) uᵢ; G2(f) = ½||Kf − y||², Prox_{γG2} = (Id + γK*K)⁻¹(· + γK*y); C = {(f, u) ∈ ℝ^N × ℝ^{N×2} : u = ∇f}, Proj_C(f, u) = (f̃, ∇f̃) where f̃ solves (Id − Δ)f̃ = f − div(u): O(N log N) operations using the FFT. [Figure: original f0, degraded observations y (noise, then y = Kx0), recovered f, energy vs iteration.]
  • 100-104. Alternating Direction Method of Multipliers. min_x F(x) + G(A(x)) (⋆) ⟺ min_{x, y=Ax} F(x) + G(y), A: ℝ^N → ℝ^P injective. Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩. Augmented Lagrangian: L_γ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||². ADMM: x^(ℓ+1) = argmin_x L_γ(x, y^(ℓ), u^(ℓ)); y^(ℓ+1) = argmin_y L_γ(x^(ℓ+1), y, u^(ℓ)); u^(ℓ+1) = u^(ℓ) + γ(y^(ℓ+1) − Ax^(ℓ+1)). Theorem [Gabay, Mercier, Glowinski, Marrocco, 76]: if γ > 0, then x^(ℓ) → x⋆ a solution of (⋆).
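A minimal ADMM sketch on the lasso instance min_x ½||Φx − y||² + λ||x||₁, split as F(x) + G(z) with the constraint z = x (our toy choice A = Id, scaled multiplier u):

import numpy as np

def admm_lasso(Phi, y, lam, gamma=1.0, n_iter=300):
    N = Phi.shape[1]
    Q = np.linalg.inv(Phi.T @ Phi + gamma * np.eye(N))   # x-update system
    Pty = Phi.T @ y
    z = np.zeros(N); u = np.zeros(N)
    for _ in range(n_iter):
        x = Q @ (Pty + gamma * (z - u))                  # argmin_x of L_gamma
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / gamma, 0.0)  # prox of G/gamma
        u = u + x - z                                    # multiplier update
    return z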
  • 105-108. ADMM with Proximal Operators. Proximal mapping for the metric induced by an injective A: Prox^A_{γF}(z) = argmin_x ½||Ax − z||² + γF(x). Proposition: Prox^A_{γF} = A⁺ ∘ (Id − γ Prox_{F*∘A*/γ}(·/γ)). ADMM then reads: x^(ℓ+1) = Prox^A_{F/γ}(y^(ℓ) − u^(ℓ)); y^(ℓ+1) = Prox_{G/γ}(Ax^(ℓ+1) + u^(ℓ)); u^(ℓ+1) = u^(ℓ) + Ax^(ℓ+1) − y^(ℓ+1). → If G∘A is simple: use DR. → If F*∘A* is simple: use ADMM.
  • 109-112. ADMM vs. DR. Fenchel-Rockafellar duality: min_x F(x) + G(A(x)) ↔ min_u F*(−A*u) + G*(u). Important: no bijection between u and x. Proposition [Eckstein, Bertsekas, 92]: DR applied to the dual F*∘(−A*) + G* is ADMM. DR iterations (when α = 1): z^(ℓ+1) = ½ z^(ℓ) + ½ RProx_{γF*∘(−A*)}(RProx_{γG*}(z^(ℓ))). The ADMM iterates are recovered using: y^(ℓ) = (z^(ℓ) − u^(ℓ))/γ; x^(ℓ+1) = Prox^A_{F/γ}(y^(ℓ) − u^(ℓ)); u^(ℓ) = Prox_{γG*}(z^(ℓ)).
  • 113-114. More than 2 Functionals. min_x G1(x) + ... + G_k(x), each Gᵢ simple ⟺ min over (x1, ..., x_k) ∈ H^k of G(x1, ..., x_k) + ι_C(x1, ..., x_k), with G(x1, ..., x_k) = G1(x1) + ... + G_k(x_k) and the consensus constraint C = {(x1, ..., x_k) ∈ H^k : x1 = ... = x_k}. Both terms are simple: Prox_{γG}(x1, ..., x_k) = (Prox_{γGᵢ}(xᵢ))ᵢ and Proj_C(x1, ..., x_k) = (x̃, ..., x̃) with x̃ = (1/k) Σᵢ xᵢ, so DR applies (a sketch follows below).
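A tiny demonstration of this consensus trick (the problem and all names are ours): DR on the product space for min_x Σᵢ |x − aᵢ|, whose minimizer is the median of the aᵢ.

import numpy as np

def dr_consensus_median(a, gamma=1.0, n_iter=300):
    a = np.asarray(a, dtype=float)
    z = np.zeros(a.size)
    prox_G = lambda v: v - np.clip(v - a, -gamma, gamma)   # prox of gamma|v_i - a_i|
    proj_C = lambda v: np.full(a.size, v.mean())           # consensus projection
    for _ in range(n_iter):
        r = 2.0 * proj_C(z) - z                            # RProx of iota_C
        z = z / 2.0 + (2.0 * prox_G(r) - r) / 2.0          # DR update (alpha = 1)
    return proj_C(z)[0]

print(dr_consensus_median([1.0, 2.0, 10.0]))   # -> approx 2.0, the median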
  • 115. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 116-118. GFB Splitting [Raguet, Fadili, Peyré 2012]. min_{x∈ℝ^N} F(x) + Σᵢ₌₁ⁿ Gᵢ(x) (⋆), F smooth, each Gᵢ simple. Iterations: for i = 1, ..., n, zᵢ^(ℓ+1) = zᵢ^(ℓ) + Prox_{nτGᵢ}(2x^(ℓ) − zᵢ^(ℓ) − τ∇F(x^(ℓ))) − x^(ℓ); then x^(ℓ+1) = (1/n) Σᵢ zᵢ^(ℓ+1). Theorem: if ∇F is L-Lipschitz and τ < 2/L, then x^(ℓ) → x⋆ a solution of (⋆). Special cases: n = 1 → forward-backward; F = 0 → Douglas-Rachford.
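A generic GFB sketch following the update above (callable-based; the toy problem and all names are ours):

import numpy as np

def gfb(grad_F, L, proxes, x0, n_iter=300):
    # proxes[i] is a callable (t, v) -> Prox_{t G_i}(v).
    n = len(proxes)
    tau = 1.0 / L                            # any tau < 2/L
    x = x0.copy()
    z = [x0.copy() for _ in range(n)]
    for _ in range(n_iter):
        g = tau * grad_F(x)
        for i in range(n):
            z[i] = z[i] + proxes[i](n * tau, 2 * x - z[i] - g) - x
        x = sum(z) / n
    return x

# toy example: min 0.5||x - y||^2 + lam ||x||_1 + iota_{x >= 0}
y = np.array([-1.0, 0.5, 3.0]); lam = 1.0
prox_l1 = lambda t, v: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
prox_pos = lambda t, v: np.maximum(v, 0.0)
print(gfb(lambda x: x - y, 1.0, [prox_l1, prox_pos], np.zeros(3)))  # -> [0, 0, 2]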
  • 119-122. GFB Fixed Point. x⋆ ∈ argmin F + Σᵢ Gᵢ ⟺ 0 ∈ ∇F(x⋆) + Σᵢ ∂Gᵢ(x⋆) ⟺ ∃yᵢ ∈ ∂Gᵢ(x⋆) with ∇F(x⋆) + Σᵢ yᵢ = 0 ⟺ ∃(zᵢ)ⁿᵢ₌₁: ∀i, x⋆ − zᵢ − τ∇F(x⋆) ∈ nτ∂Gᵢ(x⋆) and x⋆ = (1/n) Σᵢ zᵢ (take zᵢ = x⋆ − τ∇F(x⋆) − nτyᵢ). Equivalently (2x⋆ − zᵢ − τ∇F(x⋆)) − x⋆ ∈ nτ∂Gᵢ(x⋆), i.e. x⋆ = Prox_{nτGᵢ}(2x⋆ − zᵢ − τ∇F(x⋆)), hence zᵢ = zᵢ + Prox_{nτGᵢ}(2x⋆ − zᵢ − τ∇F(x⋆)) − x⋆: a fixed-point equation on (x⋆, z1, ..., zn).
  • 123-125. Block Regularization. ℓ¹-ℓ² block sparsity: G(x) = Σ_{b∈B} ||x[b]||, where ||x[b]||² = Σ_{m∈b} x_m². Decomposition of the (possibly overlapping) block family into non-overlapping sub-families: B = B1 ∪ ... ∪ Bn, G(x) = Σᵢ₌₁ⁿ Gᵢ(x), Gᵢ(x) = Σ_{b∈Bᵢ} ||x[b]||. Each Gᵢ is simple: for b ∈ Bᵢ and m ∈ b, Prox_{γGᵢ}(x)_m = max(0, 1 − γ/||x[b]||) x_m. [Figure: image f = Ψx, coefficients x, two overlapping block families B1, B2.]
  • 126. Numerical Experiments: Deconvolution and Deconvolution + Inpainting. min_x ½||y − ΨKx||² + λ Σ_k ||x||_{1,2}^{(B_k)} with the ℓ¹-ℓ² block penalty, comparing GFB (EFB), relaxed Peaceman-Rachford (PR) and Chambolle-Pock (CP); N = 256². [Figure: log₁₀(E − E_min) vs iteration for both experiments; reported timings t_EFB = 283s, t_PR = 298s, t_CP = 368s and t_EFB = 161s, t_PR = 173s, t_CP = 190s; SNRs 22.49dB and 21.80dB at iteration #50.]
  • 127. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 128-131. Primal-Dual Formulation. Fenchel-Rockafellar duality, A: H → L linear: min_{x∈H} G1(x) + G2(A(x)) = min_x G1(x) + sup_{u∈L} ⟨Ax, u⟩ − G2*(u). Strong duality (min ↔ max) holds if 0 ∈ ri(dom(G2)) − A ri(dom(G1)): = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩ = max_u −G2*(u) − G1*(−A*u). Recovering x⋆ from a dual solution u⋆: x⋆ ∈ argmin_x G1(x) + ⟨x, A*u⋆⟩ ⟺ −A*u⋆ ∈ ∂G1(x⋆) ⟺ x⋆ ∈ (∂G1)⁻¹(−A*u⋆) = ∂G1*(−A*u⋆).
  • 132-134. Forward-Backward on the Dual. If G1 is strongly convex, ∇²G1 ≥ c Id, i.e. G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||², then x⋆ is uniquely defined, x⋆ = ∇G1*(−A*u⋆), and G1* is of class C¹. FB on the dual: min_x G1(x) + G2(A(x)) = min_{u∈L} G1*(−A*u) [smooth] + G2*(u) [simple], with iterations u^(ℓ+1) = Prox_{τG2*}(u^(ℓ) + τ A ∇G1*(−A*u^(ℓ))).
  • 135-136. Example: TV Denoising [Chambolle 2004]. min_{f∈ℝ^N} ½||f − y||² + λ||∇f||₁ dualizes to min_{||u||∞ ≤ λ} ||y + div(u)||², where ||u||₁ = Σᵢ ||uᵢ|| and ||u||∞ = maxᵢ ||uᵢ||. The primal solution is recovered from the dual one: f⋆ = y + div(u⋆). FB (a.k.a. projected gradient descent): u^(ℓ+1) = Proj_{||·||∞≤λ}(u^(ℓ) + τ∇(y + div(u^(ℓ)))), with Proj_{||·||∞≤λ}(u)ᵢ = uᵢ / max(||uᵢ||/λ, 1). Convergence if τ < 2/||div ∘ ∇|| = 1/4.
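A sketch of this dual projected gradient (our finite-difference discretization with Neumann boundaries, div = −grad*; τ = 0.24 < 1/4):

import numpy as np

def grad(f):
    # forward differences, zero at the last row/column (N x M -> N x M x 2)
    gx = np.vstack([f[1:, :] - f[:-1, :], np.zeros((1, f.shape[1]))])
    gy = np.hstack([f[:, 1:] - f[:, :-1], np.zeros((f.shape[0], 1))])
    return np.stack([gx, gy], axis=-1)

def div(u):
    # discrete divergence, the negative adjoint of grad
    ux, uy = u[..., 0], u[..., 1]
    dx = np.vstack([ux[:1, :], ux[1:-1, :] - ux[:-2, :], -ux[-2:-1, :]])
    dy = np.hstack([uy[:, :1], uy[:, 1:-1] - uy[:, :-2], -uy[:, -2:-1]])
    return dx + dy

def tv_denoise_dual(y, lam, n_iter=200, tau=0.24):
    # Chambolle's projected gradient on the dual of ROF denoising.
    u = np.zeros(y.shape + (2,))
    for _ in range(n_iter):
        v = u + tau * grad(y + div(u))
        norms = np.maximum(np.sqrt((v ** 2).sum(-1, keepdims=True)) / lam, 1.0)
        u = v / norms                       # projection on {||u_i|| <= lam}
    return y + div(u)                       # primal solution f* = y + div(u*)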
  • 137-139. Primal-Dual Algorithm. min_{x∈H} G1(x) + G2(A(x)) ⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩. Iterations: z^(ℓ+1) = Prox_{σG2*}(z^(ℓ) + σA(x̃^(ℓ))); x^(ℓ+1) = Prox_{τG1}(x^(ℓ) − τA*(z^(ℓ+1))); x̃^(ℓ+1) = x^(ℓ+1) + θ(x^(ℓ+1) − x^(ℓ)). θ = 0: Arrow-Hurwicz algorithm; θ = 1: O(1/ℓ) convergence speed on the duality gap. Theorem [Chambolle-Pock 2011]: if 0 ≤ θ ≤ 1 and στ||A||² < 1, then x^(ℓ) → x⋆ a minimizer of G1 + G2 ∘ A.
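A generic sketch of these iterations, demonstrated on 1-D TV denoising (the splitting and all names are ours; here στ||A||² ≤ 0.8 < 1 since ||A||² ≤ 4 for forward differences):

import numpy as np

def chambolle_pock(prox_tau_G1, prox_sigma_G2s, A, At, x0, tau, sigma,
                   theta=1.0, n_iter=300):
    x = x0.copy(); x_bar = x0.copy(); z = A(x0)
    for _ in range(n_iter):
        z = prox_sigma_G2s(z + sigma * A(x_bar))     # dual ascent step
        x_new = prox_tau_G1(x - tau * At(z))         # primal descent step
        x_bar = x_new + theta * (x_new - x)          # extrapolation
        x = x_new
    return x

# 1-D TV denoising: min_x 0.5||x - y||^2 + lam ||D x||_1
y = np.cumsum(np.random.randn(100)); lam = 2.0; tau = 0.02
A = lambda x: np.diff(x)                             # D, forward differences
At = lambda z: np.concatenate([[-z[0]], -np.diff(z), [z[-1]]])  # D^T
prox_G1 = lambda x: (x + tau * y) / (1 + tau)        # prox of tau*G1, G1 = 0.5||.-y||^2
prox_G2s = lambda z: np.clip(z, -lam, lam)           # proj on {|z_i| <= lam}
x = chambolle_pock(prox_G1, prox_G2s, A, At, np.zeros(100), tau, 10.0)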
  • 140. Example: Optimal Transport on a Staggered Grid. Staggered grid formulation: min_{x ∈ ℝ^{G¹st} × ℝ^{G²st}} J(I(x)) + ι_C(x), with the interpolation I = (I¹, I²): ℝ^{G¹st} × ℝ^{G²st} → ℝ^{Gc×2} mapping the staggered grid to the centered grid: handled by the primal-dual scheme with A = I. [Figure: staggered grids G¹st, G²st and centered grid Gc.]
  • 141-143. Conclusion. Inverse problems in imaging: large scale (N ∼ 10⁶); non-smooth (sparsity, TV, ...); (sometimes) convex; highly structured (separability, ℓᵖ norms, ...). Proximal splitting: unravels the structure of the problems; parallelizable; decomposition G = Σ_k G_k. Open problems: less structured problems without smoothness; non-convex optimization.