This document discusses higher-order segmentation functionals for image segmentation. It begins with an overview of different surface representations and optimization approaches for segmentation, including continuous, combinatorial, and mixed optimization. The main focus is on combinatorial optimization using graph cuts. The talk will cover minimizing entropy and color consistency functionals for image segmentation. It will also discuss convex cardinality potentials, distribution consistency, and extending boundary regularization from length to curvature.
Yuri Boykov — Combinatorial optimization for higher-order segmentation functionals: Entropy, Color Consistency, Curvature, etc.
1. Higher-order Segmentation Functionals: Entropy, Color Consistency, Curvature, etc.
Yuri Boykov
jointly with A. Delong, M. Tang, C. Nieuwenhuis, I. Ben Ayed, E. Toppe, O. Veksler, C. Olsson, H. Isack, L. Gorelick, A. Osokin
5. Linear (modular) appearance of region

R(S) = ⟨f, S⟩ = Σ_p f_p·s_p

Examples of potential functions f_p:
• Log-likelihoods: f_p = −ln Pr(I_p)
• Chan-Vese: f_p = (I_p − c)²
• Ballooning: f_p = 1
7. Basic boundary regularization

B(S) = Σ_{pq∈N} w·[s_p ≠ s_q],   s_p ∈ {0,1}

second-order (quadratic) terms:
[s_p ≠ s_q] = s_p·(1 − s_q) + (1 − s_p)·s_q
8. Basic boundary regularization

B(S) = Σ_{pq∈N} w_pq·[s_p ≠ s_q],   s_p ∈ {0,1}   (second-order terms)

Examples of discontinuity penalties:
• Boundary length: w_pq = 1
• Image-weighted boundary length: w_pq ∝ exp(−(I_p − I_q)²)
9. Basic boundary regularization

B(S) = Σ_{pq∈N} w_pq·[s_p ≠ s_q],   s_p ∈ {0,1}   (second-order terms)

• corresponds to boundary length |∂S|
  – on grids [B&K, 2003], via integral geometry
  – on complexes [Sullivan 1994]
• submodular second-order energy
  – can be minimized exactly via graph cuts

[figure: s/t graph with a cut; n-links with weights w_pq connect neighboring pixels between terminals s and t]
[Greig et al. '91, Sullivan '94, Boykov-Jolly '01]
11. Submodular set functions

A set function E : 2^Ω → ℝ is submodular if for any S, T ⊆ Ω:

E(S ∪ T) + E(S ∩ T) ≤ E(S) + E(T)

Significance: any submodular set function can be globally optimized in
polynomial time, O(|Ω|⁹) [Grötschel et al. 1981, 88; Schrijver 2000]
12. Submodular set functions

An alternative equivalent definition provides the intuitive interpretation of
"diminishing returns": a set function E : 2^Ω → ℝ is submodular if for any
S ⊆ T ⊆ Ω and any v ∉ T:

E(T ∪ {v}) − E(T) ≤ E(S ∪ {v}) − E(S)

This easily follows from the previous definition applied to the sets T and S ∪ {v}:

E(T ∪ {v}) + E(S) ≤ E(S ∪ {v}) + E(T)

Significance: any submodular set function can be globally optimized in
polynomial time, O(|Ω|⁹) [Grötschel et al. 1981, 88; Schrijver 2000]
13. Graph cuts for minimization of submodular set functions

Assume a set Ω of indicator variables s_p ∈ {0,1} and a 2nd-order (quadratic) function

E(s) = Σ_{(pq)∈N} E_pq(s_p, s_q)

Function E(S) is submodular if for any (p,q) ∈ N:

E_pq(0,0) + E_pq(1,1) ≤ E_pq(1,0) + E_pq(0,1)

Significance: a submodular 2nd-order boolean (set) function can be globally
optimized in polynomial time by graph cuts, O(|N|·|Ω|²)
[Hammer 1968; Picard & Ratliff 1973; Boros & Hammer 2000; Kolmogorov & Zabih 2003]
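This pairwise condition is mechanical to verify. A minimal sketch (the example potentials are illustrative, not from the talk):

```python
# Check the pairwise submodularity condition
#   E_pq(0,0) + E_pq(1,1) <= E_pq(1,0) + E_pq(0,1)
# for two example potentials.

def is_submodular(E):
    """E maps (s_p, s_q) in {0,1}^2 to a cost."""
    return E[(0, 0)] + E[(1, 1)] <= E[(1, 0)] + E[(0, 1)]

# Potts discontinuity penalty w*[s_p != s_q] is submodular for any w >= 0.
w = 2.5
potts = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): w, (1, 0): w}

# A potential that rewards disagreement instead is not submodular.
anti = {(0, 0): w, (1, 1): w, (0, 1): 0.0, (1, 0): 0.0}

print(is_submodular(potts))  # True
print(is_submodular(anti))   # False
```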
15. Graph cuts for minimization of posterior energy (MRF)

Assume a Gibbs distribution over binary random variables

Pr(s_1, ..., s_n) ∝ exp(−E(S))   for s_p ∈ {0,1},  S = {p | s_p = 1}

Theorem [Boykov, Delong, Kolmogorov, Veksler; unpublished book, 2014?]:
all random variables s_p are positively correlated iff the set function E(S)
is submodular.

That is, submodularity implies an MRF with a "smoothness" prior.
18. Overview of this talk

High-order functionals and their optimization:
• From likelihoods to entropy – block-coordinate descent [Zhu & Yuille 96, GrabCut 04]
• From entropy to color consistency – global minimum [our work: One Cut 2014]
• Convex cardinality potentials – submodular approximations [our work: Trust Region 13, Auxiliary Cuts 13]
• Distribution consistency
• From length to curvature – other extensions [arXiv 13]
19. Given likelihood models θ⁰, θ¹

E(S | θ⁰, θ¹) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q],   s_p ∈ {0,1}

unary (linear) term: appearance log-likelihoods, assuming known models
• parametric models – e.g. Gaussian or GMM
• non-parametric models – histograms (I_p ∈ RGB)
pair-wise (quadratic) term: boundary regularization

Guaranteed globally optimal S: image segmentation with a graph cut
[Boykov & Jolly, ICCV 2001]
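With fixed models this energy is submodular, so graph cuts find the exact optimum for an arbitrary neighborhood system N. To illustrate the same global optimality on a 1-D chain, where exact minimization reduces to simple dynamic programming, here is a sketch (the toy signal and fixed-variance Gaussian likelihoods are assumptions for the example):

```python
# Toy 1-D "image": bright object in a dark background.
I = [0.1, 0.2, 0.1, 0.8, 0.9, 0.85, 0.2, 0.1]
w = 0.5                   # Potts weight w_pq between neighbors
mu = {0: 0.15, 1: 0.85}   # fixed means of the two Gaussian models
sigma = 0.2

def unary(p, s):
    # -ln Pr(I_p | theta_s) for fixed-variance Gaussians (constant dropped)
    return (I[p] - mu[s]) ** 2 / (2 * sigma ** 2)

n = len(I)
cost = [unary(0, 0), unary(0, 1)]  # best prefix energy ending in label s
back = []                          # back-pointers for recovering labels
for p in range(1, n):
    step, new = [], []
    for s in (0, 1):
        cands = [cost[t] + w * (s != t) for t in (0, 1)]
        t_best = min((0, 1), key=lambda t: cands[t])
        new.append(cands[t_best] + unary(p, s))
        step.append(t_best)
    cost, back = new, back + [step]

labels = [min((0, 1), key=lambda t: cost[t])]
for step in reversed(back):
    labels.append(step[labels[-1]])
labels.reverse()
print(labels)  # [0, 0, 0, 1, 1, 1, 0, 0]
```

On a chain the DP recurrence and the s/t min-cut compute the same global minimum; graph cuts are needed once the neighborhood system has loops (e.g. a grid).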
20. Beyond fixed likelihood models

E(S, θ⁰, θ¹) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q],   s_p ∈ {0,1}

Now θ⁰, θ¹ are extra variables:
• parametric models – e.g. Gaussian or GMM
• non-parametric models – histograms (I_p ∈ RGB)

NP-hard mixed optimization! [Vicente et al., ICCV'09]

Models θ⁰, θ¹ are iteratively re-estimated (from an initial box):
iterative image segmentation, GrabCut (block-coordinate descent over S and θ⁰, θ¹)
[Rother et al., SIGGRAPH 2004]
21. Block-coordinate descent for E(S, θ⁰, θ¹)

• Minimize over segmentation S for fixed θ⁰, θ¹:

  E(S, θ⁰, θ¹) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

  the optimal S is computed using graph cuts, as in [BJ 2001]

• Minimize over θ⁰, θ¹ for fixed labeling S:

  E(S, θ⁰, θ¹) = Σ_{p: s_p=0} −ln Pr(I_p | θ⁰) + Σ_{p: s_p=1} −ln Pr(I_p | θ¹) + const

  the optimal models are the empirical intensity distributions:
  θ̂⁰ – distribution of intensities in the current background segment S̄ = {p : s_p = 0}
  θ̂¹ – distribution of intensities in the current object segment S = {p : s_p = 1}
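The two alternating steps can be sketched with non-parametric histogram models. To keep it short this toy version sets w_pq = 0, so the segmentation step degenerates to per-pixel maximum likelihood instead of a graph cut; the toy image, initial box, and smoothing constant are illustrative assumptions:

```python
from collections import Counter

# Toy image over a small intensity alphabet, with an initial "box" guess.
I = [0, 0, 1, 1, 5, 5, 6, 5, 1, 0]
s = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]   # initial segmentation from a "box"
EPS = 1e-6                           # smoothing for unseen intensities

def histogram(pixels):
    # non-parametric model: smoothed empirical intensity distribution
    c, n = Counter(pixels), len(pixels)
    return lambda i: (c[i] + EPS) / (n + 10 * EPS)

for _ in range(20):
    # step 2: re-estimate theta^0, theta^1 from the current segments
    theta = {k: histogram([I[p] for p in range(len(I)) if s[p] == k])
             for k in (0, 1)}
    # step 1 (simplified, w_pq = 0): per-pixel maximum likelihood
    s_new = [max((0, 1), key=lambda k: theta[k](I[p])) for p in range(len(I))]
    if s_new == s:        # converged to a fixed point (local minimum)
        break
    s = s_new

print(s)  # [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
```

Each step decreases the energy, so the iteration converges, but only to a local minimum that depends on the initial box, exactly the sensitivity discussed on the next slides.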
22. Iterative learning of color models (binary case s_p ∈ {0,1})

• GrabCut: iterated graph cuts [Rother et al., SIGGRAPH 04]

  E(S, θ⁰, θ¹) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

  – start from models θ⁰, θ¹ estimated inside and outside some given box
  – iterate graph cuts and model re-estimation until convergence to a local minimum
  – the solution is sensitive to the initial box
23. Iterative learning of color models (binary case s_p ∈ {0,1})

[figure: example segmentations with energies E = 1.41×10⁶, 1.39×10⁶, 2.41×10⁶, 2.37×10⁶]

BCD minimization of E(S, θ⁰, θ¹) converges to a local minimum
(interactivity à la "snakes")
24. Iterative learning of color models (could be used for more than 2 labels, s_p ∈ {0,1,2,...})

• Unsupervised segmentation [Zhu & Yuille, 1996] using level sets + a merging heuristic

  E(S, θ⁰, θ¹, θ², ...) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q] + λ·|labels|

  – initialize models θ⁰, θ¹, θ², ... from many randomly sampled boxes
  – iterate segmentation and model re-estimation until convergence
  – models compete; the result is stable if there are sufficiently many
25. Iterative learning of color models (could be used for more than 2 labels, s_p ∈ {0,1,2,...})

• Unsupervised segmentation [Delong et al., 2012] using α-expansion (graph cuts)

  E(S, θ⁰, θ¹, θ², ...) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q] + λ·|labels|

  – initialize models θ⁰, θ¹, θ², ... from many randomly sampled boxes
  – iterate segmentation and model re-estimation until convergence
  – models compete; the result is stable if there are sufficiently many
26. Iterative learning of other models (could be used for more than 2 labels, s_p ∈ {0,1,2,...})

• Geometric multi-model fitting [Isack et al., 2012] using α-expansion (graph cuts)

  E(S, θ⁰, θ¹, θ², ...) = Σ_p ||θ_{s_p} − p|| + Σ_{pq∈N} w_pq·[s_p ≠ s_q] + λ·|labels|

  – initialize plane models θ⁰, θ¹, θ², ... from many randomly sampled SIFT matches
    in 2 images of the same scene
  – iterate segmentation and model re-estimation until convergence
  – models compete; the result is stable if there are sufficiently many
27. Iterative learning of other models (could be used for more than 2 labels, s_p ∈ {0,1,2,...})

• Geometric multi-model fitting [Isack et al., 2012] using α-expansion (graph cuts)

  E(S, θ⁰, θ¹, θ², ...) = Σ_p ||θ_{s_p} − p|| + Σ_{pq∈N} w_pq·[s_p ≠ s_q] + λ·|labels|

  [VIDEO]
  – initialize fundamental matrices θ⁰, θ¹, θ², ... from many randomly sampled
    SIFT matches in 2 consecutive frames of a video
  – iterate segmentation and model re-estimation until convergence
  – models compete; the result is stable if there are sufficiently many
28. From color model estimation to entropy and color consistency

global optimization in One Cut [Tang et al., ICCV 2013]
29. Interpretation of log-likelihoods: entropy of segment intensities

Given a distribution θ = {p₁, p₂, ..., pₙ} of intensities, let S_i = {p ∈ S | I_p = i}
be the pixels of color i in S. Then

Σ_{p∈S} −ln Pr(I_p | θ) = Σ_i −|S_i|·ln p_i = |S| · Σ_i −p_i^S·ln p_i = |S| · H(S | θ)

where p_i^S = |S_i| / |S| is the probability of intensity i in S, so that
p^S = {p₁^S, p₂^S, ..., pₙ^S} is the distribution of intensities observed in S,
and H(S | θ) is the cross entropy of distribution p^S w.r.t. θ.

[figure: segment S partitioned into color subsets S₁, ..., S₅]
30. Interpretation of log-likelihoods: entropy of segment intensities

Joint estimation of S and color models [Rother et al., SIGGRAPH'04, ICCV'09]:

E(S, θ⁰, θ¹) = Σ_{p: s_p=0} −ln Pr(I_p | θ⁰) + Σ_{p: s_p=1} −ln Pr(I_p | θ¹) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]
             = |S̄|·H(S̄ | θ⁰) + |S|·H(S | θ¹) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

Minimizing the cross-entropies over θ⁰, θ¹ turns them into entropies.
Note: H(P|Q) ≥ H(P) for any two distributions (equality when Q = P). Thus

E(S) = |S̄|·H(S̄) + |S|·H(S) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

minimization of segment entropies [Tang et al., ICCV 2013]
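The key inequality H(P|Q) ≥ H(P) (Gibbs' inequality) is easy to verify numerically on random distributions; this sketch is just a sanity check, not part of the talk:

```python
import math, random

def entropy(P):
    return -sum(p * math.log(p) for p in P if p > 0)

def cross_entropy(P, Q):
    # H(P|Q) in the slide's notation
    return -sum(p * math.log(q) for p, q in zip(P, Q))

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

random.seed(0)
for _ in range(1000):
    P = normalize([random.random() + 1e-9 for _ in range(8)])
    Q = normalize([random.random() + 1e-9 for _ in range(8)])
    assert cross_entropy(P, Q) >= entropy(P) - 1e-12
    assert abs(cross_entropy(P, P) - entropy(P)) < 1e-12

print("H(P|Q) >= H(P) held on 1000 random trials")
```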
31. Interpretation of log-likelihoods: entropy of segment intensities

(same decomposition as the previous slide) The joint energy E(S, θ⁰, θ¹) is a
mixed optimization problem [Rother et al., SIGGRAPH'04, ICCV'09], while the
equivalent entropy energy E(S) is a purely binary optimization problem
[Tang et al., ICCV 2013].
32. Interpretation of log-likelihoods: entropy of segment intensities

(same decomposition as the previous slides) The entropy energy E(S) is also a
common energy for categorical clustering, e.g. [Li et al., ICML'04].
33. Minimizing entropy of segment intensities (intuitive motivation)

E(S) = |S̄|·H(S̄) + |S|·H(S) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

Break the image into two coherent segments S and S̄ with low entropy of
intensities: unsupervised image segmentation (like in Chan-Vese).

[figure: high-entropy segmentation vs. low-entropy segmentation]
34. Minimizing entropy of segment intensities (intuitive motivation)

E(S) = |S̄|·H(S̄) + |S|·H(S) + Σ_{pq∈N} w_pq·[s_p ≠ s_q]

Break the image into two coherent segments with low entropy of intensities:
more general than Chan-Vese (colors can vary within each segment).
35. From entropy to color consistency

Minimization of entropy encourages pixels of the same color bin to be
segmented together (proof: see the next slide).

[figure: all pixels grouped into color bins 1, ..., 5]
36. From entropy to color consistency

With S_i the pixels of color bin i inside S:

|S|·H(S) = −|S|·Σ_i p_i^S·ln p_i^S = −Σ_i |S_i|·ln(|S_i| / |S|)
         = −Σ_i |S_i|·ln|S_i| + |S|·ln|S|

and similarly for S̄. Summing the two entropy terms:

|S̄|·H(S̄) + |S|·H(S) = −Σ_i ( |S_i|·ln|S_i| + |S̄_i|·ln|S̄_i| ) + |S|·ln|S| + |S̄|·ln|S̄|
                        └────────── color consistency ─────────┘  └── volume balancing ──┘

color consistency: pixels in each color bin i prefer to be together
(either inside the object or in the background)
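The decomposition above is an exact algebraic identity, which a few lines can confirm on random data (the bin count and segment sizes are arbitrary choices for the check):

```python
import math, random
from collections import Counter

def weighted_entropy(pixels):
    # |S| * H(S) for the empirical bin distribution of `pixels`
    n = len(pixels)
    c = Counter(pixels)
    return -sum(k * math.log(k / n) for k in c.values())

def xlogx(k):
    return k * math.log(k) if k > 0 else 0.0

random.seed(1)
colors = [random.randrange(5) for _ in range(200)]   # color bin per pixel
labels = [random.randrange(2) for _ in range(200)]   # random segmentation

S  = [c for c, l in zip(colors, labels) if l == 1]   # object pixels
Sb = [c for c, l in zip(colors, labels) if l == 0]   # background pixels

lhs = weighted_entropy(S) + weighted_entropy(Sb)

cS, cSb = Counter(S), Counter(Sb)
color_consistency = -sum(xlogx(cS[i]) + xlogx(cSb[i]) for i in range(5))
volume_balancing = xlogx(len(S)) + xlogx(len(Sb))
rhs = color_consistency + volume_balancing

print(abs(lhs - rhs) < 1e-9)  # True: the decomposition is exact
```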
37. From entropy to color consistency

[figure: a segmentation S with better color consistency; S_i = pixels of bin i in S]

|S̄|·H(S̄) + |S|·H(S) = −Σ_i ( |S_i|·ln|S_i| + |S̄_i|·ln|S̄_i| ) + |S|·ln|S| + |S̄|·ln|S̄|

color consistency (pixels in each color bin i prefer to be together, either
inside the object or in the background) + volume balancing
38. From entropy to color consistency

|S̄|·H(S̄) + |S|·H(S) = −Σ_i ( |S_i|·ln|S_i| + |S̄_i|·ln|S̄_i| ) + |S|·ln|S| + |S̄|·ln|S̄|

• color consistency: a concave function of cardinality |S_i| (submodular);
  graph-cut constructions exist for similar cardinality terms (for superpixel
  consistency) [Kohli et al., IJCV'09]
• volume balancing: a convex function of cardinality |S| (non-submodular);
  in many applications this term can be either dropped or replaced with
  simple unary ballooning [Tang et al., ICCV 2013]
39. From entropy to color consistency

E(S) = color consistency (over bins |S_i|) + volume balancing + Σ_{pq∈N} w_pq·[s_p ≠ s_q] (boundary smoothness)

• the volume balancing term can be dropped or replaced with simple unary
  ballooning [Tang et al., ICCV 2013]
• L1 color separation works better in practice, and gives a simpler
  construction [Tang et al., ICCV 2013]
• construction: connect the pixels in each color bin to a corresponding
  auxiliary node
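The effect of one auxiliary node can be seen by brute force: connected with weight-β edges to every pixel of its color bin, the node settles on whichever side of the cut severs fewer of those edges, so the bin contributes β·min(#bin pixels in S, #bin pixels in S̄) to the cut, an L1-style color-separation penalty. A sketch (β and the bin labeling are illustrative):

```python
beta = 1.0
# labels of the pixels of one color bin under some segmentation
bin_labels = [1, 1, 1, 0, 1, 0, 1]

def gadget_cost(labels, beta):
    # The auxiliary node takes the side (label a) that minimizes the
    # number of severed weight-beta edges to the bin's pixels.
    return beta * min(sum(1 for s in labels if s != a) for a in (0, 1))

n1 = sum(bin_labels)
n0 = len(bin_labels) - n1
assert gadget_cost(bin_labels, beta) == beta * min(n0, n1)
print(gadget_cost(bin_labels, beta))  # 2.0
```

Because this penalty is a minimum over the auxiliary node's label, it stays submodular and the whole energy can still be solved with a single graph cut.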
40. One Cut [Tang et al., ICCV'13]

Box segmentation: smoothness + color consistency, with linear ballooning
inside the box – guaranteed global minimum in one graph cut.

Construction: connect the pixels in each color bin to a corresponding
auxiliary node.

[figure: box segmentation examples; GrabCut is sensitive to bin size]
41. One Cut [Tang et al., ICCV'13]

guaranteed global minimum; connect the pixels in each color bin to a
corresponding auxiliary node

• segmentation from seeds – ballooning from hard constraints
• box segmentation – linear ballooning inside the box
• saliency-based segmentation – linear ballooning from a saliency measure
42. Photo-consistency + smoothness + color consistency

Color consistency can be integrated into binary stereo
(photo-consistency + smoothness + color consistency); again, connect the
pixels in each color bin to corresponding auxiliary nodes.
44. General Trust Region Approach (overview)

E(S) = H(S) + B(S)
       hard    submodular (easy)

Around the current solution S₀, within a trust region of radius d:

min_{||S − S₀|| < d}  Ẽ(S) = U₀(S) + B(S)

where U₀(S) is a 1st-order approximation of H(S) at S₀.
45. General Trust Region Approach (overview)

• Constrained optimization:
  minimize Ẽ(S) = U₀(S) + B(S)   s.t. ||S − S₀|| ≤ d

• Unconstrained Lagrangian formulation:
  minimize L_λ(S) = U₀(S) + B(S) + λ·||S − S₀||

The distance ||S − S₀|| can be approximated with unary terms
[Boykov, Kolmogorov, Cremers, Delong, ECCV'06]
46. Approximating the L2 distance ||S − S₀||

Let d_p be the signed distance map from the current boundary C₀. Then the
distance between contours C and C₀ can be approximated by a sum of unary
potentials:

||S − S₀|| ≈ Σ_p d_p·(s_p − s_p⁰)

unary potentials [Boykov et al., ECCV 2006]
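In 1-D the unary approximation is easy to make concrete: with d_p the signed distance of pixel p from the current boundary C₀ (negative inside S₀, a sign convention chosen for this sketch), the sum Σ_p d_p·(s_p − s_p⁰) accumulates |d_p| over exactly the pixels where S and S₀ disagree:

```python
# 1-D toy: segment S0 = pixels 4..7 on a line of 12 pixels.
n = 12
s0 = [1 if 4 <= p <= 7 else 0 for p in range(n)]

# Signed distance to the boundary of S0 (boundary sits between pixels);
# brute force, negative inside S0 by the chosen convention.
boundary = [3.5, 7.5]
d = [min(abs(p - b) for b in boundary) * (-1 if s0[p] else 1)
     for p in range(n)]

def unary_approx(s):
    # sum of unary terms d_p * (s_p - s0_p)
    return sum(d[p] * (s[p] - s0[p]) for p in range(n))

# Growing S to the right by k pixels costs more the further we grow:
for k in (1, 2, 3):
    s = [1 if 4 <= p <= 7 + k else 0 for p in range(n)]
    print(k, unary_approx(s))  # 0.5, 2.0, 4.5
```

The cost grows roughly like k²/2, as expected for an L2-type distance between the two contours, and each term depends on a single s_p, so it plugs directly into the graph-cut energy.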
47. Trust Region Approximation

E(S) = Σ_p −ln Pr(I_p | θ^{s_p}) + Σ_{pq∈N} w_pq·[s_p ≠ s_q] + volume constraint on |S|
       appearance log-likelihoods    boundary length             non-submodular term

Submodular approximation within the trust region:
• the non-submodular volume term is replaced by its linear approximation at S₀
• the L2 distance to S₀ enters as unary terms Σ_p d_p·(s_p − s_p⁰)
49. Back to entropy-based segmentation

non-submodular term: volume balancing
submodular terms: color consistency (over bins |S_i|) + boundary smoothness Σ_{pq∈N} w_pq·[s_p ≠ s_q]

Interactive segmentation with a box: approximations yield local minima near
the box, versus the global minimum.
55. Higher-order smoothness & curvature for discrete regularization

• Geman and Geman 1983 (line process, simulated annealing)
• Second-order stereo and surface reconstruction:
  – Li & Zucker 2010 (loopy belief propagation)
  – Woodford et al. 2009 (fusion of proposals, QPBO)
  – Olsson et al. 2012-13 (fusion of planes, nearly submodular)
• Curvature in segmentation:
  – Schoenemann et al. 2009 (complex, LP relaxation, many extra variables)
  – Strandmark & Kahl 2011 (complex, LP relaxation, …)
  – El-Zehiry & Grady 2010 (grid, 3-clique, only 90-degree accurate, QPBO)
  – Shekhovtsov et al. 2012 (grid patches, approximately learned, QPBO)
  – Olsson et al. 2013 (grid patches, integral geometry, partial enumeration)
  – Nieuwenhuis et al. 2014? (grid, 3-cliques, integral geometry, trust region) – this talk:
    good approximation of curvature, better and faster optimization – practical!
56. The rest of the talk:

• Absolute curvature regularization on a grid
  [Olsson, Ulen, Boykov, Kolmogorov – ICCV 2013]
• Squared curvature regularization on a grid
  [Nieuwenhuis, Toppe, Gorelick, Veksler, Boykov – arXiv 2013]
57. Absolute Curvature

∫_{∂S} |κ| ds

Motivating example: for any convex shape, ∫_{∂S} |κ| ds = 2π
• no shrinking bias
• preserves thin structures
58. Absolute Curvature

∫_{∂S} |κ| ds is easy to estimate via approximating polygons (summing the
exterior turning angles between consecutive edge normals). Polygons also
work for other curvature integrals [Bruckstein et al. 2001].
59. Curvature on a cell complex (standard geometry)

Assign turning angles (π/4, π/2, ...) to 4- or 3-cliques on a cell complex:
• Schoenemann et al. 2009
• Strandmark & Kahl 2011
solved via LP relaxations
60. Curvature on a cell complex (standard geometry)

Cell-patch cliques on a complex [Olsson et al., ICCV 2013]:
partial enumeration + TRWS, zero gap.

Reduction to a pair-wise Constraint Satisfaction Problem:
– new graph: patches P₁, ..., P₆ are the nodes
– curvature is a unary potential
– patches overlap, so consistency is needed
– tighter LP relaxation
61. Curvature on a cell complex (standard geometry) vs. curvature on a pixel grid (integral geometry)

[figure: representative cell-patches and pixel-patches A–H with curvature
weights 0, π/4, π/2 assigned so that patch responses sum to the correct
turning angles, e.g. 2A+B = π/2, A+C = π/4, D+E+F = π/2, A+F+G+H = …]
65. Nieuwenhuis et al., arXiv 2013

General intuition: 3-cliques (p−, p, p+) straddling the boundary of S,
penalizing configurations (0,1,0) and (1,0,1) – more responses where
curvature is higher.
66. Squared curvature via 3-cliques

[derivation: the contour integral ∫_C κ(s)² ds is discretized over contour
segments as Σ_n κ_n²·|Δs_n|, and each local term is estimated from weighted
responses c_i of the 3-cliques i ∈ i(n) within a 5×5 neighborhood of
boundary point n]
67. Zoom-in: for a circular arc of radius r = 1/κ and neighborhood size d,
the response is R(κ, d) ≈ |κ|·d³/4.

Thus, appropriately weighted 3-cliques estimate the squared curvature integral.
70. The model is OK on given segments. But how do we optimize the
non-submodular 3-cliques (0,1,0) and (1,0,1)?

1. Standard trick: convert to non-submodular pair-wise binary optimization.
2. Our observation: QPBO does not work (unless the non-submodular
   regularization is very weak).

Fast Trust Region [CVPR13, arXiv] uses local submodular approximations.
78. Conclusions

• Optimization of entropy is a useful information-theoretic interpretation of color model estimation
• L1 color separation is an easy-to-optimize objective useful in its own right [ICCV 2013]
• Global optimization matters: One Cut [ICCV13]
• Trust region, auxiliary cuts, partial enumeration: general approximation techniques
  – for high-order energies [CVPR13]
  – for non-submodular energies [arXiv'13]
  outperforming state-of-the-art combinatorial optimization methods
Speaker notes
The GrabCut energy is NP-hard and is therefore optimized iteratively, resulting in a local minimum. Analysis of the energy shows that it can be decomposed into three terms: volume balancing, which is supermodular and therefore NP-hard; color separation, which is submodular and easy to optimize; and boundary length, which is easy to optimize as well. We propose to replace the NP-hard term with application-dependent unary terms and advocate an even simpler color separation term. The resulting energy is submodular and a global minimum is guaranteed in one graph cut. We show state-of-the-art results in several applications.
Iterative optimization algorithm: an approximation model is constructed near the current solution. The model is only "trusted" locally, within a small radius around the current solution – the trust region. The approximate model is globally optimized within the current trust region – the trust region sub-problem. The size of the trust region is then adjusted.