NIPS 2008 Tutorial: Statistical Models of Visual Images
1. Statistical Image Models
Eero Simoncelli
Howard Hughes Medical Institute,
Center for Neural Science, and
Courant Institute of Mathematical Sciences
New York University
3. Photographic Images
Diverse specialized structures:
ā¢ edges/lines/contours
ā¢ shadows/highlights
ā¢ smooth regions
ā¢ textured regions
Occupy a small region of the full space
4. [Figure: the space of all images, with "typical images" occupying a small region]
One could describe this set as a
deterministic manifold....
9. • Step edges are rare (lighting, junctions, texture, noise)
• One scale's texture is another scale's edge
• Need seamless transitions from isolated features to
dense textures
12. [Figure: the space of all images, with "typical images" occupying a small region, labeled with a density P(x)]
One could describe this set as a
deterministic manifold....
But seems more natural to use probability
18. Density models
• nonparametric: build a histogram from lots of observations...
• parametric/constrained: use "natural constraints" (geometry/photometry of image formation, computation, maxEnt)
(historical trend between the two, technology driven)
25. Evolution of image models
I. (1950s): Fourier + Gaussian
II. (mid '80s - late '90s): Wavelets + kurtotic marginals
III. (mid '90s - present): Wavelets + local context
• local amplitude (contrast)
• local orientation
IV. (last 5 years): Hierarchical models
27. Pixel correlation
[Figure: (a) scatter plots of I(x+1,y), I(x+2,y), and I(x+4,y) against I(x,y); (b) correlation as a function of spatial separation, falling from 1 over separations of 10-40 pixels]
32. Translation invariance
Assuming translation invariance,
=> covariance matrix is Toeplitz (convolutional)
=> eigenvectors are sinusoids
=> can diagonalize (decorrelate) with F.T.
Power spectrum captures full covariance structure
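A minimal numerical check of this chain (the exponential covariance falloff and the circular boundary handling are illustrative assumptions; with circular boundaries the Toeplitz matrix becomes exactly circulant):

```python
import numpy as np

N = 64
# A stationary covariance: C[i, j] depends only on (i - j) mod N.
lags = np.minimum(np.arange(N), N - np.arange(N))       # circular distance
row = np.exp(-lags / 4.0)                               # covariance at each lag
C = np.array([np.roll(row, i) for i in range(N)])       # circulant matrix

# The DFT basis diagonalizes any circulant matrix; the eigenvalues are
# the DFT of its first row, i.e. the power spectrum of the process.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                  # unitary DFT matrix
D = F @ C @ F.conj().T                                  # should be diagonal
print("max off-diagonal magnitude:",
      np.abs(D - np.diag(np.diag(D))).max())            # ~1e-13
print("eigenvalues match power spectrum:",
      np.allclose(np.sort(np.diag(D).real), np.sort(np.fft.fft(row).real)))
```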
38. Maximum entropy (maxEnt)
The density with maximal entropy satisfying
E(f(x)) = c
is of the form
p_ME(x) ∝ exp(−λ f(x))
where λ depends on c
Examples: f(x) = x², f(x) = |x|
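Worked out for the two examples on the slide (a standard derivation, not shown in the deck):

```latex
\begin{align*}
f(x) = x^2 &:\quad p_{\mathrm{ME}}(x) \propto e^{-\lambda x^2}
  \quad\text{(Gaussian, with } \sigma^2 = \tfrac{1}{2\lambda} = c\text{)}\\
f(x) = |x| &:\quad p_{\mathrm{ME}}(x) \propto e^{-\lambda |x|}
  \quad\text{(Laplacian, with } \mathbb{E}|x| = \tfrac{1}{\lambda} = c\text{)}
\end{align*}
```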
42. Gaussian model is weak
[Figure: (a) the 1/f² spectral model as a cascade F → σ → F⁻¹, relating the image density P(x) to the transform-coefficient density P(c); (b, c) joint scatter plots of coefficients]
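One way to see the weakness is to sample from the model. A sketch, assuming the slide's 1/f² power spectrum (so amplitude ∝ 1/f): white Gaussian noise is shaped in the Fourier domain and transformed back, and the samples look like clouds, not photographs.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
fx = np.fft.fftfreq(N)[None, :]
fy = np.fft.fftfreq(N)[:, None]
f = np.sqrt(fx**2 + fy**2)
f[0, 0] = 1.0                        # avoid division by zero at DC

# White Gaussian noise, shaped by a 1/f amplitude spectrum (power ∝ 1/f²),
# then transformed back: F⁻¹ σ F w.
w = rng.standard_normal((N, N))
sample = np.fft.ifft2(np.fft.fft2(w) / f).real
print(sample.shape, sample.std())    # displaying `sample` gives cloud-like images
```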
43. Bandpass Filter Responses
[Figure: log response histogram (solid) vs. a Gaussian density of matching variance (dashed); probability on a log scale (10⁰ to 10⁻⁴) over filter responses from −500 to 500]
[Burt&Adelson 82; Field 87; Mallat 89; Daugman 89, ...]
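A sketch reproducing the character of this histogram, under stated assumptions: any grayscale photograph (here a hypothetical photo.png) and a Laplacian-of-Gaussian standing in for the bandpass filter.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_laplace
from scipy.stats import kurtosis

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
resp = gaussian_laplace(img, sigma=2.0).ravel()   # bandpass responses

# A Gaussian has kurtosis 3; bandpass responses of photographs are far more
# kurtotic: mostly near zero, with occasional large values (sparse).
print("sample kurtosis:", kurtosis(resp, fisher=False))

# Plotting log(hist) against a Gaussian of matching variance reproduces
# the peaked, heavy-tailed shape shown on the slide.
hist, edges = np.histogram(resp, bins=101, density=True)
```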
44. "Independent" Components Analysis (ICA)
[Figure: joint scatter plots (a-d) of source and transformed coefficient pairs]
For Linearly Transformed Factorial (LTF) sources:
guaranteed independence
(with some minor caveats)
[Comon 94; Cardoso 96; Bell/Sejnowski 97; ...]
45. ICA on image blocks
[Olshausen/Field '96; Bell/Sejnowski '97]
[example obtained with FastICA, Hyvarinen]
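A rough version of this experiment, with assumptions: the same hypothetical photo.png, 12×12 patches, and scikit-learn's FastICA in place of the original implementation.

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import FastICA

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
rng = np.random.default_rng(0)
P = 12
rows = rng.integers(0, img.shape[0] - P, size=20000)
cols = rng.integers(0, img.shape[1] - P, size=20000)
patches = np.stack([img[r:r+P, c:c+P].ravel() for r, c in zip(rows, cols)])
patches -= patches.mean(axis=1, keepdims=True)      # remove patch DC

ica = FastICA(n_components=64, random_state=0, max_iter=500)
ica.fit(patches)

# The unmixing filters, reshaped to 12x12, come out localized, oriented,
# and bandpass, as in the slide.
filters = ica.components_.reshape(-1, P, P)
print(filters.shape)
```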
46. Marginal densities
[Figure: log histograms of a single wavelet subband of example images; tails truncated to show 99.8% of each distribution; dashed lines show fitted generalized Gaussians with maximum-likelihood p = 0.46, 0.58, 0.48, and relative entropy (Kullback-Leibler divergence) of model vs. histogram ΔH/H = 0.0031, 0.0011, 0.0014]
Well-fit by a generalized Gaussian:
P(x) ∝ exp(−|x/s|^p)
[Mallat 89; Simoncelli&Adelson 96; Moulin&Liu 99; ...]
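scipy's gennorm family is exactly this density, so the maximum-likelihood fit is a one-liner. A sketch, again assuming a hypothetical photo.png, with a crude difference filter standing in for a wavelet subband:

```python
import numpy as np
from PIL import Image
from scipy.stats import gennorm

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
# One diagonal wavelet-like subband: differences along both axes.
band = np.diff(np.diff(img, axis=0), axis=1).ravel()

# ML fit of P(x) ∝ exp(-|x/s|^p), with the location pinned at zero.
p, loc, s = gennorm.fit(band, floc=0)
print(f"p = {p:.2f}, s = {s:.1f}")   # photographs give p well below 2 (Gaussian)
```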
47. Kurtosis vs. bandwidth
[Figure: sample kurtosis (roughly 4 to 16) as a function of filter bandwidth (0 to 3 octaves)]
Note: Bandwidth matters much more than orientation
[see Bethge 06]
[after Field 87]
53. Trouble in paradise
• Biology: Visual system uses a cascade
- Where's the retina? The LGN?
- What happens after V1? Why don't responses get sparser? [Baddeley et al. 97; Chechik et al. 06]
• Statistics: Images don't obey the ICA source model
- Any bandpass filter gives sparse marginals [Baddeley 96]
=> shallow optimum [Bethge 06; Lyu & Simoncelli 08]
- The responses of ICA filters are highly dependent [Wegmann & Zetzsche 90; Simoncelli 97]
54. Conditional densities
[Figure: conditional histograms of one wavelet coefficient given the value of a neighboring coefficient; the "bowtie" shape shows the variance of one growing with the magnitude of the other]
Linear responses are not independent, even for optimized filters!
[Simoncelli 97; Schwartz&Simoncelli 01]
57. Modeling heteroscedasticity
(i.e., variable variance)
Method 1: Conditional Gaussian
P(x_n | {x_k}) ∼ N(0; Σ_k w_nk x_k² + σ²)
[Simoncelli 97; Buccigrossi&Simoncelli 99;
see also ARCH models in econometrics!]
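A sketch of one way to fit Method 1 (the 8-connected neighborhood and the nonnegative least-squares fit of squared coefficients are assumptions for illustration, not the papers' procedure):

```python
import numpy as np
from scipy.optimize import nnls

def fit_variance_weights(band):
    """Fit w, sigma^2 so that E[x_n^2 | neighbors] ~ sum_k w_k x_k^2 + sigma^2."""
    H, W = band.shape
    offs = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]
    core = band[1:-1, 1:-1].ravel()
    neigh = np.stack([band[1+dy:H-1+dy, 1+dx:W-1+dx].ravel()
                      for dy, dx in offs], axis=1)
    A = np.hstack([neigh**2, np.ones((len(core), 1))])  # last column: sigma^2
    coef, _ = nnls(A, core**2)                          # nonnegative weights
    return coef[:-1], coef[-1]

rng = np.random.default_rng(0)
band = rng.standard_normal((128, 128))   # stand-in for a wavelet subband;
w, sigma2 = fit_variance_weights(band)   # real subbands give nonzero weights
print(w, sigma2)
```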
58. Joint densities
[Figure: empirical joint distributions of pairs of wavelet coefficients ("adjacent", "near", "far", "other scale", "other orientation") for a single image of a New York City street scene. Top row: joint distributions as contour plots, with lines at equal intervals of log probability; the three leftmost pairs are at the same scale and orientation but different spatial offsets, the next at adjacent scales, the rightmost at orthogonal orientations. Bottom row: corresponding conditional distributions, with brightness proportional to frequency of occurrence, each column rescaled to fill the intensity range. [Simoncelli '97; Wainwright&Simoncelli '99]]
• Nearby: densities are approximately circular/elliptical
• Distant: densities are approximately factorial
62. non-Gaussian elliptical observations
and models of natural images:
- Zetzsche & Krieger, 1999
- Huang & Mumford, 1999
- Wainwright & Simoncelli, 2000
- Hyvärinen & Hoyer, 2000
- Parra et al., 2001
- Srivastava et al., 2002
- Sendur & Selesnick, 2002
- Teh et al., 2003
- Gehler & Welling, 2006
- Lyu & Simoncelli, 2008
- etc.
63. Modeling heteroscedasticity
Method 2: Hidden scaling variable for each patch
Gaussian scale mixture (GSM)
[Andrews & Mallows 74]:
x = √z · u
• u is Gaussian, z > 0
• z and u are independent
• x is elliptically symmetric, with covariance ∝ C_u
• marginals of x are leptokurtotic
[Wainwright&Simoncelli 99]
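These properties are easy to verify by sampling. A minimal sketch (the lognormal mixer anticipates the next slide; any nontrivial positive z gives the same qualitative result):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 200_000
u = rng.standard_normal(n)                        # Gaussian factor
z = rng.lognormal(mean=0.0, sigma=0.7, size=n)    # hidden scaling, z > 0
x = np.sqrt(z) * u                                # GSM sample: x = sqrt(z) u

# A Gaussian has kurtosis 3; mixing variances makes x leptokurtotic.
print("kurtosis of u:", kurtosis(u, fisher=False))   # ~3
print("kurtosis of x:", kurtosis(x, fisher=False))   # >3
```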
64. GSM - prior on z
• Empirically, z is approximately lognormal [Portilla et al., ICIP-01]:
p_z(z) = exp(−(log z − μ_l)² / (2σ_l²)) / (z (2π σ_l²)^(1/2))
• Alternatively, can use Jeffrey's noninformative prior [Figueiredo&Nowak, '01; Portilla et al., '03]:
p_z(z) ∝ 1/z
72. II. BLS for non-Gaussian prior
• Assume marginal distribution [Mallat '89]:
P(x) ∝ exp(−|x/s|^p)
• Then the Bayes estimator is generally nonlinear:
[Figure: estimator curves for p = 2.0, 1.0, 0.5]
[Simoncelli & Adelson, '96]
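The nonlinearity is easy to reproduce numerically. A sketch computing the Bayes least-squares estimate E[x | y] on a grid, for y = x + Gaussian noise (grid limits and parameter values are illustrative assumptions):

```python
import numpy as np

def bls_estimator(y, p=0.5, s=1.0, sigma_n=1.0):
    """E[x | y] on a grid, for the prior P(x) ∝ exp(-|x/s|^p)."""
    x = np.linspace(-40, 40, 4001)
    prior = np.exp(-np.abs(x / s) ** p)
    lik = np.exp(-0.5 * ((y - x) / sigma_n) ** 2)   # Gaussian likelihood
    post = prior * lik                              # unnormalized posterior
    return np.sum(x * post) / np.sum(post)

for y in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(y, round(bls_estimator(y), 3))
# For p = 2 the estimator is linear (Wiener); for p < 1 it "cores":
# small responses are suppressed toward zero, large ones preserved.
```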
80. GSM summary
• GSM captures local variance
• Underlying Gaussian leads to simple computation
• Excellent denoising results
• What's missing?
• Global model of z variables [Wainwright et al. 99; Romberg et al. '99; Hyvarinen/Hoyer '02; Karklin/Lewicki '02; Lyu/Simoncelli 08]
• Explicit geometry: phase and orientation
81. Global models for z
• Non-overlapping neighborhoods, tree-structured z [Wainwright et al. 99; Romberg et al. '99]
[Figure: tree of z variables over coarse-to-fine scales, multiplying the Gaussian u]
• Field of GSMs: z is an exponentiated GMRF, u is a GMRF, subband is the product [Lyu&Simoncelli 08]
82. State-of-the-art denoising
[Figure: denoising performance on the Lena and Boats images; differences in PSNR between FoGSM and other methods (BM3D, kSVD, BLS-GSM, FoE) as a function of input noise level σ]
[Lyu&Simoncelli, PAMI 08]
87. Multi-scale gradient basis
• Multi-scale bases: efficient representation
• Derivatives: good for analysis
• Local Taylor expansion of image structures
• Explicit geometry (orientation)
• Combination:
• Explicit incorporation of geometry in basis
• Bridge between PDE / harmonic analysis approaches
89. Importance of local orientation
[Figure: randomized orientation vs. randomized magnitude]
[Hammond&Simoncelli 05]
90. Reconstruction from orientation
[Figure: original vs. reconstruction with orientations quantized to 2 bits]
• Reconstruction by projections onto convex sets
• Resilient to quantization
[Hammond&Simoncelli 06]
91. Image patches related by rotation
[Figure: two-band steerable pyramid coefficients]
[Hammond&Simoncelli 06]
92. PCA of normalized gradient patches
[Figure: comparison of raw patches (dashed) and rotated patches]
[Hammond&Simoncelli 06]
94. Orientation-Adaptive GSM model
Model a vectorized patch of wavelet coefficients as x = √z R(θ) u, where R(θ) is a patch rotation operator and z, θ are hidden magnitude/orientation variables.
Conditioned on (z, θ), the patch is zero-mean Gaussian with covariance z R(θ) C R(θ)ᵀ.
[Hammond&Simoncelli 06]
95. Estimation of C(θ) from noisy data
For a noisy patch, the signal covariance is unknown; approximate it using the covariance measured from the noisy data, assuming signal and noise independent and the noise rotationally invariant (assuming w.l.o.g. E[z] = 1).
[Hammond&Simoncelli 06]
101. Bayesian MMSE Estimator
Condition on the hidden variables and integrate them out:
x̂(y) = ∫∫ p(z, θ | y) E[x | y, z, θ] dz dθ
• Conditioned on (z, θ), E[x | y, z, θ] is a Wiener estimate
• Given (z, θ), the noisy patch has covariance z C(θ) + C_w
• Separable prior for the hidden variables: p(z, θ) = p_z(z) p_θ(θ)
[Hammond&Simoncelli 06]
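For the plain GSM (dropping the orientation variable θ), the estimator reduces to a weighted combination of Wiener estimates over a grid of z values. A sketch under that simplification, with illustrative covariances and Jeffrey's prior (quadrature weights absorbed into the unnormalized prior):

```python
import numpy as np

def gsm_mmse(y, C, Cw, z_grid, p_z):
    """E[x|y] = sum_z p(z|y) * Wiener(y; z*C, Cw), on a grid over z."""
    num = np.zeros(len(y))
    weights = []
    for z, pz in zip(z_grid, p_z):
        Cy = z * C + Cw                           # covariance of y given z
        _, logdet = np.linalg.slogdet(Cy)         # p(y|z): zero-mean Gaussian
        w = pz * np.exp(-0.5 * (y @ np.linalg.solve(Cy, y)) - 0.5 * logdet)
        weights.append(w)
        num += w * (z * C @ np.linalg.solve(Cy, y))  # Wiener estimate, given z
    return num / np.sum(weights)

rng = np.random.default_rng(0)
d = 9
C = np.eye(d)                                     # signal covariance (illustrative)
Cw = 0.5 * np.eye(d)                              # noise covariance
z_grid = np.logspace(-2, 2, 25)
p_z = 1.0 / z_grid                                # Jeffrey's prior, unnormalized
x = np.sqrt(rng.lognormal()) * rng.standard_normal(d)
y = x + np.sqrt(0.5) * rng.standard_normal(d)
print(gsm_mmse(y, C, Cw, z_grid, p_z))
```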
102. [Figure: denoising example at σ = 40; noisy: 2.81 dB; gsm2: 12.4 dB; oagsm: 13.1 dB]
103. Locally adaptive covariance
• Karklin & Lewicki 08: Each patch is Gaussian, with covariance constructed from a weighted outer-product of fixed vectors:
p(x) = G(x; C(y)),  log C(y) = Σ_n y_n B_n
p(y) = Π_n exp(−|y_n|),  B_n = Σ_k w_nk b_k b_kᵀ
• Guerrero-Colon, Simoncelli & Portilla 08: Each patch is a mixture of GSMs (MGSMs):
p(x) = Σ_k P_k ∫ p(z_k) G(x; z_k C_k) dz_k
104. MGSMs generative model
Patch x chosen from {√z1·u1, √z2·u2, ..., √zK·uK}
with probabilities {P1, P2, ..., PK}
Parameters:
• Covariances C_k
• Scale densities p_k(z_k)
• Component probabilities P_k
• Number of components K
Parameters can be fit to data of one or more images
by maximizing likelihood (EM-like)
[Guerrero-Colon, Simoncelli, Portilla 08]
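The generative model is simple to sample. A sketch with illustrative parameters (the covariances, scale densities, and component probabilities here are made up, not fit to images):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 16, 3
Ck = [np.eye(d) * (k + 1) for k in range(K)]   # component covariances
Pk = np.array([0.5, 0.3, 0.2])                 # component probabilities

def sample_patch():
    k = rng.choice(K, p=Pk)                    # pick a component
    z = rng.lognormal(sigma=0.5)               # scale drawn from p_k(z_k)
    u = rng.multivariate_normal(np.zeros(d), Ck[k])
    return np.sqrt(z) * u                      # x = sqrt(z_k) u_k

patches = np.stack([sample_patch() for _ in range(1000)])
print(patches.shape)
```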
105. MGSM "segmentation"
[Figure: image; models with 1, 2, 4 components; first six eigenvectors of the GSM covariance matrices]
[Guerrero-Colon, Simoncelli, Portilla 08]
107. Potential of local homogeneous models?
Consider an implicit model:
maxEnt subject to constraints on subband coefficients:
• marginal statistics [var, skew, kurtosis]
• local raw correlations
• local variance correlations
• local phase correlations
[Portilla & Simoncelli 00;
cf. Zhu, Wu & Mumford 97]
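A sketch of what the first three constraint families measure, with assumptions: the hypothetical photo.png again, and a simple difference filter standing in for a proper multi-scale decomposition (phase correlations are omitted since they need complex-valued subbands; see Portilla & Simoncelli 00 for the full statistic set):

```python
import numpy as np
from PIL import Image
from scipy.stats import skew, kurtosis

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
band = np.diff(img, axis=1)    # stand-in for one subband
mag = np.abs(band)             # local amplitude

stats = {
    # marginal statistics of the subband
    "marginal": (band.var(), skew(band.ravel()), kurtosis(band.ravel())),
    # local raw correlations: mean products of the band with shifted copies
    "raw_corr": [np.mean(band[:, :-k] * band[:, k:]) for k in (1, 2, 3, 4)],
    # local variance correlations: the same idea, on the magnitudes
    "var_corr": [np.corrcoef(mag[:, :-k].ravel(), mag[:, k:].ravel())[0, 1]
                 for k in (1, 2, 3, 4)],
}
print(stats)
```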
122. Summary
ā¢ Fusion of empirical data with structural principles
ā¢ Statistical models have led to state-of-the-art image
processing, and are relevant for biological vision
ā¢ Local adaptation to {variance, orientation,
phase, ...} gives improvement, but makes learning
harder
ā¢ Cascaded representations emerge naturally
• There's still much room for improvement!
123. Cast
ā¢ Local GSM model: Martin Wainwright, Javier Portilla
ā¢ GSM Denoising: Javier Portilla, Martin Wainwright,
Vasily Strela
ā¢ Variance-adaptive compression: Robert Buccigrossi
ā¢ Local orientation and OAGSM: David Hammond
ā¢ Field of GSMs: Siwei Lyu
• Mixture of GSMs: Jose-Antonio Guerrero-Colón,
Javier Portilla
ā¢ Texture representation/synthesis: Javier Portilla