NIPS 2008 Tutorial: Statistical Models of Visual Images
1. Statistical Image Models
Eero Simoncelli
Howard Hughes Medical Institute,
Center for Neural Science, and
Courant Institute of Mathematical Sciences
New York University
3. Photographic Images
Diverse specialized structures:
ā¢ edges/lines/contours
ā¢ shadows/highlights
ā¢ smooth regions
ā¢ textured regions
Occupy a small region of the full space
4. [Figure: the space of all images, with "typical images" occupying a small region]
One could describe this set as a
deterministic manifold....
9. • Step edges are rare (lighting, junctions, texture, noise)
• One scale's texture is another scale's edge
• Need seamless transitions from isolated features to
dense textures
12. [Figure: the space of all images, with "typical images" occupying a small region, labeled with a density P(x)]
One could describe this set as a
deterministic manifold....
But seems more natural to use probability
18. Density models
• nonparametric: build a histogram from lots of observations...
• parametric/constrained: use "natural constraints" (geometry/photometry of image formation, computation, maxEnt)
(historical trend between the two, technology driven)
25. Evolution of image models
I. (1950s): Fourier + Gaussian
II. (mid '80s - late '90s): Wavelets + kurtotic marginals
III. (mid '90s - present): Wavelets + local context
• local amplitude (contrast)
• local orientation
IV. (last 5 years): Hierarchical models
27. Pixel correlation
[Figure: (a) scatter plots of I(x+1,y), I(x+2,y), and I(x+4,y) against I(x,y); (b) correlation as a function of spatial separation, falling from 1 over separations of 10-40 pixels]
32. Translation invariance
Assuming translation invariance,
=> covariance matrix is Toeplitz (convolutional)
=> eigenvectors are sinusoids
=> can diagonalize (decorrelate) with F.T.
Power spectrum captures full covariance structure
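A minimal numerical check of this chain (the exponential covariance falloff and the circular boundary handling are illustrative assumptions; with circular boundaries the Toeplitz matrix becomes exactly circulant):

```python
import numpy as np

N = 64
# A stationary covariance: C[i, j] depends only on (i - j) mod N.
lags = np.minimum(np.arange(N), N - np.arange(N))       # circular distance
row = np.exp(-lags / 4.0)                               # covariance at each lag
C = np.array([np.roll(row, i) for i in range(N)])       # circulant matrix

# The DFT basis diagonalizes any circulant matrix; the eigenvalues are
# the DFT of its first row, i.e. the power spectrum of the process.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                  # unitary DFT matrix
D = F @ C @ F.conj().T                                  # should be diagonal
print("max off-diagonal magnitude:",
      np.abs(D - np.diag(np.diag(D))).max())            # ~1e-13
print("eigenvalues match power spectrum:",
      np.allclose(np.sort(np.diag(D).real), np.sort(np.fft.fft(row).real)))
```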
38. Maximum entropy (maxEnt)
The density with maximal entropy satisfying
E(f(x)) = c
is of the form
p_ME(x) ∝ exp(−λ f(x))
where λ depends on c
Examples: f(x) = x², f(x) = |x|
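Worked out for the two examples on the slide (a standard derivation, not shown in the deck):

```latex
\begin{align*}
f(x) = x^2 &:\quad p_{\mathrm{ME}}(x) \propto e^{-\lambda x^2}
  \quad\text{(Gaussian, with } \sigma^2 = \tfrac{1}{2\lambda} = c\text{)}\\
f(x) = |x| &:\quad p_{\mathrm{ME}}(x) \propto e^{-\lambda |x|}
  \quad\text{(Laplacian, with } \mathbb{E}|x| = \tfrac{1}{\lambda} = c\text{)}
\end{align*}
```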
42. Gaussian model is weak
[Figure: (a) the 1/f² spectral model as a cascade F → σ → F⁻¹, relating the image density P(x) to the transform-coefficient density P(c); (b, c) joint scatter plots of coefficients]
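One way to see the weakness is to sample from the model. A sketch, assuming the slide's 1/f² power spectrum (so amplitude ∝ 1/f): white Gaussian noise is shaped in the Fourier domain and transformed back, and the samples look like clouds, not photographs.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
fx = np.fft.fftfreq(N)[None, :]
fy = np.fft.fftfreq(N)[:, None]
f = np.sqrt(fx**2 + fy**2)
f[0, 0] = 1.0                        # avoid division by zero at DC

# White Gaussian noise, shaped by a 1/f amplitude spectrum (power ∝ 1/f²),
# then transformed back: F⁻¹ σ F w.
w = rng.standard_normal((N, N))
sample = np.fft.ifft2(np.fft.fft2(w) / f).real
print(sample.shape, sample.std())    # displaying `sample` gives cloud-like images
```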
43. Bandpass Filter Responses
[Figure: log response histogram (solid) vs. a Gaussian density of matching variance (dashed); probability on a log scale (10⁰ to 10⁻⁴) over filter responses from −500 to 500]
[Burt&Adelson 82; Field 87; Mallat 89; Daugman 89, ...]
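A sketch reproducing the character of this histogram, under stated assumptions: any grayscale photograph (here a hypothetical photo.png) and a Laplacian-of-Gaussian standing in for the bandpass filter.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_laplace
from scipy.stats import kurtosis

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
resp = gaussian_laplace(img, sigma=2.0).ravel()   # bandpass responses

# A Gaussian has kurtosis 3; bandpass responses of photographs are far more
# kurtotic: mostly near zero, with occasional large values (sparse).
print("sample kurtosis:", kurtosis(resp, fisher=False))

# Plotting log(hist) against a Gaussian of matching variance reproduces
# the peaked, heavy-tailed shape shown on the slide.
hist, edges = np.histogram(resp, bins=101, density=True)
```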
44. "Independent" Components Analysis (ICA)
[Figure: joint scatter plots (a-d) of source and transformed coefficient pairs]
For Linearly Transformed Factorial (LTF) sources:
guaranteed independence
(with some minor caveats)
[Comon 94; Cardoso 96; Bell/Sejnowski 97; ...]
45. ICA on image blocks
[Olshausen/Field '96; Bell/Sejnowski '97]
[example obtained with FastICA, Hyvarinen]
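A rough version of this experiment, with assumptions: the same hypothetical photo.png, 12×12 patches, and scikit-learn's FastICA in place of the original implementation.

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import FastICA

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
rng = np.random.default_rng(0)
P = 12
rows = rng.integers(0, img.shape[0] - P, size=20000)
cols = rng.integers(0, img.shape[1] - P, size=20000)
patches = np.stack([img[r:r+P, c:c+P].ravel() for r, c in zip(rows, cols)])
patches -= patches.mean(axis=1, keepdims=True)      # remove patch DC

ica = FastICA(n_components=64, random_state=0, max_iter=500)
ica.fit(patches)

# The unmixing filters, reshaped to 12x12, come out localized, oriented,
# and bandpass, as in the slide.
filters = ica.components_.reshape(-1, P, P)
print(filters.shape)
```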
46. Marginal densities
[Figure: log histograms of a single wavelet subband of example images; tails truncated to show 99.8% of each distribution; dashed lines show fitted generalized Gaussians with maximum-likelihood p = 0.46, 0.58, 0.48, and relative entropy (Kullback-Leibler divergence) of model vs. histogram ΔH/H = 0.0031, 0.0011, 0.0014]
Well-fit by a generalized Gaussian:
P(x) ∝ exp(−|x/s|^p)
[Mallat 89; Simoncelli&Adelson 96; Moulin&Liu 99; ...]
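scipy's gennorm family is exactly this density, so the maximum-likelihood fit is a one-liner. A sketch, again assuming a hypothetical photo.png, with a crude difference filter standing in for a wavelet subband:

```python
import numpy as np
from PIL import Image
from scipy.stats import gennorm

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
# One diagonal wavelet-like subband: differences along both axes.
band = np.diff(np.diff(img, axis=0), axis=1).ravel()

# ML fit of P(x) ∝ exp(-|x/s|^p), with the location pinned at zero.
p, loc, s = gennorm.fit(band, floc=0)
print(f"p = {p:.2f}, s = {s:.1f}")   # photographs give p well below 2 (Gaussian)
```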
47. Kurtosis vs. bandwidth
[Figure: sample kurtosis (roughly 4 to 16) as a function of filter bandwidth (0 to 3 octaves)]
Note: Bandwidth matters much more than orientation
[see Bethge 06]
[after Field 87]
53. Trouble in paradise
• Biology: Visual system uses a cascade
- Where's the retina? The LGN?
- What happens after V1? Why don't responses get sparser? [Baddeley et al. 97; Chechik et al. 06]
• Statistics: Images don't obey the ICA source model
- Any bandpass filter gives sparse marginals [Baddeley 96]
=> shallow optimum [Bethge 06; Lyu & Simoncelli 08]
- The responses of ICA filters are highly dependent [Wegmann & Zetzsche 90; Simoncelli 97]
54. Conditional densities
[Figure: conditional histograms of one wavelet coefficient given the value of a neighboring coefficient; the "bowtie" shape shows the variance of one growing with the magnitude of the other]
Linear responses are not independent, even for optimized filters!
[Simoncelli 97; Schwartz&Simoncelli 01]
57. Modeling heteroscedasticity
(i.e., variable variance)
Method 1: Conditional Gaussian
P(x_n | {x_k}) ∼ N(0; Σ_k w_nk x_k² + σ²)
[Simoncelli 97; Buccigrossi&Simoncelli 99;
see also ARCH models in econometrics!]
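A sketch of one way to fit Method 1 (the 8-connected neighborhood and the nonnegative least-squares fit of squared coefficients are assumptions for illustration, not the papers' procedure):

```python
import numpy as np
from scipy.optimize import nnls

def fit_variance_weights(band):
    """Fit w, sigma^2 so that E[x_n^2 | neighbors] ~ sum_k w_k x_k^2 + sigma^2."""
    H, W = band.shape
    offs = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]
    core = band[1:-1, 1:-1].ravel()
    neigh = np.stack([band[1+dy:H-1+dy, 1+dx:W-1+dx].ravel()
                      for dy, dx in offs], axis=1)
    A = np.hstack([neigh**2, np.ones((len(core), 1))])  # last column: sigma^2
    coef, _ = nnls(A, core**2)                          # nonnegative weights
    return coef[:-1], coef[-1]

rng = np.random.default_rng(0)
band = rng.standard_normal((128, 128))   # stand-in for a wavelet subband;
w, sigma2 = fit_variance_weights(band)   # real subbands give nonzero weights
print(w, sigma2)
```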
58. Joint densities
[Figure: empirical joint distributions of pairs of wavelet coefficients ("adjacent", "near", "far", "other scale", "other orientation") for a single image of a New York City street scene. Top row: joint distributions as contour plots, with lines at equal intervals of log probability; the three leftmost pairs are at the same scale and orientation but different spatial offsets, the next at adjacent scales, the rightmost at orthogonal orientations. Bottom row: corresponding conditional distributions, with brightness proportional to frequency of occurrence, each column rescaled to fill the intensity range. [Simoncelli '97; Wainwright&Simoncelli '99]]
• Nearby: densities are approximately circular/elliptical
• Distant: densities are approximately factorial
62. non-Gaussian elliptical observations
and models of natural images:
- Zetzsche & Krieger, 1999
- Huang & Mumford, 1999
- Wainwright & Simoncelli, 2000
- Hyvärinen & Hoyer, 2000
- Parra et al., 2001
- Srivastava et al., 2002
- Sendur & Selesnick, 2002
- Teh et al., 2003
- Gehler & Welling, 2006
- Lyu & Simoncelli, 2008
- etc.
63. Modeling heteroscedasticity
Method 2: Hidden scaling variable for each patch
Gaussian scale mixture (GSM)
[Andrews & Mallows 74]:
x = √z · u
• u is Gaussian, z > 0
• z and u are independent
• x is elliptically symmetric, with covariance ∝ C_u
• marginals of x are leptokurtotic
[Wainwright&Simoncelli 99]
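These properties are easy to verify by sampling. A minimal sketch (the lognormal mixer anticipates the next slide; any nontrivial positive z gives the same qualitative result):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 200_000
u = rng.standard_normal(n)                        # Gaussian factor
z = rng.lognormal(mean=0.0, sigma=0.7, size=n)    # hidden scaling, z > 0
x = np.sqrt(z) * u                                # GSM sample: x = sqrt(z) u

# A Gaussian has kurtosis 3; mixing variances makes x leptokurtotic.
print("kurtosis of u:", kurtosis(u, fisher=False))   # ~3
print("kurtosis of x:", kurtosis(x, fisher=False))   # >3
```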
64. GSM - prior on z
• Empirically, z is approximately lognormal [Portilla et al., ICIP-01]:
p_z(z) = exp(−(log z − μ_l)² / (2σ_l²)) / (z (2π σ_l²)^(1/2))
• Alternatively, can use Jeffrey's noninformative prior [Figueiredo&Nowak, '01; Portilla et al., '03]:
p_z(z) ∝ 1/z
72. II. BLS for non-Gaussian prior
• Assume marginal distribution [Mallat '89]:
P(x) ∝ exp(−|x/s|^p)
• Then the Bayes estimator is generally nonlinear:
[Figure: estimator curves for p = 2.0, 1.0, 0.5]
[Simoncelli & Adelson, '96]
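The nonlinearity is easy to reproduce numerically. A sketch computing the Bayes least-squares estimate E[x | y] on a grid, for y = x + Gaussian noise (grid limits and parameter values are illustrative assumptions):

```python
import numpy as np

def bls_estimator(y, p=0.5, s=1.0, sigma_n=1.0):
    """E[x | y] on a grid, for the prior P(x) ∝ exp(-|x/s|^p)."""
    x = np.linspace(-40, 40, 4001)
    prior = np.exp(-np.abs(x / s) ** p)
    lik = np.exp(-0.5 * ((y - x) / sigma_n) ** 2)   # Gaussian likelihood
    post = prior * lik                              # unnormalized posterior
    return np.sum(x * post) / np.sum(post)

for y in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(y, round(bls_estimator(y), 3))
# For p = 2 the estimator is linear (Wiener); for p < 1 it "cores":
# small responses are suppressed toward zero, large ones preserved.
```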
80. GSM summary
• GSM captures local variance
• Underlying Gaussian leads to simple computation
• Excellent denoising results
• What's missing?
• Global model of z variables [Wainwright et al. 99; Romberg et al. '99; Hyvarinen/Hoyer '02; Karklin/Lewicki '02; Lyu/Simoncelli 08]
• Explicit geometry: phase and orientation
81. Global models for z
• Non-overlapping neighborhoods, tree-structured z [Wainwright et al. 99; Romberg et al. '99]
[Figure: tree of z variables over coarse-to-fine scales, multiplying the Gaussian u]
• Field of GSMs: z is an exponentiated GMRF, u is a GMRF, subband is the product [Lyu&Simoncelli 08]
82. State-of-the-art denoising
[Figure: denoising performance on the Lena and Boats images; differences in PSNR between FoGSM and other methods (BM3D, kSVD, BLS-GSM, FoE) as a function of input noise level σ]
[Lyu&Simoncelli, PAMI 08]
87. Multi-scale gradient basis
• Multi-scale bases: efficient representation
• Derivatives: good for analysis
• Local Taylor expansion of image structures
• Explicit geometry (orientation)
• Combination:
• Explicit incorporation of geometry in basis
• Bridge between PDE / harmonic analysis approaches
89. Importance of local orientation
[Figure: randomized orientation vs. randomized magnitude]
[Hammond&Simoncelli 05]
90. Reconstruction from orientation
[Figure: original vs. reconstruction with orientations quantized to 2 bits]
• Reconstruction by projections onto convex sets
• Resilient to quantization
[Hammond&Simoncelli 06]
91. Image patches related by rotation
[Figure: two-band steerable pyramid coefficients]
[Hammond&Simoncelli 06]
92. PCA of normalized gradient patches
[Figure: comparison of raw patches (dashed) and rotated patches]
[Hammond&Simoncelli 06]
94. Orientation-Adaptive GSM model
Model a vectorized patch of wavelet coefficients as x = √z R(θ) u, where R(θ) is a patch rotation operator and z, θ are hidden magnitude/orientation variables.
Conditioned on (z, θ), the patch is zero-mean Gaussian with covariance z R(θ) C R(θ)ᵀ.
[Hammond&Simoncelli 06]
95. Estimation of C(θ) from noisy data
For a noisy patch, the signal covariance is unknown; approximate it using the covariance measured from the noisy data, assuming signal and noise independent and the noise rotationally invariant (assuming w.l.o.g. E[z] = 1).
[Hammond&Simoncelli 06]
101. Bayesian MMSE Estimator
Condition on the hidden variables and integrate them out:
x̂(y) = ∫∫ p(z, θ | y) E[x | y, z, θ] dz dθ
• Conditioned on (z, θ), E[x | y, z, θ] is a Wiener estimate
• Given (z, θ), the noisy patch has covariance z C(θ) + C_w
• Separable prior for the hidden variables: p(z, θ) = p_z(z) p_θ(θ)
[Hammond&Simoncelli 06]
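For the plain GSM (dropping the orientation variable θ), the estimator reduces to a weighted combination of Wiener estimates over a grid of z values. A sketch under that simplification, with illustrative covariances and Jeffrey's prior (quadrature weights absorbed into the unnormalized prior):

```python
import numpy as np

def gsm_mmse(y, C, Cw, z_grid, p_z):
    """E[x|y] = sum_z p(z|y) * Wiener(y; z*C, Cw), on a grid over z."""
    num = np.zeros(len(y))
    weights = []
    for z, pz in zip(z_grid, p_z):
        Cy = z * C + Cw                           # covariance of y given z
        _, logdet = np.linalg.slogdet(Cy)         # p(y|z): zero-mean Gaussian
        w = pz * np.exp(-0.5 * (y @ np.linalg.solve(Cy, y)) - 0.5 * logdet)
        weights.append(w)
        num += w * (z * C @ np.linalg.solve(Cy, y))  # Wiener estimate, given z
    return num / np.sum(weights)

rng = np.random.default_rng(0)
d = 9
C = np.eye(d)                                     # signal covariance (illustrative)
Cw = 0.5 * np.eye(d)                              # noise covariance
z_grid = np.logspace(-2, 2, 25)
p_z = 1.0 / z_grid                                # Jeffrey's prior, unnormalized
x = np.sqrt(rng.lognormal()) * rng.standard_normal(d)
y = x + np.sqrt(0.5) * rng.standard_normal(d)
print(gsm_mmse(y, C, Cw, z_grid, p_z))
```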
102. [Figure: denoising example at σ = 40; noisy: 2.81 dB; gsm2: 12.4 dB; oagsm: 13.1 dB]
103. Locally adaptive covariance
• Karklin & Lewicki 08: Each patch is Gaussian, with covariance constructed from a weighted outer-product of fixed vectors:
p(x) = G(x; C(y)),  log C(y) = Σ_n y_n B_n
p(y) = Π_n exp(−|y_n|),  B_n = Σ_k w_nk b_k b_kᵀ
• Guerrero-Colon, Simoncelli & Portilla 08: Each patch is a mixture of GSMs (MGSMs):
p(x) = Σ_k P_k ∫ p(z_k) G(x; z_k C_k) dz_k
104. MGSMs generative model
Patch x chosen from {√z1·u1, √z2·u2, ..., √zK·uK}
with probabilities {P1, P2, ..., PK}
Parameters:
• Covariances C_k
• Scale densities p_k(z_k)
• Component probabilities P_k
• Number of components K
Parameters can be fit to data of one or more images
by maximizing likelihood (EM-like)
[Guerrero-Colon, Simoncelli, Portilla 08]
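The generative model is simple to sample. A sketch with illustrative parameters (the covariances, scale densities, and component probabilities here are made up, not fit to images):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 16, 3
Ck = [np.eye(d) * (k + 1) for k in range(K)]   # component covariances
Pk = np.array([0.5, 0.3, 0.2])                 # component probabilities

def sample_patch():
    k = rng.choice(K, p=Pk)                    # pick a component
    z = rng.lognormal(sigma=0.5)               # scale drawn from p_k(z_k)
    u = rng.multivariate_normal(np.zeros(d), Ck[k])
    return np.sqrt(z) * u                      # x = sqrt(z_k) u_k

patches = np.stack([sample_patch() for _ in range(1000)])
print(patches.shape)
```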
105. MGSM "segmentation"
[Figure: image; models with 1, 2, 4 components; first six eigenvectors of the GSM covariance matrices]
[Guerrero-Colon, Simoncelli, Portilla 08]
107. Potential of local homogeneous models?
Consider an implicit model:
maxEnt subject to constraints on subband coefficients:
• marginal statistics [var, skew, kurtosis]
• local raw correlations
• local variance correlations
• local phase correlations
[Portilla & Simoncelli 00;
cf. Zhu, Wu & Mumford 97]
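A sketch of what the first three constraint families measure, with assumptions: the hypothetical photo.png again, and a simple difference filter standing in for a proper multi-scale decomposition (phase correlations are omitted since they need complex-valued subbands; see Portilla & Simoncelli 00 for the full statistic set):

```python
import numpy as np
from PIL import Image
from scipy.stats import skew, kurtosis

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
band = np.diff(img, axis=1)    # stand-in for one subband
mag = np.abs(band)             # local amplitude

stats = {
    # marginal statistics of the subband
    "marginal": (band.var(), skew(band.ravel()), kurtosis(band.ravel())),
    # local raw correlations: mean products of the band with shifted copies
    "raw_corr": [np.mean(band[:, :-k] * band[:, k:]) for k in (1, 2, 3, 4)],
    # local variance correlations: the same idea, on the magnitudes
    "var_corr": [np.corrcoef(mag[:, :-k].ravel(), mag[:, k:].ravel())[0, 1]
                 for k in (1, 2, 3, 4)],
}
print(stats)
```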
122. Summary
ā¢ Fusion of empirical data with structural principles
ā¢ Statistical models have led to state-of-the-art image
processing, and are relevant for biological vision
ā¢ Local adaptation to {variance, orientation,
phase, ...} gives improvement, but makes learning
harder
ā¢ Cascaded representations emerge naturally
• There's still much room for improvement!
123. Cast
ā¢ Local GSM model: Martin Wainwright, Javier Portilla
ā¢ GSM Denoising: Javier Portilla, Martin Wainwright,
Vasily Strela
ā¢ Variance-adaptive compression: Robert Buccigrossi
ā¢ Local orientation and OAGSM: David Hammond
ā¢ Field of GSMs: Siwei Lyu
• Mixture of GSMs: Jose-Antonio Guerrero-Colón,
Javier Portilla
ā¢ Texture representation/synthesis: Javier Portilla