2. Overview Image and Video Search
Lecture 1 visual search, the problem
color-spatial-textural-temporal features
measures and invariances
Lecture 2 descriptors
words and similarity
where and what
Lecture 3 data and metadata
performance
speed
4. A brief history of television
From broadcasting to narrowcasting (~1955, ~1985, ~2005)
…to thin casting (2008, 2010)
5. Any other purpose than tv?
Surveillance to alert on events
Forensics to find evidence / to protect against misuse
Social media to sort responses
Safety to prevent terrorism
Agriculture to sort fruit
News to reuse archived footage
Business to have efficient access
eBusiness to mine consumer data
Science to understand visual cognition
Family “I have it somewhere on this disk”
6. How big? The answer from the web
The web is video
8. How big? Answer from the archive
Yearly influx: 15,000 hours of video; 1 Pbyte per year
Next 6 years: 137,200 hours of video; 22,510 hours of film; 2,900,000 photos
13. Problem 4: The story of a video
This is the narrative gap
14. Problem 5: No shared intuition
Query: "Find shots of people shaking hands"
Query-by-keyword
Query-by-concept
Query-by-examples
[Diagram: query, prediction, what sources]
This is the query-context gap
15. System 1: histogram matching
Histogram as a summary of color characteristics.
Swain and Ballard, IJCV 1991
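To make the idea of a histogram as a color summary concrete, here is a minimal sketch of matching by histogram intersection, the measure associated with Swain and Ballard. The function names, the 4 bins per channel, and the toy pixel lists are assumptions chosen for illustration, not details taken from the slide.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count 8-bit (R, G, B) pixels in a joint, coarsely quantised histogram."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    return hist


def histogram_intersection(query_hist, model_hist):
    """Sum of bin-wise minima, normalised by the mass of the model histogram."""
    overlap = sum(min(q, m) for q, m in zip(query_hist, model_hist))
    return overlap / sum(model_hist)


# Toy "images" given as flat pixel lists (hypothetical data).
red_image = [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2
reddish_image = [(240, 20, 5)] * 7 + [(10, 250, 10)] * 3
green_image = [(10, 250, 10)] * 10

h_red = color_histogram(red_image)
score_similar = histogram_intersection(color_histogram(reddish_image), h_red)
score_different = histogram_intersection(color_histogram(green_image), h_red)
```

The score is high when two images share color mass in the same bins and drops to zero when they share none. It ignores where the colors occur in the image, which is why such simple means give visually simple results.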
16. 1 Conclusion
As content grows, many applications of image search.
Deep cognitive and computer science problems.
With simple means one gets visually simple results.
19. (R,G,B)
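$R = \int_\lambda e(\lambda)\,\rho(\lambda)\,f_R(\lambda)\,d\lambda$
$G = \int_\lambda e(\lambda)\,\rho(\lambda)\,f_G(\lambda)\,d\lambda$
$B = \int_\lambda e(\lambda)\,\rho(\lambda)\,f_B(\lambda)\,d\lambda$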
20. (r, g, b) in (R,G,B)
$r = \frac{R}{R + G + B}$
$g = \frac{G}{R + G + B}$
$b = \frac{B}{R + G + B}$
Independent of shadow!
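A small sketch of why normalized (r, g, b) does not change under shadow, assuming a shadow scales all three channels by the same factor. The function name, the pixel values, and the convention for black pixels are illustrative assumptions.

```python
def normalized_rgb(R, G, B):
    """Divide each channel by the channel sum R + G + B."""
    total = R + G + B
    if total == 0:
        return (0.0, 0.0, 0.0)  # convention for black pixels (an assumption)
    return (R / total, G / total, B / total)


lit = (120.0, 80.0, 40.0)                   # hypothetical pixel in full light
shadowed = tuple(0.25 * c for c in lit)     # same surface at a quarter of the intensity

rgb_lit = normalized_rgb(*lit)
rgb_shadowed = normalized_rgb(*shadowed)
```

The common intensity factor cancels in the ratio, so both pixels map to the same (r, g, b).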
21. The sensation of spectra
Hue: dominant wavelength $\lambda(E_H)$
Saturation: purity of the colour, $(E_H - E_W)/E_H$
Intensity: brightness of the colour, $E_W$
[Figure: spectra labelled “white” and “green”, with the levels $E_H$ and $E_W$ marked]
22. The sensation of spectra: opponent
Human perception combines the (R,G,B) response of the eye into opponent colors:
Luminance = $R + G + B$
BlueYellow = $\tfrac{1}{2}(R - G)$
PurpleGreen = $\tfrac{1}{4}(2B - R - G)$
Maximizes perceived contrast!
23. Color Gaussian space
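$\begin{pmatrix} E \\ E_\lambda \\ E_{\lambda\lambda} \end{pmatrix} = \begin{pmatrix} 0.06 & 0.63 & 0.27 \\ 0.30 & 0.04 & -0.35 \\ 0.34 & -0.60 & 0.17 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}$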
Maximizes information content!
Geusebroek PAMI 2002
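The linear map on this slide can be applied per pixel. The sketch below uses the slide's matrix coefficients; the function and variable names are hypothetical, and the input is assumed to be linear RGB scaled to [0, 1].

```python
# Rows give E, E_lambda and E_lambda_lambda as weighted sums of (R, G, B),
# with the coefficients shown on the slide.
RGB_TO_GAUSSIAN = [
    [0.06, 0.63, 0.27],
    [0.30, 0.04, -0.35],
    [0.34, -0.60, 0.17],
]


def gaussian_color(R, G, B):
    """Return (E, E_lambda, E_lambda_lambda) for one pixel."""
    return tuple(row[0] * R + row[1] * G + row[2] * B for row in RGB_TO_GAUSSIAN)


# A gray pixel: nearly all of the response ends up in the intensity channel E.
E, E_l, E_ll = gaussian_color(1.0, 1.0, 1.0)
```

For an achromatic input the two spectral-derivative channels stay close to zero, as expected of an opponent-style representation.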
26. Taxonomy of diff-image structure
T-junction Junction
Highlight
Corner
These junctions later bring recognition
27. Gabor texture
The 2D Gabor function is:
$h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\, e^{2\pi j (ux + vy)}$
Tuning parameters: u, v, σ
Manjunath and Ma on Gabor for texture in Fourier-space
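As an illustration of the 2D Gabor function and its tuning parameters u, v and σ, the following sketch samples the function on a small square grid. The grid size, the parameter values, and the function name are assumptions made for the example.

```python
import cmath
import math


def gabor_kernel(size, sigma, u, v):
    """Sample h(x, y) on a size x size grid centred on the origin (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Isotropic Gaussian envelope with scale sigma.
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)
            # Complex sinusoid with spatial frequency (u, v) in cycles per pixel.
            carrier = cmath.exp(2j * math.pi * (u * x + v * y))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


kernel = gabor_kernel(size=9, sigma=2.0, u=0.25, v=0.0)
center = kernel[4][4]      # value at x = 0, y = 0
neighbor = kernel[4][5]    # value at x = 1, y = 0: a quarter period further along x
```

Convolving an image with a bank of such kernels at several frequencies and orientations yields texture responses; the real and imaginary parts act as even and odd filters.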
29. Gabor GIST descriptor
Calculate Gabor responses locally
Create histograms as before
Distinguishes things like naturalness, openness,
roughness, expansion, and ruggedness
Slide credit: James Hays and Alexei Efros. Oliva IJCV 2001
30. Receptive field in f(x,t)
Gaussian equivalent over x and t:
[Figure: zero-order and first-order receptive fields along t]
Burghouts TIP 2006
31. Gaussians measure differentials
Taylor expansion at x
For discretely sampled signal use the Gaussians
The preferred brand of filters: separable by dimension
rotation symmetric
no new maxima
fast implementations.
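A minimal sketch of measuring differentials on a sampled signal with Gaussians: convolving with a Gaussian gives the zero-order (smoothed) observation, and convolving with its derivative gives the first-order observation at the same scale. The truncation at 4σ, the helper names, and the ramp signal are assumptions for illustration; this is a direct convolution, not the fast recursive implementation.

```python
import math


def gaussian_kernels(sigma, radius=None):
    """Return sampled zero-order and first-order Gaussian kernels."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))  # truncation radius (an assumption)
    xs = range(-radius, radius + 1)
    g0 = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in xs]
    norm = sum(g0)
    g0 = [w / norm for w in g0]
    # Derivative of a Gaussian: g'(x) = -x / sigma^2 * g(x).
    g1 = [-x / (sigma * sigma) * w for x, w in zip(xs, g0)]
    return g0, g1


def convolve_valid(signal, kernel):
    """Convolve, keeping only positions where the kernel fits inside the signal."""
    radius = len(kernel) // 2
    return [
        sum(kernel[k] * signal[i - (k - radius)] for k in range(len(kernel)))
        for i in range(radius, len(signal) - radius)
    ]


ramp = [0.5 * i for i in range(40)]      # sampled signal with slope 0.5
g0, g1 = gaussian_kernels(sigma=2.0)
smoothed = convolve_valid(ramp, g0)      # zero-order observation
slope = convolve_valid(ramp, g1)         # first-order observation
```

On the ramp, the first-order response recovers the slope of 0.5 up to a small truncation error. In two or more dimensions the same 1D kernels can be applied along each axis in turn, which is what separability by dimension buys.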
32. Receptive fields: overview
All observables up to first order color,
second order spatial scales, eight
frequency bands & first order in t.
33. System 2: Blobworld, textured world
Group blobs based on color and Tamura texture
User specifies query blob and features
System returns images with similar regions
Carson PAMI 2002
34. 2 Conclusion
Powerful features capture uniqueness.
A large set is needed for open-ended search.
The Gauss family is the preferred brand of filters.
Fast recursive implementation:
Geusebroek, Van de Weijer & Smeulders 2002
36. The need for invariance
There are a million appearances to one object
The same part of the same shoe does not have the same
appearance in the image. This is the sensory gap.
Remove unwanted variance as early as you can.
37. Invariance: definition
A feature g is invariant under a condition (transform),
caused by accidental conditions at the time of recording,
iff g, observed on equal objects, stays constant under that transform.
39. Color invariance
$C = m_b(n, s) \int_\lambda e(\lambda)\, c_b(\lambda)\, f_C(\lambda)\, d\lambda + m_s(n, s, v) \int_\lambda e(\lambda)\, c_s(\lambda)\, f_C(\lambda)\, d\lambda$

Symbol      Meaning                    Dependence
c_b(λ)      surface albedo             scene & viewpoint invariant
e(λ)        illumination               scene dependent
n           object surface normal      object shape variant
s           illumination direction     scene dependent
v           viewer’s direction         viewpoint variant
f_C(λ)      sensor sensitivity         scene dependent
41. C is viewpoint invariant
$c_1(R, G, B) = \arctan\frac{R}{\max\{G, B\}}$
$c_2(R, G, B) = \arctan\frac{G}{\max\{R, B\}}$
$c_3(R, G, B) = \arctan\frac{B}{\max\{R, G\}}$
[Figure: the same image in E space and in C space]
Gevers TIP 2000
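A short sketch of the c1c2c3 features, assuming the standard form in which each channel is divided by the maximum of the other two before taking the arctangent. Using `math.atan2` to avoid division by zero is an implementation choice of this example, and the pixel values are hypothetical.

```python
import math


def c1c2c3(R, G, B):
    """Arctangent of each channel over the maximum of the other two channels."""
    # For non-negative inputs atan2(a, b) equals arctan(a / b) and also handles b == 0.
    c1 = math.atan2(R, max(G, B))
    c2 = math.atan2(G, max(R, B))
    c3 = math.atan2(B, max(R, G))
    return (c1, c2, c3)


facing_light = (100.0, 60.0, 20.0)                     # hypothetical matte surface patch
tilted_away = tuple(0.5 * c for c in facing_light)     # same patch, half the shading term

c_facing = c1c2c3(*facing_light)
c_tilted = c1c2c3(*tilted_away)
```

For a matte surface, a change of surface orientation or viewpoint scales R, G and B by one common factor, which cancels in every ratio, so both patches get the same feature values.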
42. Hue is viewpoint invariant
$H = \arctan\left(\frac{\sqrt{3}\,(G - B)}{(R - G) + (R - B)}\right)$, H is a scalar
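The hue formula depends only on channel differences, which can be checked numerically. In this sketch a highlight is modelled as an equal offset added to all channels and shading as a common scale factor. Both models, the function name, and the use of `math.atan2` to resolve the quadrant are assumptions of the example.

```python
import math


def hue(R, G, B):
    """Hue angle in radians; atan2 keeps the quadrant and tolerates a zero denominator."""
    return math.atan2(math.sqrt(3.0) * (G - B), (R - G) + (R - B))


body_color = (200.0, 100.0, 50.0)                        # hypothetical matte reflection
shaded_with_highlight = tuple(0.5 * c + 30.0 for c in body_color)

h_body = hue(*body_color)
h_changed = hue(*shaded_with_highlight)
```

The added offset drops out of every difference and the scale factor cancels in the ratio, so the hue stays the same while the raw (R, G, B) values change considerably.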
43. Differential invariants C’, W’, M’
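C’ is for matte objects and uneven white light:
$C_\lambda = \frac{E_\lambda}{E}$, $C_{\lambda\lambda} = \frac{E_{\lambda\lambda}}{E}$
$C_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}$
W’ is for matte planar objects and even white light:
$W_x = \frac{E_x}{E}$, $W_{\lambda x} = \frac{E_{\lambda x}}{E}$
M’ is for matte objects and monochromatic light:
$N_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}$
Geusebroek PAMI 2002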
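A rough numerical sketch of the C invariants along one spatial dimension, using central differences in place of Gaussian derivative filters. That substitution, the helper names, and the sample values are assumptions made to keep the example short.

```python
def central_difference(values):
    """Approximate the spatial derivative at interior samples."""
    return [(values[i + 1] - values[i - 1]) / 2.0 for i in range(1, len(values) - 1)]


def c_lambda(E, E_l):
    """C_lambda = E_lambda / E at every sample."""
    return [el / e for e, el in zip(E, E_l)]


def c_lambda_x(E, E_l):
    """C_lambda_x = (E_lambda_x * E - E_lambda * E_x) / E^2 at interior samples."""
    E_x = central_difference(E)
    E_lx = central_difference(E_l)
    return [
        (elx * e - el * ex) / (e * e)
        for e, el, ex, elx in zip(E[1:-1], E_l[1:-1], E_x, E_lx)
    ]


E = [10.0, 12.0, 15.0, 19.0, 24.0]        # hypothetical intensity samples along x
E_l = [1.0, 2.0, 2.0, 3.0, 5.0]           # hypothetical spectral-derivative samples

# Uneven light: a different intensity factor at every position.
light = [1.0, 0.8, 0.6, 0.5, 0.4]
E_uneven = [i * e for i, e in zip(light, E)]
E_l_uneven = [i * el for i, el in zip(light, E_l)]

# Dimmer light: one common intensity factor.
E_dim = [0.5 * e for e in E]
E_l_dim = [0.5 * el for el in E_l]
```

The pointwise ratio C_λ is unchanged under the uneven light because the local intensity factor cancels. The finite-difference version of C_λx is checked here only under a common intensity factor, since a discrete derivative mixes neighbouring samples.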
44. Retained discrimination
            shadows   shading   highlights   ill. intensity   ill. color
E              -         -          -              -               -
H              +         +          +              +               -
W & W’         -         +          -              +               -
C & C’         +         +          -              +               -
M & M’         +         +          -              +               +
L              +         +          +              +               -

Retained from 1000 colors (σ = 3):
E     990
H     315
W’    995
C’    850
M’    900
Geusebroek PAMI 2003
45. 3 Conclusion
Know your variances and invariants.
Good invariant features make algorithms simple.