3. Evolution of medical images
• 1895: Wilhelm Conrad Röntgen discovers X–rays
• Approximately 100 years later: anatomical, functional, motion
• Any aspect can be visualized and quantified
[Diagram: imaging modalities — microscopy, visible light, magnetic resonance, X–rays, nuclear imaging, ultrasound]
4. Use of medical images
• Geneva University Hospitals during 2012:
• 30,645 CT exams
• 12,819 MRI exams
• 1,426 PET exams
• 30% of world storage capacity (estimation)
5. Dimensions of medical images
• 2D: e.g. dermatography, radiography, angiography
• 2D + time: e.g. echography, endoscopy
• 3D: e.g. CT, MRI, PET
• 3D + time: e.g. functional MRI
• 3D + other: e.g. Dual Energy CT
6. Computer Aided Tools
• Multimodal information
• Partly annotated
• Multidimensional
How to make sense of it? Computer–aided diagnosis (CAD) and content–based image retrieval (CBIR).
7. Visual features
• High dimensional approaches
• Shape description: point–based, surface–based, topology–based
• Full–support description: geometry–based, spectral–based, statistical & stochastic methods, video–specific methods
• Low dimensional approaches
• Spin images
• Silhouettes and depth images
• Slice & frame analysis
8. Visual similarity
Ii = log 1
Pi
• Information:
• Specific definition
• Low level features
• Similarity
• General definition
• Higher level concepts (semantic
gap)
11
9. Bag of visual words
• BoVW aims at narrowing the semantic gap
• It consists of:
1. Partitioning an n–dimensional feature space into K disjoint regions
2. Measuring features at m sampling points of an image
3. Assigning each sample to one of the K regions
4. Using the resulting K–bin histogram as the image descriptor
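Steps 3 and 4 of the pipeline above can be sketched in a few lines. This is a minimal illustration with toy data, assuming the vocabulary (the K region centroids from step 1) and the m sampled feature vectors (step 2) are already computed:

```python
import numpy as np

def bovw_histogram(features, vocabulary):
    """Assign each of the m sampled feature vectors to its nearest
    visual word (one of the K regions) and accumulate the K-bin
    histogram used as the image descriptor."""
    # Squared Euclidean distance from every sample to every centroid
    d = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    assignments = d.argmin(axis=1)                # step 3: nearest region
    K = vocabulary.shape[0]
    hist = np.bincount(assignments, minlength=K)  # step 4: K-bin histogram
    return hist / hist.sum()                      # normalised descriptor

# Toy example: K = 2 words in a 1-D feature space, m = 4 samples
vocab = np.array([[0.0], [10.0]])
feats = np.array([[0.5], [1.0], [9.0], [11.0]])
print(bovw_histogram(feats, vocab))  # → [0.5 0.5]
```

Normalising the histogram makes descriptors comparable across images with different numbers of sampling points.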
10. Scientific contributions
Feature ex-
traction and
modelling using
BOVW
Multiscale
texture
descriptors
Multiscale
analysis
of ROIs
Optimal
Vocabu-
lary Size
Optimal
Bag length
Optimal vo-
cabularies
in DECT
Vocabulary
Pruning
Language
modelling
Ground
truth
generation
14
11. Section outline
Motivation and introduction
Technical contributions
Multi–scale texture description
A visual grammar
ROI detector
Experiments
Concluding remarks
13. Multi–scale texture description
Texture
The feel, appearance or consistency of a surface or a substance.
— Oxford Dictionaries
Texture contains important information about the structural
arrangement of surfaces and their relationship to the
surrounding environment.
— Haralick et al.
14. Wavelet analysis
ψ_{s,τ}(t) = (1/√s) ψ((t − τ)/s)
Ψ_{s,τ}(ω) = √|s| Ψ(sω) e^{−jωτ}
• ψ(t) must be zero mean
• Ψ(ω) is a bandpass filter
• Finite set of scale parameters s
• Scaling function ϕ(t) used to cover the low frequencies
16. Isotropic wavelet analysis
• Gaussian–based functions to analyze isotropic image texture
• Difference of Gaussians is an approximation to Laplacian of Gaussians
(Mexican Hat)
Difference of Gaussians
gσ(x) =
1
σxσyσz (2π)3
e
−
(xδx)2
2σ2
x
+
(yδy)2
2σ2
y
+
(zδz)2
2σ2
z
ψj(x) = gσ1 (x) − gσ2 (x)
σ2 = 1.6σ1
21
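In practice the DoG response of a volume can be obtained by filtering with two Gaussians and subtracting, with the wider scale fixed at σ2 = 1.6 σ1 as above. A minimal sketch using `scipy.ndimage.gaussian_filter` on a toy impulse volume:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(volume, sigma1):
    """Difference-of-Gaussians band-pass response of a 3-D image.
    The wider Gaussian uses sigma2 = 1.6 * sigma1, which makes the
    DoG a close approximation to the Laplacian of Gaussian."""
    return gaussian_filter(volume, sigma1) - gaussian_filter(volume, 1.6 * sigma1)

# A single bright voxel gives the filter's impulse response:
# positive at the centre, negative in the surrounding ring.
vol = np.zeros((15, 15, 15))
vol[7, 7, 7] = 1.0
r = dog_response(vol, sigma1=1.0)
print(r[7, 7, 7] > 0)  # centre of the Mexican hat is positive
```

Because both Gaussian kernels are normalised, the response sums to (approximately) zero, matching the zero-mean requirement on ψ(t) from the previous slide.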
17. Riesz transform
• Multidimensional extension of the Hilbert transform
• Steerable
Nth order 3D Riesz transform
R̂^{(n1,n2,n3)} f(ω) = √((n1 + n2 + n3)! / (n1! n2! n3!)) · ((−jω1)^{n1} (−jω2)^{n2} (−jω3)^{n3}) / ||ω||^{n1+n2+n3} · f̂(ω)
for all combinations of (n1, n2, n3) with n1 + n2 + n3 = N and n1, n2, n3 ∈ ℕ, yielding (N+2 choose 2) templates R^{(n1,n2,n3)}.
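The template count is just the number of ways to split N among the three directions. A small sketch that enumerates the (n1, n2, n3) combinations and checks the binomial count:

```python
from math import comb

def riesz_template_count(N):
    """Number of distinct templates R^(n1,n2,n3) of the Nth-order 3-D
    Riesz transform: solutions of n1 + n2 + n3 = N in non-negative
    integers, i.e. the binomial coefficient C(N+2, 2)."""
    return comb(N + 2, 2)

def riesz_templates(N):
    """Enumerate all (n1, n2, n3) with n1 + n2 + n3 = N."""
    return [(n1, n2, N - n1 - n2)
            for n1 in range(N + 1) for n2 in range(N - n1 + 1)]

# The 2nd-order transform (as used later for organ identification)
print(riesz_template_count(2), riesz_templates(2))  # 6 templates
```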
19. Beyond bag of visual words
• Widely used
• Strong performance variation
• Clustering trade-off:
• Large clusters, small vocabularies
• Small clusters, large vocabularies
Language modelling of BoVW: vocabulary size, meaning, word-to-word relations
20. From words to grammar
Grammar
The whole system and structure of a language or of languages in
general, usually taken as consisting of syntax and morphology
(including inflections) and sometimes also phonology and semantics.
— Oxford Dictionaries
21. From words to grammar
[Figure: scatter plot of feature-space samples clustered into visual words]
Visual grammar: meaning, synonymy, polysemy
22. Visual topics
PLSA–based definition
A visual topic is an unobserved or latent variable z ∈ Z = {z_1, . . . , z_{N_Z}}, so that the probability of observing the word w_n in the visual instance I_i is:
P(w_n, I_i) = Σ_{j=1}^{N_Z} P(w_n | z_j) P(z_j | I_i).
[Diagram: images linked to topics via P(z_j | I_i), topics linked to visual words via P(w_n | z_j), collected in the matrix W_{N_W × N_Z}]
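The mixture above is a single matrix–vector product once the PLSA factors are available. A toy sketch with hypothetical sizes (N_W = 3 words, N_Z = 2 topics):

```python
import numpy as np

# Toy word-topic factor P(w_n | z_j): each column sums to 1
W = np.array([[0.7, 0.1],
              [0.2, 0.1],
              [0.1, 0.8]])

# Toy topic mixture P(z_j | I_i) for one visual instance I_i
topic_given_image = np.array([0.5, 0.5])

# P(w_n, I_i) = sum_j P(w_n | z_j) P(z_j | I_i): a matrix-vector product
p_word = W @ topic_given_image
print(p_word)  # mixture probabilities: 0.4, 0.15, 0.45
```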
23. The word-topic matrix
W_{N_W × N_Z} = [ P(w_n | z_j) ]  →  T = [ t_{n,j} ]   (rows n = 1, . . . , N_W; columns j = 1, . . . , N_Z)
• Rows: relevant topics for a word
• Columns: relevant words for a topic
• Use the ratio of words as a topic–based significance t_{n,j}:
• t_{n,j} = 1 → the most significant word for the topic
• t_{n,j} = 1/N_W → the least significant word for the topic
24. Visual meaningfulness
Definition
The visual meaningfulness of a visual word wn is its maximum topic–based
significance level:
mn =
maxj tn,j if maxj tn,j ≥ Tmeaning
0 otherwise
• Words below the meaningfulness threshold can be truncated.
30
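Given the word-topic significance matrix, meaningfulness and the resulting pruning are a one-liner each. A minimal sketch with a hypothetical 3-word, 2-topic matrix:

```python
import numpy as np

def meaningfulness(t, threshold):
    """m_n for each visual word: its maximum topic-based significance
    max_j t[n, j], or 0 if that maximum falls below T_meaning."""
    m = t.max(axis=1)
    return np.where(m >= threshold, m, 0.0)

# Toy significance matrix (rows: words, columns: topics)
t = np.array([[1.0, 0.2],
              [0.3, 0.4],
              [0.1, 0.9]])
m = meaningfulness(t, threshold=0.5)
keep = m > 0   # words below the meaningfulness threshold are truncated
print(m, keep)  # word 2 (max significance 0.4) is pruned
```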
26. Word to word relations
Example
• A single class might have several
visual appearances
• Several classes might partially
share the visual appearance
• Two visual words with the same
meaning, belong to different
histogram bins and cannot be
compared
• Identifying synonymy allows to
compare these words
bimodal class
partially shared
appearance
word 1
word 2
word 3
word 4
word 5
32
27. Synonymy graphs
• Word 3 is partially linked to words
1 and 2
• Words 4 and 5 are also linked
word 1
word 2
word 3word 4
word 5
33
28. Visual synonymy
Definition
A pair of visual words wn, wm can be considered synonyms if the following
three conditions are met:
1. There is at least one visual topic zj to which both wn and wm belong.
2. wn and wm have a similar contextual distribution with the rest of the
words.
3. wn and wm have a complementary distribution in the collection.
29. Synonymy value
Definition
The synonymy value of two words wn, wm is the maximum significance value
for which both words are significant for the same visual topic.
t1,1 · · · t1,NZ
t2,1 · · · t2,NZ
...
...
...
tNW,1 · · · tNW,NZ
σnm = σmn = max
j
min
n,m
tn,j, tm,j
35
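The formula above is symmetric in the two words and reads directly off the word-topic matrix. A toy sketch:

```python
import numpy as np

def synonymy_value(t, n, m):
    """sigma_nm = max over topics j of min(t[n, j], t[m, j]):
    the largest significance level at which both words are
    significant for the same visual topic (symmetric in n, m)."""
    return np.minimum(t[n], t[m]).max()

# Toy significance matrix (rows: words, columns: topics)
t = np.array([[0.9, 0.1],
              [0.7, 0.2],
              [0.1, 0.8]])
print(synonymy_value(t, 0, 1))  # words 0 and 1 share topic 0 → 0.7
print(synonymy_value(t, 0, 2))  # words 0 and 2 barely overlap → 0.1
```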
31. Word ambiguity and dimensionality
• Some visual words are sources
of ambiguity if they relate to
various appearances
• Their presence in the histogram is
not discriminative
• Possible solution: identify
polysemy and reduce their
weight
topic A topic B
37
32. Visual polysemy
Definition
A visual word wn is polysemic in strict sense if all the following conditions are
met:
1. wn if there are at least two visual topics zj, zk to which the visual word
belongs (wide sense polysemy)
2. There is a visual word wm, which is a synonym of wn and belongs to the
topic zj
3. There is a visual word wl, which is a synonym of wn and belongs to the
topic zj
4. wm, wl are not synonyms
38
33. Polysemy threshold
Definition
The polysemy threshold of a visual word wn, Tn
polysemy, is the largest value
that satisfies that there are at least two topics for which the word is
significant above the threshold:
t1,1 · · · t1,NZ
t2,1 · · · t2,NZ
...
...
...
tNW,1 · · · tNW,NZ
tn,j ≥ Tn
polysemy
≥ 2; ∀j = 1, . . . , NZ
39
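Since the threshold must be met by at least two topics, it equals the second-largest significance in the word's row. A minimal sketch:

```python
import numpy as np

def polysemy_threshold(t_row):
    """Largest threshold for which the word is significant for at
    least two topics: exactly the second-largest entry of its row
    in the word-topic significance matrix."""
    return np.sort(t_row)[-2]

# Toy row of topic-based significances for one word
row = np.array([0.9, 0.6, 0.1])
print(polysemy_threshold(row))  # 0.6: two topics satisfy t >= 0.6
```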
36. Section outline
Motivation and introduction
Technical contributions
Multi–scale texture description
A visual grammar
ROI detector
Experiments
Concluding remarks
37. Local analysis
• Medical images contain large
amounts of information
• Abnormalities and clinically
relevant patterns occur only in
reduced regions of interest
• Local context description :
• Dense sampling
• Keypoint–based analysis
43
38. Geodesic detection of regional extrema
1. Multi–scale difference of
Gaussians relates to saliency
2. Use geodesic operations to
obtain regional extrema:
2.1 Fill hole / grind peak
2.2 Substract from the original DoG
image
2.3 Label each fully connected
component larger than a
structuring element.
44
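Steps 2.1–2.3 can be sketched with morphological reconstruction-by-erosion (hole filling; grind peak is the dual and yields maxima analogously). This is a simplified 2-D illustration on a toy image, not the thesis implementation, and it omits the size filtering of step 2.3:

```python
import numpy as np
from scipy import ndimage

def fill_holes_grayscale(img):
    """Geodesic hole filling: seed equals the image on the border and
    the image maximum elsewhere; repeatedly erode the seed and clip it
    from below by the image (the mask) until stability."""
    seed = np.full_like(img, img.max())
    seed[0, :], seed[-1, :] = img[0, :], img[-1, :]
    seed[:, 0], seed[:, -1] = img[:, 0], img[:, -1]
    prev = None
    while prev is None or not np.array_equal(seed, prev):
        prev = seed
        seed = np.maximum(ndimage.grey_erosion(prev, size=(3, 3)), img)
    return seed

def regional_minima_labels(dog):
    """Steps 2.1-2.3 on a 2-D DoG slice: fill holes, subtract the
    original image, and label the connected components that remain."""
    residue = fill_holes_grayscale(dog) - dog  # non-zero inside minima
    labels, n = ndimage.label(residue > 0)
    return labels, n

# Toy image with two isolated dark basins (regional minima)
img = np.ones((7, 7))
img[1, 1] = 0.0
img[5, 4:6] = 0.2
labels, n = regional_minima_labels(img)
print(n)  # two regional minima detected
```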
39. Section outline
Motivation and introduction
Technical contributions
Experiments
Texture analysis of 2D lung CT
Texture analysis of 3D brain MRI
Texture analysis of 4D lung CT
Texture analysis of 4D lung CT using ROIs
Visual grammar for description of 2D images
Visual grammar for description of 3D medical images
Concluding remarks
40. Texture analysis of 2D lung CT
• Interstitial lung diseases
• TALISMAN dataset acquired at
Geneva University Hospitals
• 90 HRCT scans from 85 patients
• 1679 annotated regions
• 6 classes
• fibrosis
• ground glass
• emphysema
• micronodules
• healthy tissue
• consolidation
41. Texture analysis of 2D lung CT
k-means
clustering
visual vocabulary
word-1 = (f11,f12,...,f1N)
word-2= (f21,f22,...,f2N)
...
word-k= (fk1,fk2,...,fkN)
4 scales
Wavelet
Transform
Energy of
Coe cients
Dataset
Histogram of
visual words
for each region
k-dimensional
discrete feature space
48
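The vocabulary-building and histogram steps of this pipeline can be sketched with `scipy.cluster.vq`. The feature matrix below is a random stand-in for the per-pixel wavelet-coefficient energies, not real CT data:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
# Stand-in for per-pixel texture features: energies of wavelet
# coefficients at 4 scales (random points around two prototypes)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 4)),
                   rng.normal(1.0, 0.1, (50, 4))])

# k-means clustering of the feature space yields the visual
# vocabulary: each centroid word-i = (f_i1, ..., f_iN) is a word
k = 2
vocab, _ = kmeans2(feats, k, minit='points', seed=0)

# Each annotated region is then described by its word histogram
codes, _ = vq(feats, vocab)      # nearest-word assignment
hist = np.bincount(codes, minlength=k) / len(codes)
print(hist)                      # roughly balanced for this toy set
```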
42. Texture analysis of 2D lung CT
• Optimal number of visual
words between 100 and
300
• Overall performance
decreases with larger
vocabularies
Keep only meaningful words
0 50 100 150 200 250 300 350 400 450 500
20
30
40
50
60
70
80
Number of Visual Words
P@1(%)
Consolidation
Emphysema
Fibrosis
Ground Glass
Healthy
Micronodules
Geometric mean
49
43. Texture analysis of 3D brain MRI
• Texture–based segmentation of the
cerebellum
• IBSR dataset provided by MGH.
• MRI from 18 adult subjects
• Manual segmentations
• Cerebellum cortex
• Cerebellum white matter
44. Texture analysis of 3D brain MRI
Training Set
Testing Set
Histogram
Equalization
5 Scales
DoG 3D Wavelet
k-means
Clustering
NxNxN block
visual words
histogram
Nearest Neighbor
Search
Visual word
assignment
Visual Words
Histograms
Feature Space
Training Set
Training Set
Training Set
Testing Set
PREPROCESSING
FEATURE EXTRACTION
CLASSIFICATION
Visual
Vocabulary
52
45. Texture analysis of 3D brain MRI
• Performance improves with larger
block sizes
• Rest of brain
• Cerbellum cortex
• Performance does not improve
• Cerebellum white matter
Data–driven regions of interest
53
46. Texture analysis of 4D lung CT
• Pulmonary embolism retrieval
• Dual Energy CT dataset acquired
at Geneva University Hospitals
• 25 patients
• 4D data
• x,y,z
• Energy level of acquisition
• Ground truth
• Severity (Qanadli index)
• Lobe based
47. Texture analysis of 4D lung CT
k-means
clustering
55-dimensional
continuous feature space
visual vocabulary
word-1 = (f11,f12,...,f1N)
word-2= (f21,f22,...,f2N)
...
word-k= (fk1,fk2,...,fkN)
voxeli = closest word
5 scales
Wavelet
Transform
Energy of
Coefficients
.
.
.
Energy level 1 Energy level 11
Histogram of
visual words
for each lobe
voxeli = (fi1,fi2,..,fiN)
Lung lobes mask
1
5
4
3
2
k-dimensional
discrete feature space
56
48. Texture analysis of 4D lung CT
• Performance improves with
4D data
• 63% for P@1
• 62% for P@5
• 60% for P@10
• Optimal configuration
• 2 scales, 100–150 words
• Intensive computation
• High dimensional feature
space
Analyze only part of the data:
ROIs and meaningful words.
Words Scales P@1(%) P@5(%) P@10(%)
50 1 55 56 56
100 1 58 55 57
150 1 58 56 56
50 2 62 58 55
100 2 62 62 60
150 2 63 62 60
50 3 58 54 55
100 3 60 59 58
150 3 57 62 58
50 5 45 52 51
100 5 57 52 51
150 5 58 52 52
57
49. Texture analysis of 4D lung CT using ROIs
• Pulmonary embolism detection
• Improvements over previous
approaches
• ROI–based analysis
• Optimal combination of
energy–based vocabularies
50. Texture analysis of 4D lung CT using ROIs
• Improvements in performance
• Optimal combination of
energy–based vocabularies
• Multi–scale regions of interest
Finer–grain analysis of significant
words and synergies among them
Lobe DECT Words Energy levels SECT
LR 84 % 5 (50,130) 52 %
LL 84 % 5 (100,140) 48 %
MR 80 % 5 (40,50,130,140) 52 %
UL 76 % 25 (40,70,80,90) 60 %
UR 80 % 25 (90,120) 56 %
60
51. Visual grammar for description of 2D images
• Classification and retrieval of
images from the biomedical
literature
• ImageCLEFmed modality
classification task
• 1000 training and 1000 test
images
• 31 hierarchical categories
52. Visual grammar for description of 2D images
• SIFT–based visual vocabularies
• Varying number of visual topics
from 25 to 350 in steps of 25
• Varying meaningfulness threshold
from 50% to 100%
53. Visual grammar for description of 2D images
• Statistically significant
improvement over state-of-the art
baseline
• Vocabulary reductions without
effect on the accuracy
• Up to 20% of the original
vocabulary size
Analyze synonymy relations among
multiple vocabularies
0 50 100 150 200 250 300 350 400 450 500
20
25
30
35
40
45
50
55
60
65
Effective number of visual words
Classificationaccuracy(%)
Baseline Grammar Statistical significance threshold
64
54. Visual grammar for description of 3D medical images
• Organ identification task
• VISCERAL dataset
• Full body CT scans
• 15 Contrast–enhanced
• 15 Not enhanced
• 10 anatomical structures, 8 classes
66
55. Visual grammar for description of 3D medical images
• Riesz–based texture features
• 3 scales
• Riesz order 2.
• Organ–specific vocabularies
• 1000 random samples within the organ
• 20 visual words per organ
• Visual Grammar transformation
56. Visual grammar for description of 3D medical images
• Good results for organ identification
• Reduction of vocabulary size with respect to the baseline without visual grammar
[Plot: classification accuracy (%) vs. vocabulary size (0–200)]
59. Conclusions
Feature extraction and modelling using BoVW: multiscale texture descriptors, multiscale analysis of ROIs, optimal vocabulary size, optimal bag length, optimal vocabularies in DECT, vocabulary pruning, language modelling
• Evaluation of DoG and Riesz wavelets and BoVW
• Data–driven ROIs for local analysis of lung texture
• Optimal vocabulary size by learning informative words
• BoVW needs to cover anatomically meaningful areas
• Specific vocabularies, combined, provide better insight into patterns
• Removal of words using language modelling does not impact accuracy
• Visual grammar transformations improve accuracy and reduce descriptor size
60. Shortcomings
• Visual grammar model is slow to train for large vocabularies, synonymy
requires further restrictions (sparsity)
• Semantics is covered, but there’s other aspects that can still be explored
• Variations of a visual word (morphology)
• Combination rules of words in proximity (syntax)
• Bag of visual words has evolved into VLAD and Fisher Vectors, which in
some aspects are more robust.
71
61. Future work
• Extend the visual grammar evaluation
• Extend the visual grammar to cover various languages
• Synergies between isotropic and steerable texture descriptors
• Synergies between text and visual description
• Synergies between color and texture description
• Extend the language modelling to identify
• Paradigmatic relations
• Absence of visual words