3. Evolution of medical images
• 1895: Wilhelm Conrad Röntgen discovers X–rays
• Approximately 100 years later: anatomical, functional, motion
• Any aspect can be visualized and quantified
[Diagram: imaging modalities — microscopy, visible light, magnetic resonance, X–rays, nuclear imaging, ultrasound]
4. Use of medical images
• Geneva University Hospitals during 2012:
• 30,645 CT exams
• 12,819 MRI exams
• 1,426 PET exams
• 30% of world storage capacity (estimation)
5. Dimensions of medical images
• 2D: e.g. dermatography, radiography, angiography
• 2D + time: e.g. echography, endoscopy
• 3D: e.g. CT, MRI, PET
• 3D + time: e.g. functional MRI
• 3D + other: e.g. Dual Energy CT
6. Computer Aided Tools
• Multimodal information
• Partly annotated
• Multidimensional
How to make sense of it? Computer–aided diagnosis (CAD) and content–based image retrieval (CBIR).
7. Visual features
• High dimensional approaches
• Shape description: point–based, surface–based, topology–based
• Full–support description: geometry–based, spectral–based, statistical & stochastic methods, video–specific methods
• Low dimensional approaches
• Spin images
• Silhouettes and depth images
• Slice & frame analysis
8. Visual similarity
Ii = log 1
Pi
• Information:
• Specific definition
• Low level features
• Similarity
• General definition
• Higher level concepts (semantic
gap)
11
9. Bag of visual words
• BoVW aims at narrowing the semantic gap
• It consists of:
1. Partitioning an n–dimensional feature space into K disjoint regions
2. Measuring features at m sampling points of an image
3. Assigning each sample to one of the K regions
4. Using the resulting K–bin histogram as the image descriptor
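Steps 3 and 4 of the pipeline above can be sketched in a few lines. This is a minimal illustration with toy data, assuming the vocabulary (the K region centroids from step 1) and the m sampled feature vectors (step 2) are already computed:

```python
import numpy as np

def bovw_histogram(features, vocabulary):
    """Assign each of the m sampled feature vectors to its nearest
    visual word (one of the K regions) and accumulate the K-bin
    histogram used as the image descriptor."""
    # Squared Euclidean distance from every sample to every centroid
    d = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    assignments = d.argmin(axis=1)                # step 3: nearest region
    K = vocabulary.shape[0]
    hist = np.bincount(assignments, minlength=K)  # step 4: K-bin histogram
    return hist / hist.sum()                      # normalised descriptor

# Toy example: K = 2 words in a 1-D feature space, m = 4 samples
vocab = np.array([[0.0], [10.0]])
feats = np.array([[0.5], [1.0], [9.0], [11.0]])
print(bovw_histogram(feats, vocab))  # → [0.5 0.5]
```

Normalising the histogram makes descriptors comparable across images with different numbers of sampling points.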
10. Scientific contributions
Feature ex-
traction and
modelling using
BOVW
Multiscale
texture
descriptors
Multiscale
analysis
of ROIs
Optimal
Vocabu-
lary Size
Optimal
Bag length
Optimal vo-
cabularies
in DECT
Vocabulary
Pruning
Language
modelling
Ground
truth
generation
14
11. Section outline
Motivation and introduction
Technical contributions
Multi–scale texture description
A visual grammar
ROI detector
Experiments
Concluding remarks
13. Multi–scale texture description
Texture
The feel, appearance or consistency of a surface or a substance.
— Oxford Dictionaries
Texture contains important information about the structural
arrangement of surfaces and their relationship to the
surrounding environment.
— Haralick et al.
14. Wavelet analysis
ψ_{s,τ}(t) = (1/√s) ψ((t − τ)/s)
Ψ_{s,τ}(ω) = √|s| Ψ(sω) e^{−jωτ}
• ψ(t) must be zero mean
• Ψ(ω) is a bandpass filter
• Finite set of scale parameters s
• Scaling function ϕ(t) used to cover the low frequencies
16. Isotropic wavelet analysis
• Gaussian–based functions to analyze isotropic image texture
• Difference of Gaussians is an approximation to Laplacian of Gaussians
(Mexican Hat)
Difference of Gaussians
gσ(x) =
1
σxσyσz (2π)3
e
−
(xδx)2
2σ2
x
+
(yδy)2
2σ2
y
+
(zδz)2
2σ2
z
ψj(x) = gσ1 (x) − gσ2 (x)
σ2 = 1.6σ1
21
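In practice the DoG response of a volume can be obtained by filtering with two Gaussians and subtracting, with the wider scale fixed at σ2 = 1.6 σ1 as above. A minimal sketch using `scipy.ndimage.gaussian_filter` on a toy impulse volume:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(volume, sigma1):
    """Difference-of-Gaussians band-pass response of a 3-D image.
    The wider Gaussian uses sigma2 = 1.6 * sigma1, which makes the
    DoG a close approximation to the Laplacian of Gaussian."""
    return gaussian_filter(volume, sigma1) - gaussian_filter(volume, 1.6 * sigma1)

# A single bright voxel gives the filter's impulse response:
# positive at the centre, negative in the surrounding ring.
vol = np.zeros((15, 15, 15))
vol[7, 7, 7] = 1.0
r = dog_response(vol, sigma1=1.0)
print(r[7, 7, 7] > 0)  # centre of the Mexican hat is positive
```

Because both Gaussian kernels are normalised, the response sums to (approximately) zero, matching the zero-mean requirement on ψ(t) from the previous slide.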
17. Riesz transform
• Multidimensional extension of the Hilbert transform
• Steerable
Nth order 3D Riesz transform
R̂^{(n1,n2,n3)} f(ω) = √((n1 + n2 + n3)! / (n1! n2! n3!)) · ((−jω1)^{n1} (−jω2)^{n2} (−jω3)^{n3}) / ||ω||^{n1+n2+n3} · f̂(ω)
for all combinations of (n1, n2, n3) with n1 + n2 + n3 = N and n1, n2, n3 ∈ ℕ, yielding (N+2 choose 2) templates R^{(n1,n2,n3)}.
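The template count is just the number of ways to split N among the three directions. A small sketch that enumerates the (n1, n2, n3) combinations and checks the binomial count:

```python
from math import comb

def riesz_template_count(N):
    """Number of distinct templates R^(n1,n2,n3) of the Nth-order 3-D
    Riesz transform: solutions of n1 + n2 + n3 = N in non-negative
    integers, i.e. the binomial coefficient C(N+2, 2)."""
    return comb(N + 2, 2)

def riesz_templates(N):
    """Enumerate all (n1, n2, n3) with n1 + n2 + n3 = N."""
    return [(n1, n2, N - n1 - n2)
            for n1 in range(N + 1) for n2 in range(N - n1 + 1)]

# The 2nd-order transform (as used later for organ identification)
print(riesz_template_count(2), riesz_templates(2))  # 6 templates
```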
19. Beyond bag of visual words
• Widely used
• Strong performance variation
• Clustering trade-off:
• Large clusters, small vocabularies
• Small clusters, large vocabularies
Language modelling of BoVW: vocabulary size, meaning, word-to-word relations
20. From words to grammar
Grammar
The whole system and structure of a language or of languages in
general, usually taken as consisting of syntax and morphology
(including inflections) and sometimes also phonology and semantics.
— Oxford Dictionaries
21. From words to grammar
[Figure: scatter plot of feature-space samples clustered into visual words]
Visual grammar: meaning, synonymy, polysemy
22. Visual topics
PLSA–based definition
A visual topic is an unobserved or latent variable z ∈ Z = {z_1, . . . , z_{N_Z}}, so that the probability of observing the word w_n in the visual instance I_i is:
P(w_n, I_i) = Σ_{j=1}^{N_Z} P(w_n | z_j) P(z_j | I_i).
[Diagram: images linked to topics via P(z_j | I_i), topics linked to visual words via P(w_n | z_j), collected in the matrix W_{N_W × N_Z}]
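The mixture above is a single matrix–vector product once the PLSA factors are available. A toy sketch with hypothetical sizes (N_W = 3 words, N_Z = 2 topics):

```python
import numpy as np

# Toy word-topic factor P(w_n | z_j): each column sums to 1
W = np.array([[0.7, 0.1],
              [0.2, 0.1],
              [0.1, 0.8]])

# Toy topic mixture P(z_j | I_i) for one visual instance I_i
topic_given_image = np.array([0.5, 0.5])

# P(w_n, I_i) = sum_j P(w_n | z_j) P(z_j | I_i): a matrix-vector product
p_word = W @ topic_given_image
print(p_word)  # mixture probabilities: 0.4, 0.15, 0.45
```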
23. The word-topic matrix
W_{N_W × N_Z} = [ P(w_n | z_j) ]  →  T = [ t_{n,j} ]   (rows n = 1, . . . , N_W; columns j = 1, . . . , N_Z)
• Rows: relevant topics for a word
• Columns: relevant words for a topic
• Use the ratio of words as a topic–based significance t_{n,j}:
• t_{n,j} = 1 → the most significant word for the topic
• t_{n,j} = 1/N_W → the least significant word for the topic
24. Visual meaningfulness
Definition
The visual meaningfulness of a visual word wn is its maximum topic–based
significance level:
mn =
maxj tn,j if maxj tn,j ≥ Tmeaning
0 otherwise
• Words below the meaningfulness threshold can be truncated.
30
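Given the word-topic significance matrix, meaningfulness and the resulting pruning are a one-liner each. A minimal sketch with a hypothetical 3-word, 2-topic matrix:

```python
import numpy as np

def meaningfulness(t, threshold):
    """m_n for each visual word: its maximum topic-based significance
    max_j t[n, j], or 0 if that maximum falls below T_meaning."""
    m = t.max(axis=1)
    return np.where(m >= threshold, m, 0.0)

# Toy significance matrix (rows: words, columns: topics)
t = np.array([[1.0, 0.2],
              [0.3, 0.4],
              [0.1, 0.9]])
m = meaningfulness(t, threshold=0.5)
keep = m > 0   # words below the meaningfulness threshold are truncated
print(m, keep)  # word 2 (max significance 0.4) is pruned
```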
26. Word to word relations
Example
• A single class might have several
visual appearances
• Several classes might partially
share the visual appearance
• Two visual words with the same
meaning, belong to different
histogram bins and cannot be
compared
• Identifying synonymy allows to
compare these words
bimodal class
partially shared
appearance
word 1
word 2
word 3
word 4
word 5
32
27. Synonymy graphs
• Word 3 is partially linked to words
1 and 2
• Words 4 and 5 are also linked
word 1
word 2
word 3word 4
word 5
33
28. Visual synonymy
Definition
A pair of visual words wn, wm can be considered synonyms if the following
three conditions are met:
1. There is at least one visual topic zj to which both wn and wm belong.
2. wn and wm have a similar contextual distribution with the rest of the
words.
3. wn and wm have a complementary distribution in the collection.
29. Synonymy value
Definition
The synonymy value of two words wn, wm is the maximum significance value
for which both words are significant for the same visual topic.
t1,1 · · · t1,NZ
t2,1 · · · t2,NZ
...
...
...
tNW,1 · · · tNW,NZ
σnm = σmn = max
j
min
n,m
tn,j, tm,j
35
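The formula above is symmetric in the two words and reads directly off the word-topic matrix. A toy sketch:

```python
import numpy as np

def synonymy_value(t, n, m):
    """sigma_nm = max over topics j of min(t[n, j], t[m, j]):
    the largest significance level at which both words are
    significant for the same visual topic (symmetric in n, m)."""
    return np.minimum(t[n], t[m]).max()

# Toy significance matrix (rows: words, columns: topics)
t = np.array([[0.9, 0.1],
              [0.7, 0.2],
              [0.1, 0.8]])
print(synonymy_value(t, 0, 1))  # words 0 and 1 share topic 0 → 0.7
print(synonymy_value(t, 0, 2))  # words 0 and 2 barely overlap → 0.1
```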
31. Word ambiguity and dimensionality
• Some visual words are sources
of ambiguity if they relate to
various appearances
• Their presence in the histogram is
not discriminative
• Possible solution: identify
polysemy and reduce their
weight
topic A topic B
37
32. Visual polysemy
Definition
A visual word wn is polysemic in strict sense if all the following conditions are
met:
1. wn if there are at least two visual topics zj, zk to which the visual word
belongs (wide sense polysemy)
2. There is a visual word wm, which is a synonym of wn and belongs to the
topic zj
3. There is a visual word wl, which is a synonym of wn and belongs to the
topic zj
4. wm, wl are not synonyms
38
33. Polysemy threshold
Definition
The polysemy threshold of a visual word wn, Tn
polysemy, is the largest value
that satisfies that there are at least two topics for which the word is
significant above the threshold:
t1,1 · · · t1,NZ
t2,1 · · · t2,NZ
...
...
...
tNW,1 · · · tNW,NZ
tn,j ≥ Tn
polysemy
≥ 2; ∀j = 1, . . . , NZ
39
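Since the threshold must be met by at least two topics, it equals the second-largest significance in the word's row. A minimal sketch:

```python
import numpy as np

def polysemy_threshold(t_row):
    """Largest threshold for which the word is significant for at
    least two topics: exactly the second-largest entry of its row
    in the word-topic significance matrix."""
    return np.sort(t_row)[-2]

# Toy row of topic-based significances for one word
row = np.array([0.9, 0.6, 0.1])
print(polysemy_threshold(row))  # 0.6: two topics satisfy t >= 0.6
```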
36. Section outline
Motivation and introduction
Technical contributions
Multi–scale texture description
A visual grammar
ROI detector
Experiments
Concluding remarks
37. Local analysis
• Medical images contain large
amounts of information
• Abnormalities and clinically
relevant patterns occur only in
reduced regions of interest
• Local context description :
• Dense sampling
• Keypoint–based analysis
43
38. Geodesic detection of regional extrema
1. Multi–scale difference of
Gaussians relates to saliency
2. Use geodesic operations to
obtain regional extrema:
2.1 Fill hole / grind peak
2.2 Substract from the original DoG
image
2.3 Label each fully connected
component larger than a
structuring element.
44
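Steps 2.1–2.3 can be sketched with morphological reconstruction-by-erosion (hole filling; grind peak is the dual and yields maxima analogously). This is a simplified 2-D illustration on a toy image, not the thesis implementation, and it omits the size filtering of step 2.3:

```python
import numpy as np
from scipy import ndimage

def fill_holes_grayscale(img):
    """Geodesic hole filling: seed equals the image on the border and
    the image maximum elsewhere; repeatedly erode the seed and clip it
    from below by the image (the mask) until stability."""
    seed = np.full_like(img, img.max())
    seed[0, :], seed[-1, :] = img[0, :], img[-1, :]
    seed[:, 0], seed[:, -1] = img[:, 0], img[:, -1]
    prev = None
    while prev is None or not np.array_equal(seed, prev):
        prev = seed
        seed = np.maximum(ndimage.grey_erosion(prev, size=(3, 3)), img)
    return seed

def regional_minima_labels(dog):
    """Steps 2.1-2.3 on a 2-D DoG slice: fill holes, subtract the
    original image, and label the connected components that remain."""
    residue = fill_holes_grayscale(dog) - dog  # non-zero inside minima
    labels, n = ndimage.label(residue > 0)
    return labels, n

# Toy image with two isolated dark basins (regional minima)
img = np.ones((7, 7))
img[1, 1] = 0.0
img[5, 4:6] = 0.2
labels, n = regional_minima_labels(img)
print(n)  # two regional minima detected
```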
39. Section outline
Motivation and introduction
Technical contributions
Experiments
Texture analysis of 2D lung CT
Texture analysis of 3D brain MRI
Texture analysis of 4D lung CT
Texture analysis of 4D lung CT using ROIs
Visual grammar for description of 2D images
Visual grammar for description of 3D medical images
Concluding remarks
40. Texture analysis of 2D lung CT
• Interstitial lung diseases
• TALISMAN dataset acquired at
Geneva University Hospitals
• 90 HRCT scans from 85 patients
• 1679 annotated regions
• 6 classes
• fibrosis
• ground glass
• emphysema
• micronodules
• healthy tissue
• consolidation
41. Texture analysis of 2D lung CT
k-means
clustering
visual vocabulary
word-1 = (f11,f12,...,f1N)
word-2= (f21,f22,...,f2N)
...
word-k= (fk1,fk2,...,fkN)
4 scales
Wavelet
Transform
Energy of
Coe cients
Dataset
Histogram of
visual words
for each region
k-dimensional
discrete feature space
48
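The vocabulary-building and histogram steps of this pipeline can be sketched with `scipy.cluster.vq`. The feature matrix below is a random stand-in for the per-pixel wavelet-coefficient energies, not real CT data:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
# Stand-in for per-pixel texture features: energies of wavelet
# coefficients at 4 scales (random points around two prototypes)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 4)),
                   rng.normal(1.0, 0.1, (50, 4))])

# k-means clustering of the feature space yields the visual
# vocabulary: each centroid word-i = (f_i1, ..., f_iN) is a word
k = 2
vocab, _ = kmeans2(feats, k, minit='points', seed=0)

# Each annotated region is then described by its word histogram
codes, _ = vq(feats, vocab)      # nearest-word assignment
hist = np.bincount(codes, minlength=k) / len(codes)
print(hist)                      # roughly balanced for this toy set
```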
42. Texture analysis of 2D lung CT
• Optimal number of visual
words between 100 and
300
• Overall performance
decreases with larger
vocabularies
Keep only meaningful words
0 50 100 150 200 250 300 350 400 450 500
20
30
40
50
60
70
80
Number of Visual Words
P@1(%)
Consolidation
Emphysema
Fibrosis
Ground Glass
Healthy
Micronodules
Geometric mean
49
43. Texture analysis of 3D brain MRI
• Texture–based segmentation of the
cerebellum
• IBSR dataset provided by MGH.
• MRI from 18 adult subjects
• Manual segmentations
• Cerebellum cortex
• Cerebellum white matter
44. Texture analysis of 3D brain MRI
Training Set
Testing Set
Histogram
Equalization
5 Scales
DoG 3D Wavelet
k-means
Clustering
NxNxN block
visual words
histogram
Nearest Neighbor
Search
Visual word
assignment
Visual Words
Histograms
Feature Space
Training Set
Training Set
Training Set
Testing Set
PREPROCESSING
FEATURE EXTRACTION
CLASSIFICATION
Visual
Vocabulary
52
45. Texture analysis of 3D brain MRI
• Performance improves with larger
block sizes
• Rest of brain
• Cerbellum cortex
• Performance does not improve
• Cerebellum white matter
Data–driven regions of interest
53
46. Texture analysis of 4D lung CT
• Pulmonary embolism retrieval
• Dual Energy CT dataset acquired
at Geneva University Hospitals
• 25 patients
• 4D data
• x,y,z
• Energy level of acquisition
• Ground truth
• Severity (Qanadli index)
• Lobe based
47. Texture analysis of 4D lung CT
k-means
clustering
55-dimensional
continuous feature space
visual vocabulary
word-1 = (f11,f12,...,f1N)
word-2= (f21,f22,...,f2N)
...
word-k= (fk1,fk2,...,fkN)
voxeli = closest word
5 scales
Wavelet
Transform
Energy of
Coefficients
.
.
.
Energy level 1 Energy level 11
Histogram of
visual words
for each lobe
voxeli = (fi1,fi2,..,fiN)
Lung lobes mask
1
5
4
3
2
k-dimensional
discrete feature space
56
48. Texture analysis of 4D lung CT
• Performance improves with
4D data
• 63% for P@1
• 62% for P@5
• 60% for P@10
• Optimal configuration
• 2 scales, 100–150 words
• Intensive computation
• High dimensional feature
space
Analyze only part of the data:
ROIs and meaningful words.
Words Scales P@1(%) P@5(%) P@10(%)
50 1 55 56 56
100 1 58 55 57
150 1 58 56 56
50 2 62 58 55
100 2 62 62 60
150 2 63 62 60
50 3 58 54 55
100 3 60 59 58
150 3 57 62 58
50 5 45 52 51
100 5 57 52 51
150 5 58 52 52
57
49. Texture analysis of 4D lung CT using ROIs
• Pulmonary embolism detection
• Improvements over previous
approaches
• ROI–based analysis
• Optimal combination of
energy–based vocabularies
50. Texture analysis of 4D lung CT using ROIs
• Improvements in performance
• Optimal combination of
energy–based vocabularies
• Multi–scale regions of interest
Finer–grain analysis of significant
words and synergies among them
Lobe DECT Words Energy levels SECT
LR 84 % 5 (50,130) 52 %
LL 84 % 5 (100,140) 48 %
MR 80 % 5 (40,50,130,140) 52 %
UL 76 % 25 (40,70,80,90) 60 %
UR 80 % 25 (90,120) 56 %
60
51. Visual grammar for description of 2D images
• Classification and retrieval of
images from the biomedical
literature
• ImageCLEFmed modality
classification task
• 1000 training and 1000 test
images
• 31 hierarchical categories
52. Visual grammar for description of 2D images
• SIFT–based visual vocabularies
• Varying number of visual topics
from 25 to 350 in steps of 25
• Varying meaningfulness threshold
from 50% to 100%
53. Visual grammar for description of 2D images
• Statistically significant
improvement over state-of-the art
baseline
• Vocabulary reductions without
effect on the accuracy
• Up to 20% of the original
vocabulary size
Analyze synonymy relations among
multiple vocabularies
0 50 100 150 200 250 300 350 400 450 500
20
25
30
35
40
45
50
55
60
65
Effective number of visual words
Classificationaccuracy(%)
Baseline Grammar Statistical significance threshold
64
54. Visual grammar for description of 3D medical images
• Organ identification task
• VISCERAL dataset
• Full body CT scans
• 15 Contrast–enhanced
• 15 Not enhanced
• 10 anatomical structures, 8 classes
66
55. Visual grammar for description of 3D medical images
• Riesz–based texture features
• 3 scales
• Riesz order 2.
• Organ–specific vocabularies
• 1000 random samples within the organ
• 20 visual words per organ
• Visual Grammar transformation
56. Visual grammar for description of 3D medical images
• Good results for organ identification
• Reduction of vocabulary size with respect to the baseline without visual grammar
[Plot: classification accuracy (%) vs. vocabulary size (0–200)]
59. Conclusions
Feature extraction and modelling using BoVW: multiscale texture descriptors, multiscale analysis of ROIs, optimal vocabulary size, optimal bag length, optimal vocabularies in DECT, vocabulary pruning, language modelling
• Evaluation of DoG and Riesz wavelets and BoVW
• Data–driven ROIs for local analysis of lung texture
• Optimal vocabulary size by learning informative words
• BoVW needs to cover anatomically meaningful areas
• Specific vocabularies, combined, provide better insight into patterns
• Removal of words using language modelling does not impact accuracy
• Visual grammar transformations improve accuracy and reduce descriptor size
60. Shortcomings
• Visual grammar model is slow to train for large vocabularies, synonymy
requires further restrictions (sparsity)
• Semantics is covered, but there’s other aspects that can still be explored
• Variations of a visual word (morphology)
• Combination rules of words in proximity (syntax)
• Bag of visual words has evolved into VLAD and Fisher Vectors, which in
some aspects are more robust.
71
61. Future work
• Extend the visual grammar evaluation
• Extend the visual grammar to cover various languages
• Synergies between isotropic and steerable texture descriptors
• Synergies between text and visual description
• Synergies between color and texture description
• Extend the language modelling to identify
• Paradigmatic relations
• Absence of visual words