How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.
This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.
Several important problems will be highlighted along with successful solutions involving machine learning methods, including examples both from worldwide practice and the author’s own research. A description of the basic principles of the algorithms used will provide a good opportunity to strengthen the knowledge acquired from the other courses of the school.
1. MACHINE LEARNING APPLICATIONS
IN MEDICINE
Olga Senyukova
Graphics & Media Lab
Faculty of Computational Mathematics and Cybernetics
Lomonosov Moscow State University
5. Computed tomography (CT)
1972, Sir Godfrey Hounsfield
X-rays are computer-processed to produce
tomographic images
https://en.wikipedia.org/wiki/CT_scan
9. Electrocardiography (ECG)
1901, Einthoven
Recording of the electrical activity of the heart by
electrodes placed on the body
intensivecarehotline.com
10. RR time series
RR time series (interbeat interval lengths) are widely
used for ECG analysis
www.elsevier.es
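As a simple illustration (not from the slides), a minimal sketch of deriving an RR series once R-peak times have already been detected; the helper name `rr_series` and the numbers are my own:

```python
import numpy as np

def rr_series(r_peak_times):
    """RR time series (interbeat interval lengths, in seconds)
    computed from detected R-peak times."""
    return np.diff(np.asarray(r_peak_times, dtype=float))

# R-peaks detected at these times (seconds); values are made up
peaks = [0.00, 0.82, 1.61, 2.45, 3.22]
print(rr_series(peaks))  # approximately [0.82 0.79 0.84 0.77]
```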
12. Analysis: what for?
Normal or diseased?
Where is the diseased area?
What changes over time occur
(especially, after treatment)?
Is a specific condition present (e.g. overtraining
in an athlete)?
…
www.fresher.ru
14. Main tasks: physiologic signals
Diagnostics: healthy / disease XXX / disease YYY / …
Template matching: given a signal with condition ZZZ, is a new signal the same or not?
15. Machine learning in medical imaging:
challenges
Slide by D. Rueckert
Images are often 3D or 4D:
# of voxels and # of extracted features is very large
Number of images for training is often limited:
a “large” dataset typically means 100 to 1000 images
“small sample size problem”
16. Machine learning in medical imaging:
challenges
Training data is expensive
annotation of images is resource intensive (manpower,
cost, time)
sometimes it is possible to augment training sets using
unlabelled images
Training data is sometimes imperfect
training data may be wrongly labelled
e.g. diseases such as Alzheimer’s require confirmation
through pathology (difficult and costly to obtain)
Slide by D. Rueckert
17. The InnerEye project
Measuring brain tumors
Localizing and identifying vertebrae
Kinect for surgery
Source: A. Criminisi & the InnerEye team @ MSRC
19. Decision forests
Leo Breiman, 2001
A. Criminisi, J. Shotton (eds.). Decision Forests for
Computer Vision and Medical Image Analysis //
Advances in Computer Vision and Pattern
Recognition. Springer, 2013
A decision forest consists
of decision trees…
20. Decision tree
Each internal node: a split (test) function
Each leaf: class label (predictor)
Source: A. Konushin
21. Regression tree
A regression tree maps an input value to a continuous label.
• Green – high uncertainty
• Red – low uncertainty
• Thickness – the number of samples from the training set
Several of the following slides are adapted from
A. Criminisi and J. Shotton
22. Regression tree: training
• S0 – whole training set
• Sj – part of training set at the jth node
$p(y|\mathbf{x}) \sim \mathcal{N}\big(y;\ \overline{y}(\mathbf{x}),\ \sigma_y^2(\mathbf{x})\big)$
23. Regression tree: training
Split function parameters at the jth node maximize
the information gain:
$\theta_j = \arg\max_{\theta} I(S_j, \theta)$
At each part (L, R): fit a line to the points (e.g. by least squares),
so that for each x we have
$p(y|\mathbf{x}) \sim \mathcal{N}\big(y;\ \overline{y}(\mathbf{x}),\ \sigma_y^2(\mathbf{x})\big)$,
where $\overline{y}$ is the fitted (green) line
$I(S_j, \theta) = \sum_{(\mathbf{x},y)\in S_j} \log\big(\sigma_y(\mathbf{x})\big) \;-\; \sum_{i\in\{L,R\}}\ \sum_{(\mathbf{x},y)\in S_j^i} \log\big(\sigma_y(\mathbf{x})\big)$
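A minimal NumPy sketch of this gain for 1D data with an axis-aligned split; the function names are mine and the per-node variance model is simplified (both children are assumed to receive at least two samples):

```python
import numpy as np

def log_sigma_sum(x, y):
    """Fit a line by least squares and return the sum over samples of
    log(sigma_y), where sigma_y is the std of the residuals."""
    a, b = np.polyfit(x, y, 1)
    sigma = np.std(y - (a * x + b)) + 1e-12  # avoid log(0)
    return len(x) * np.log(sigma)

def information_gain(x, y, threshold):
    """I(S_j, theta) for an axis-aligned split x < threshold."""
    left = x < threshold
    return log_sigma_sum(x, y) - (log_sigma_sum(x[left], y[left]) +
                                  log_sigma_sum(x[~left], y[~left]))
```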
29. Randomness
Randomized node optimization: optimize the split
function at the jth node w.r.t. a small random subset
$\mathcal{T}_j \subset \mathcal{T}$ of parameter values:
$\theta_j = \arg\max_{\theta \in \mathcal{T}_j} I(S_j, \theta)$ instead of $\theta_j = \arg\max_{\theta \in \mathcal{T}} I(S_j, \theta)$
Split parameters: $\theta_j = (\varphi_j, \psi_j, \tau_j)$, where
$\varphi_j$ selects features from the whole feature set,
$\psi_j$ is a weak learner type (axis-aligned, linear, etc.),
$\tau_j$ is a set of splitting thresholds
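Continuing the 1D sketch above, randomized node optimization might look like this, with `gain` being a function such as `information_gain`; a few distinct x values are assumed:

```python
import numpy as np

def randomized_node_optimization(x, y, gain, n_candidates=10, seed=None):
    """Optimize the node split over a small random subset of candidate
    thresholds (tau) instead of the whole parameter space."""
    rng = np.random.default_rng(seed)
    thresholds = np.unique(x)[1:-1]                  # interior candidate thresholds
    size = min(n_candidates, len(thresholds))
    subset = rng.choice(thresholds, size=size, replace=False)  # T_j, a subset of T
    return max(subset, key=lambda tau: gain(x, y, tau))        # argmax of I(S_j, theta)
```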
32. Anatomy localization
Key idea: all voxels in the image vote for the
position of the organ
Each organ $c \in C$ is defined by its 3D axis-aligned
bounding box
$\mathbf{b}_c = \big(b_c^L,\ b_c^R,\ b_c^A,\ b_c^P,\ b_c^H,\ b_c^F\big)$
C = {liver, spleen, kidneyL, kidneyR, …}
33. Anatomy localization
For each input voxel $\mathbf{v} = (v_x, v_y, v_z)$ the distribution of
relative displacements to the organ bounding box
is obtained:
$\mathbf{d}_c(\mathbf{v}) = \big(d_c^L,\ d_c^R,\ d_c^A,\ d_c^P,\ d_c^H,\ d_c^F\big)$
$f(\mathbf{v};\ \theta)$ – feature response
34. Anatomy localization
Voxel clusters with the highest confidence of
prediction are considered to be salient regions for
localization of an organ
salient regions are shown in green
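To illustrate the voting idea (a simplified sketch, not the actual InnerEye pipeline), one face of the bounding box could be estimated from per-voxel votes as below; all names are hypothetical:

```python
import numpy as np

def estimate_bbox_face(voxel_coords, displacements, confidences):
    """Confidence-weighted vote for one bounding-box face coordinate
    (e.g. b_c^L along x): each voxel at coordinate v_x casts the vote
    v_x + d_c^L(v); high-confidence (salient) voxels dominate."""
    votes = np.asarray(voxel_coords, dtype=float) + np.asarray(displacements, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * votes) / np.sum(w))
```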
41. Selection of atlases
How do we select the atlases most similar to our image?
Atlases should be clustered by disease/population
Manifold learning is used to efficiently discover
such clusters
42. Manifold learning
Several following slides are adapted from D. Rueckert
Find a manifold
Embed the data into the manifold
(project to a lower-dimensional space)
43. Manifold learning: Laplacian eigenmaps
Given a graph G = (V, E)
Each vertex vi corresponds to an image
Each edge weight wij defines the similarity between
image i and j
Define a diagonal matrix T which contains the degree
sums for each vertex:
$t_{ii} = \sum_j w_{ij}$
44. Manifold learning: Laplacian eigenmaps
Normalized graph Laplacian:
$L = T^{-1/2}(T - W)\,T^{-1/2}$
Minimize
$\sum_{i,j} W_{ij}\,\|y_i - y_j\|^2$
The eigendecomposition of L provides manifold coordinates
$y_i$ for each vertex i (or image)
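A compact NumPy sketch of these steps, assuming a symmetric pairwise similarity matrix W between images has already been computed:

```python
import numpy as np

def laplacian_eigenmaps(W, dim=2):
    """Embed images into a low-dimensional manifold from a pairwise
    similarity matrix W (symmetric, non-negative)."""
    t = W.sum(axis=1)                                 # vertex degrees t_ii
    T_inv_sqrt = np.diag(1.0 / np.sqrt(t))
    L = T_inv_sqrt @ (np.diag(t) - W) @ T_inv_sqrt    # normalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # eigendecomposition (ascending)
    # skip the trivial eigenvector (eigenvalue ~0); the next ones give coordinates
    return eigvecs[:, 1:dim + 1]
```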
45. Manifold learning for multi-atlas
segmentation
We have two sets of images:
labeled (atlases)
unlabeled
We want to label all the unlabeled images
We can do it iteratively (see the sketch below):
label a part of the unlabeled images using the most similar
already labeled ones
these newly labeled images can be used as atlases for the next
iteration
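A schematic sketch of this loop; `similarity` (e.g. closeness in manifold coordinates) and `segment_with` (the actual multi-atlas label propagation and fusion) are placeholders for the real components:

```python
def propagate_labels(atlases, unlabeled, similarity, segment_with, batch=5):
    """Iteratively label images: at each step the unlabeled images most
    similar to the current atlases are segmented and then promoted to
    atlases for the next iteration."""
    atlases = list(atlases)
    remaining = list(unlabeled)
    while remaining:
        # rank unlabeled images by their best similarity to current atlases
        remaining.sort(key=lambda img: max(similarity(img, a) for a in atlases),
                       reverse=True)
        for img in remaining[:batch]:
            img.labels = segment_with(atlases, img)   # multi-atlas segmentation
            atlases.append(img)
        remaining = remaining[batch:]
    return atlases
```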
48. Segmentation of brain lesions in MRI
Olga V. Senyukova, “Segmentation of blurred objects by
classification of isolabel contours”. Pattern Recognition,
2014
Data was provided by the Children's Clinical and Research
Institute of Emergency Surgery and Trauma
49. The proposed algorithm
Each MRI slice is processed separately
In order to improve speed and robustness the
regions containing lesions can be specified manually
Lesions inside these regions are segmented
automatically
51. Isolabel contours
In geography
each isolabel contour (one color):
constant height f(x,y)=h
In image processing
each isolabel contour (one color):
constant intensity f(x,y)=I
52. How to distinguish lesion contours?
Visually we can do it easily!
Let’s use the same set of features for automatic
classification of isolabel contours
53. Features of isolabel contours
In order to distinguish isolabel contours delineating
lesions, 4 features were proposed:
I_mean
I_mean inside the contour / I_mean inside the bounding box
I_max − I_min
I_variance
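A sketch of how such a 4-component feature vector might be computed for one contour region; the exact definitions in the paper may differ, and all names here are mine:

```python
import numpy as np

def contour_features(image, contour_mask):
    """Four features of an isolabel contour region.

    image        -- 2D array of intensities (one MRI slice)
    contour_mask -- boolean mask of pixels inside the isolabel contour
    """
    inside = image[contour_mask]
    ys, xs = np.where(contour_mask)
    bbox = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return np.array([
        inside.mean(),                      # I_mean
        inside.mean() / bbox.mean(),        # I_mean inside / I_mean in bounding box
        inside.max() - inside.min(),        # I_max - I_min
        inside.var(),                       # I_variance
    ])
```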
54. Labeled training base
Various regions on many images:
a user can click on lesion contours: they will get “lesion”
other isolabel contours will automatically get “non-lesion”
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
…
[ɸ1, ɸ2, ɸ3, ɸ4] is
a feature vector
55. Binary classification via SVM
We have a binary classification task: each isolabel
contour belongs to one of two classes, lesions or
non-lesions
One of the best classifiers is SVM – Support Vector
Machine
original linear SVM: Vladimir Vapnik, Alexey
Chervonenkis, 1963
applying a kernel trick results in nonlinear SVM:
Bernhard Boser, Isabelle Guyon, Vladimir Vapnik,
1992
56. Linear SVM
Positive samples: $\mathbf{w}\cdot\mathbf{x}_i + b \ge 1$ for $y_i = +1$
Negative samples: $\mathbf{w}\cdot\mathbf{x}_i + b \le -1$ for $y_i = -1$
Support vectors lie on the margin boundaries; the margin width is $2/\|\mathbf{w}\|$
Maximizing the margin $2/\|\mathbf{w}\|$, we solve a quadratic
optimization problem:
minimize $\frac{1}{2}\mathbf{w}^T\mathbf{w}$ subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$
Solution is a hyperplane:
$\mathbf{w}\cdot\mathbf{x} + b = \sum_i \alpha_i y_i\,(\mathbf{x}_i\cdot\mathbf{x}) + b$
$\mathbf{x}_i$ – support vectors
$\alpha_i$ – learned weights
57. Nonlinear SVM
For linearly separable data linear SVM is excellent
What about the data that is not linearly separable?..
We can make it linearly separable by mapping it to a
higher-dimensional space
58. Nonlinear SVM: kernel trick
Instead of $\sum_i \alpha_i y_i\,(\mathbf{x}_i\cdot\mathbf{x}) + b$ we have $\sum_i \alpha_i y_i\,K(\mathbf{x}_i, \mathbf{x}) + b$,
where $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)\cdot\varphi(\mathbf{x}_j)$
For classification of isolabel contours nonlinear SVM
with RBF (radial basis function) kernel is used:
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\big(-\gamma\,\|\mathbf{x}_i - \mathbf{x}_j\|^2\big)$
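A minimal scikit-learn sketch of this setup with toy feature vectors; the numbers are made up purely to show the API:

```python
import numpy as np
from sklearn.svm import SVC

# Feature vectors [phi1, phi2, phi3, phi4] of isolabel contours and labels:
# +1 = lesion, -1 = non-lesion (toy data for illustration only)
X = np.array([[0.9, 1.4, 0.3, 0.02],
              [0.2, 0.8, 0.9, 0.30],
              [0.8, 1.3, 0.2, 0.03],
              [0.1, 0.7, 0.8, 0.25]])
y = np.array([1, -1, 1, -1])

clf = SVC(kernel="rbf", gamma="scale", C=1.0)     # nonlinear SVM with RBF kernel
clf.fit(X, y)
print(clf.predict([[0.85, 1.35, 0.25, 0.025]]))   # likely predicts the lesion class (+1)
```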
59. Ensemble-based analysis of RR and gait
Olga Senyukova
Valeriy Gavrishchaka, Department of Physics, West
Virginia University
Springer, 2013, 2015
60. RR and gait time series
Normal?
Huntington’s disease?
Parkinson’s disease?
…
Normal?
Arrhythmia?
Congestive heart failure?
…
61. Ensemble learning techniques
An ensemble can work better than a single classifier
…
accuracy: 0.61 accuracy: 0.73 accuracy: 0.65
Weak learner 1 Weak learner 2 Weak learner N
Ensemble of classifiers
accuracy: 0.9
62. AdaBoost
Freund and Schapire, 1997
On each iteration it focuses on the hardest-to-classify
samples
63. AdaBoost
$\mathbf{x}_i,\ i = 1,\dots,N$ – training data, $y_i \in \{-1; +1\}$ – labels
Initial weights of all N samples: $w^{(0)}(i) = 1/N$
M iterations, from m = 1 to M:
find $T_m = \arg\min_{T_j} \varepsilon_m$, where $\varepsilon_m = \sum_{i=1}^{N} w^{(m-1)}(i)\,[y_i \ne T_j(\mathbf{x}_i)]$
set $\alpha_m = \frac{1}{2}\log\frac{1 - \varepsilon_m}{\varepsilon_m}$
update $w^{(m)}(i) = \frac{w^{(m-1)}(i)\,\exp\big(-\alpha_m y_i T_m(\mathbf{x}_i)\big)}{Z_m}$
Classifier output: $H(\mathbf{x}) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})\Big)$
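A small scikit-learn sketch of boosting; the data here is synthetic and only stands in for "normal vs. abnormal" signal features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy binary problem standing in for "normal vs. abnormal" signal features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# scikit-learn's AdaBoost; the default weak learner is a depth-1 decision tree,
# and the T_m are combined with weights alpha_m as in the update rules above
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))   # training accuracy of the ensemble
```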
69. Ensemble decomposition learning
We apply the ensemble-based classifier $H(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})$ to a point x
Each x can be described by its ensemble
decomposition vector (EDL vector):
$D(\mathbf{x}) = [\alpha_1 T_1(\mathbf{x}),\ \alpha_2 T_2(\mathbf{x}),\ \dots,\ \alpha_M T_M(\mathbf{x})]$
We can classify data points by comparing their EDL
vectors
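Continuing the scikit-learn sketch above, the EDL vector of a sample can be read off the fitted ensemble roughly like this; the mapping of 0/1 class labels to ±1 is my assumption:

```python
import numpy as np

def edl_vector(ada, x):
    """EDL vector D(x) = [alpha_1*T_1(x), ..., alpha_M*T_M(x)] read off a
    fitted sklearn AdaBoostClassifier; 0/1 class labels are assumed and
    mapped to -1/+1 here."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    outputs = np.array([2.0 * t.predict(x)[0] - 1.0 for t in ada.estimators_])
    return ada.estimator_weights_[:len(outputs)] * outputs
```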
70. EDL: learning
All available data labeled «normal/abnormal»
Indicators from nonlinear dynamics (MSE, DFA, …) serve as weak learners
Building a general «normal/abnormal» classifier with AdaBoost gives the ensemble
$\alpha_1\,\mathrm{MSE}_1 + \alpha_2\,\mathrm{DFA}_2 + \dots + \alpha_M\,\mathrm{MSE}_M$
Applying the ensemble to a training sample x with disease XXX:
each weak learner outputs +1 (normal) or −1 (abnormal)
EDL vector:
$D(\mathbf{x}) = [\alpha_1\cdot(+1),\ \alpha_2\cdot(-1),\ \dots,\ \alpha_M\cdot(-1)]$
71. EDL: testing
Input y
Applying the ensemble gives the EDL vector
$D(\mathbf{y}) = [\alpha_1\cdot(-1),\ \alpha_2\cdot(+1),\ \dots,\ \alpha_M\cdot(+1)]$
Is $D(\mathbf{x}) \approx D(\mathbf{y})$?
yes: y has disease XXX
no: y does not have disease XXX
In a multi-class classification problem the class of y is the class of the training example
with the closest EDL vector:
$C(\mathbf{y}) = C(\mathbf{x}_k)$, where $\|D(\mathbf{x}_k) - D(\mathbf{y})\| = \min_i \|D(\mathbf{x}_i) - D(\mathbf{y})\|$
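A minimal nearest-neighbour rule over EDL vectors, matching the formula above; the choice of Euclidean distance is an assumption:

```python
import numpy as np

def classify_by_edl(edl_train, train_classes, edl_query):
    """Assign the query the class of the training example whose EDL vector
    is closest (here: Euclidean distance)."""
    dists = np.linalg.norm(np.asarray(edl_train) - np.asarray(edl_query), axis=1)
    return train_classes[int(np.argmin(dists))]

# Hypothetical usage:
# edl_train = [edl_vector(ada, x) for x in training_signals]
# predicted = classify_by_edl(edl_train, training_labels, edl_vector(ada, new_signal))
```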