How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.
This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.
Several important problems will be highlighted along with successful solutions involving machine learning methods, including examples both from worldwide practice and the author’s own research. A description of the basic principles of the algorithms used will provide a good opportunity to strengthen the knowledge acquired from the other courses of the school.
1. MACHINE LEARNING APPLICATIONS
IN MEDICINE
Olga Senyukova
Graphics & Media Lab
Faculty of Computational Mathematics and Cybernetics
Lomonosov Moscow State University
5. Computed tomography (CT)
1972, Sir Godfrey Hounsfield
X-rays are computer-processed to produce
tomographic images
https://en.wikipedia.org/wiki/CT_scan
9. Electrocardiography (ECG)
1901, Einthoven
Recording of the electrical activity of the heart by
electrodes placed on the body
intensivecarehotline.com
10. RR time series
RR time series (interbeat interval lengths) are widely
used for ECG analysis
www.elsevier.es
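As a simple illustration (not from the slides), a minimal sketch of deriving an RR series once R-peak times have already been detected; the helper name `rr_series` and the numbers are my own:

```python
import numpy as np

def rr_series(r_peak_times):
    """RR time series (interbeat interval lengths, in seconds)
    computed from detected R-peak times."""
    return np.diff(np.asarray(r_peak_times, dtype=float))

# R-peaks detected at these times (seconds); values are made up
peaks = [0.00, 0.82, 1.61, 2.45, 3.22]
print(rr_series(peaks))  # approximately [0.82 0.79 0.84 0.77]
```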
12. Analysis: what for?
Normal or diseased?
Where is the diseased area?
What changes over time occur
(especially, after treatment)?
Is a specific condition present (e.g. overtraining
in an athlete)?
…
www.fresher.ru
14. Main tasks: physiologic signals
Diagnostics: healthy / disease XXX / disease YYY / …
Template matching: given a signal with condition ZZZ, is a new signal the same or not?
15. Machine learning in medical imaging:
challenges
Slide by D. Rueckert
Images are often 3D or 4D:
# of voxels and # of extracted features is very large
Number of images for training is often limited:
a “large” dataset typically means 100 to 1000 images
“small sample size problem”
16. Machine learning in medical imaging:
challenges
Training data is expensive
annotation of images is resource intensive (manpower,
cost, time)
sometimes it is possible to augment training sets using
unlabelled images
Training data is sometimes imperfect
training data may be wrongly labelled
e.g. diseases such as Alzheimer’s require confirmation
through pathology (difficult and costly to obtain)
Slide by D. Rueckert
17. The InnerEye project
Measuring brain tumors
Localizing and identifying vertebrae
Kinect for surgery
Source: A. Criminisi & the InnerEye team @ MSRC
19. Decision forests
Leo Breiman, 2001
A. Criminisi, J. Shotton (eds.). Decision Forests for
Computer Vision and Medical Image Analysis //
Advances in Computer Vision and Pattern
Recognition. Springer, 2013
A decision forest consists
of decision trees…
20. Decision tree
Each internal node: a split (test) function
Each leaf: class label (predictor)
Source: A. Konushin
21. Regression tree
A regression tree maps an input value to a continuous label.
• Green – high uncertainty
• Red – low uncertainty
• Thickness – the number of samples from the training set
Several of the following slides are adapted from
A. Criminisi and J. Shotton
22. Regression tree: training
• S0 – whole training set
• Sj – part of training set at the jth node
$p(y|\mathbf{x}) \sim \mathcal{N}\big(y;\ \overline{y}(\mathbf{x}),\ \sigma_y^2(\mathbf{x})\big)$
23. Regression tree: training
Split function parameters at the jth node maximize
the information gain:
$\theta_j = \arg\max_{\theta} I(S_j, \theta)$
At each part (L, R): fit a line to the points (e.g. by least squares),
so that for each x we have
$p(y|\mathbf{x}) \sim \mathcal{N}\big(y;\ \overline{y}(\mathbf{x}),\ \sigma_y^2(\mathbf{x})\big)$,
where $\overline{y}$ is the fitted (green) line
$I(S_j, \theta) = \sum_{(\mathbf{x},y)\in S_j} \log\big(\sigma_y(\mathbf{x})\big) \;-\; \sum_{i\in\{L,R\}}\ \sum_{(\mathbf{x},y)\in S_j^i} \log\big(\sigma_y(\mathbf{x})\big)$
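A minimal NumPy sketch of this gain for 1D data with an axis-aligned split; the function names are mine and the per-node variance model is simplified (both children are assumed to receive at least two samples):

```python
import numpy as np

def log_sigma_sum(x, y):
    """Fit a line by least squares and return the sum over samples of
    log(sigma_y), where sigma_y is the std of the residuals."""
    a, b = np.polyfit(x, y, 1)
    sigma = np.std(y - (a * x + b)) + 1e-12  # avoid log(0)
    return len(x) * np.log(sigma)

def information_gain(x, y, threshold):
    """I(S_j, theta) for an axis-aligned split x < threshold."""
    left = x < threshold
    return log_sigma_sum(x, y) - (log_sigma_sum(x[left], y[left]) +
                                  log_sigma_sum(x[~left], y[~left]))
```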
29. Randomness
Randomized node optimization: optimize the split
function at the jth node w.r.t. a small random subset
$\mathcal{T}_j \subset \mathcal{T}$ of parameter values:
$\theta_j = \arg\max_{\theta \in \mathcal{T}_j} I(S_j, \theta)$ instead of $\theta_j = \arg\max_{\theta \in \mathcal{T}} I(S_j, \theta)$
Split parameters: $\theta_j = (\varphi_j, \psi_j, \tau_j)$, where
$\varphi_j$ selects features from the whole feature set,
$\psi_j$ is a weak learner type (axis-aligned, linear, etc.),
$\tau_j$ is a set of splitting thresholds
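Continuing the 1D sketch above, randomized node optimization might look like this, with `gain` being a function such as `information_gain`; a few distinct x values are assumed:

```python
import numpy as np

def randomized_node_optimization(x, y, gain, n_candidates=10, seed=None):
    """Optimize the node split over a small random subset of candidate
    thresholds (tau) instead of the whole parameter space."""
    rng = np.random.default_rng(seed)
    thresholds = np.unique(x)[1:-1]                  # interior candidate thresholds
    size = min(n_candidates, len(thresholds))
    subset = rng.choice(thresholds, size=size, replace=False)  # T_j, a subset of T
    return max(subset, key=lambda tau: gain(x, y, tau))        # argmax of I(S_j, theta)
```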
32. Anatomy localization
Key idea: all voxels in the image vote for the
position of the organ
Each organ $c \in C$ is defined by its 3D axis-aligned
bounding box
$\mathbf{b}_c = \big(b_c^L,\ b_c^R,\ b_c^A,\ b_c^P,\ b_c^H,\ b_c^F\big)$
C = {liver, spleen, kidneyL, kidneyR, …}
33. Anatomy localization
For each input voxel $\mathbf{v} = (v_x, v_y, v_z)$ the distribution of
relative displacements to the organ bounding box
is obtained:
$\mathbf{d}_c(\mathbf{v}) = \big(d_c^L,\ d_c^R,\ d_c^A,\ d_c^P,\ d_c^H,\ d_c^F\big)$
$f(\mathbf{v};\ \theta)$ – feature response
34. Anatomy localization
Voxel clusters with the highest confidence of
prediction are considered to be salient regions for
localization of an organ
salient regions are shown in green
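To illustrate the voting idea (a simplified sketch, not the actual InnerEye pipeline), one face of the bounding box could be estimated from per-voxel votes as below; all names are hypothetical:

```python
import numpy as np

def estimate_bbox_face(voxel_coords, displacements, confidences):
    """Confidence-weighted vote for one bounding-box face coordinate
    (e.g. b_c^L along x): each voxel at coordinate v_x casts the vote
    v_x + d_c^L(v); high-confidence (salient) voxels dominate."""
    votes = np.asarray(voxel_coords, dtype=float) + np.asarray(displacements, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * votes) / np.sum(w))
```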
41. Selection of atlases
How do we select the atlases most similar to our image?
Atlases should be clustered by disease/population
Manifold learning is used to efficiently discover
such clusters
42. Manifold learning
Several following slides are adapted from D. Rueckert
Find a manifold
Embed the data into the manifold
(project to a lower-dimensional space)
43. Manifold learning: Laplacian eigenmaps
Given a graph G = (V, E)
Each vertex vi corresponds to an image
Each edge weight wij defines the similarity between
image i and j
Define a diagonal matrix T which contains the degree
sums for each vertex:
$t_{ii} = \sum_j w_{ij}$
44. Manifold learning: Laplacian eigenmaps
Normalized graph Laplacian:
$L = T^{-1/2}(T - W)\,T^{-1/2}$
Minimize
$\sum_{i,j} W_{ij}\,\|y_i - y_j\|^2$
The eigendecomposition of L provides manifold coordinates
$y_i$ for each vertex i (or image)
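A compact NumPy sketch of these steps, assuming a symmetric pairwise similarity matrix W between images has already been computed:

```python
import numpy as np

def laplacian_eigenmaps(W, dim=2):
    """Embed images into a low-dimensional manifold from a pairwise
    similarity matrix W (symmetric, non-negative)."""
    t = W.sum(axis=1)                                 # vertex degrees t_ii
    T_inv_sqrt = np.diag(1.0 / np.sqrt(t))
    L = T_inv_sqrt @ (np.diag(t) - W) @ T_inv_sqrt    # normalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # eigendecomposition (ascending)
    # skip the trivial eigenvector (eigenvalue ~0); the next ones give coordinates
    return eigvecs[:, 1:dim + 1]
```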
45. Manifold learning for multi-atlas
segmentation
We have two sets of images:
labeled (atlases)
unlabeled
We want to label all the unlabeled images
We can do it iteratively (see the sketch below):
label a part of the unlabeled images using the most similar
already labeled ones
these newly labeled images can be used as atlases for the next
iteration
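A schematic sketch of this loop; `similarity` (e.g. closeness in manifold coordinates) and `segment_with` (the actual multi-atlas label propagation and fusion) are placeholders for the real components:

```python
def propagate_labels(atlases, unlabeled, similarity, segment_with, batch=5):
    """Iteratively label images: at each step the unlabeled images most
    similar to the current atlases are segmented and then promoted to
    atlases for the next iteration."""
    atlases = list(atlases)
    remaining = list(unlabeled)
    while remaining:
        # rank unlabeled images by their best similarity to current atlases
        remaining.sort(key=lambda img: max(similarity(img, a) for a in atlases),
                       reverse=True)
        for img in remaining[:batch]:
            img.labels = segment_with(atlases, img)   # multi-atlas segmentation
            atlases.append(img)
        remaining = remaining[batch:]
    return atlases
```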
48. Segmentation of brain lesions in MRI
Olga V. Senyukova, “Segmentation of blurred objects by
classification of isolabel contours”. Pattern Recognition,
2014
Data was provided by the Children's Clinical and Research
Institute of Emergency Surgery and Trauma
49. The proposed algorithm
Each MRI slice is processed separately
In order to improve speed and robustness the
regions containing lesions can be specified manually
Lesions inside these regions are segmented
automatically
51. Isolabel contours
In geography
each isolabel contour (one color):
constant height f(x,y)=h
In image processing
each isolabel contour (one color):
constant intensity f(x,y)=I
52. How to distinguish lesion contours?
Visually we can do it easily!
Let’s use the same set of features for automatic
classification of isolabel contours
53. Features of isolabel contours
In order to distinguish isolabel contours delineating
lesions, 4 features were proposed:
I_mean
I_mean inside the contour / I_mean inside the bounding box
I_max − I_min
I_variance
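A sketch of how such a 4-component feature vector might be computed for one contour region; the exact definitions in the paper may differ, and all names here are mine:

```python
import numpy as np

def contour_features(image, contour_mask):
    """Four features of an isolabel contour region.

    image        -- 2D array of intensities (one MRI slice)
    contour_mask -- boolean mask of pixels inside the isolabel contour
    """
    inside = image[contour_mask]
    ys, xs = np.where(contour_mask)
    bbox = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return np.array([
        inside.mean(),                      # I_mean
        inside.mean() / bbox.mean(),        # I_mean inside / I_mean in bounding box
        inside.max() - inside.min(),        # I_max - I_min
        inside.var(),                       # I_variance
    ])
```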
54. Labeled training base
Various regions on many images:
a user can click on lesion contours: they will get “lesion”
other isolabel contours will automatically get “non-lesion”
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
…
[ɸ1, ɸ2, ɸ3, ɸ4] is
a feature vector
55. Binary classification via SVM
We have a binary classification task: each isolabel
contour belongs to one of two classes, lesions or
non-lesions
One of the best classifiers is SVM – Support Vector
Machine
original linear SVM: Vladimir Vapnik, Alexey
Chervonenkis, 1963
applying a kernel trick results in nonlinear SVM:
Bernhard Boser, Isabelle Guyon, Vladimir Vapnik,
1992
56. Linear SVM
Positive samples: $\mathbf{w}\cdot\mathbf{x}_i + b \ge 1$ for $y_i = +1$
Negative samples: $\mathbf{w}\cdot\mathbf{x}_i + b \le -1$ for $y_i = -1$
Support vectors lie on the margin boundaries; the margin width is $2/\|\mathbf{w}\|$
Maximizing the margin $2/\|\mathbf{w}\|$, we solve a quadratic
optimization problem:
minimize $\frac{1}{2}\mathbf{w}^T\mathbf{w}$ subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$
Solution is a hyperplane:
$\mathbf{w}\cdot\mathbf{x} + b = \sum_i \alpha_i y_i\,(\mathbf{x}_i\cdot\mathbf{x}) + b$
$\mathbf{x}_i$ – support vectors
$\alpha_i$ – learned weights
57. Nonlinear SVM
For linearly separable data linear SVM is excellent
What about the data that is not linearly separable?..
We can make it linearly separable by mapping it to a
higher-dimensional space
58. Nonlinear SVM: kernel trick
Instead of $\sum_i \alpha_i y_i\,(\mathbf{x}_i\cdot\mathbf{x}) + b$ we have $\sum_i \alpha_i y_i\,K(\mathbf{x}_i, \mathbf{x}) + b$,
where $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)\cdot\varphi(\mathbf{x}_j)$
For classification of isolabel contours nonlinear SVM
with RBF (radial basis function) kernel is used:
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\big(-\gamma\,\|\mathbf{x}_i - \mathbf{x}_j\|^2\big)$
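A minimal scikit-learn sketch of this setup with toy feature vectors; the numbers are made up purely to show the API:

```python
import numpy as np
from sklearn.svm import SVC

# Feature vectors [phi1, phi2, phi3, phi4] of isolabel contours and labels:
# +1 = lesion, -1 = non-lesion (toy data for illustration only)
X = np.array([[0.9, 1.4, 0.3, 0.02],
              [0.2, 0.8, 0.9, 0.30],
              [0.8, 1.3, 0.2, 0.03],
              [0.1, 0.7, 0.8, 0.25]])
y = np.array([1, -1, 1, -1])

clf = SVC(kernel="rbf", gamma="scale", C=1.0)     # nonlinear SVM with RBF kernel
clf.fit(X, y)
print(clf.predict([[0.85, 1.35, 0.25, 0.025]]))   # likely predicts the lesion class (+1)
```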
59. Ensemble-based analysis of RR and gait
Olga Senyukova
Valeriy Gavrishchaka, Department of Physics, West
Virginia University
Springer, 2013, 2015
60. RR and gait time series
Normal?
Huntington’s disease?
Parkinson’s disease?
…
Normal?
Arrhythmia?
Congestive heart failure?
…
61. Ensemble learning techniques
An ensemble can work better than a single classifier
…
accuracy: 0.61 accuracy: 0.73 accuracy: 0.65
Weak learner 1 Weak learner 2 Weak learner N
Ensemble of classifiers
accuracy: 0.9
62. AdaBoost
Freund and Schapire, 1997
On each iteration it focuses on the hardest-to-classify
samples
63. AdaBoost
$\mathbf{x}_i,\ i = 1,\dots,N$ – training data, $y_i \in \{-1; +1\}$ – labels
Initial weights of all N samples: $w^{(0)}(i) = 1/N$
M iterations, from m = 1 to M:
find $T_m = \arg\min_{T_j} \varepsilon_m$, where $\varepsilon_m = \sum_{i=1}^{N} w^{(m-1)}(i)\,[y_i \ne T_j(\mathbf{x}_i)]$
set $\alpha_m = \frac{1}{2}\log\frac{1 - \varepsilon_m}{\varepsilon_m}$
update $w^{(m)}(i) = \frac{w^{(m-1)}(i)\,\exp\big(-\alpha_m y_i T_m(\mathbf{x}_i)\big)}{Z_m}$
Classifier output: $H(\mathbf{x}) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})\Big)$
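A small scikit-learn sketch of boosting; the data here is synthetic and only stands in for "normal vs. abnormal" signal features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy binary problem standing in for "normal vs. abnormal" signal features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# scikit-learn's AdaBoost; the default weak learner is a depth-1 decision tree,
# and the T_m are combined with weights alpha_m as in the update rules above
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))   # training accuracy of the ensemble
```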
69. Ensemble decomposition learning
We apply the ensemble-based classifier $H(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})$ to a point x
Each x can be described by its ensemble
decomposition vector (EDL vector):
$D(\mathbf{x}) = [\alpha_1 T_1(\mathbf{x}),\ \alpha_2 T_2(\mathbf{x}),\ \dots,\ \alpha_M T_M(\mathbf{x})]$
We can classify data points by comparing their EDL
vectors
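Continuing the scikit-learn sketch above, the EDL vector of a sample can be read off the fitted ensemble roughly like this; the mapping of 0/1 class labels to ±1 is my assumption:

```python
import numpy as np

def edl_vector(ada, x):
    """EDL vector D(x) = [alpha_1*T_1(x), ..., alpha_M*T_M(x)] read off a
    fitted sklearn AdaBoostClassifier; 0/1 class labels are assumed and
    mapped to -1/+1 here."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    outputs = np.array([2.0 * t.predict(x)[0] - 1.0 for t in ada.estimators_])
    return ada.estimator_weights_[:len(outputs)] * outputs
```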
70. EDL: learning
All available data labeled «normal/abnormal»
Indicators from nonlinear dynamics (MSE, DFA, …) serve as weak learners
Building a general «normal/abnormal» classifier with AdaBoost gives the ensemble
$\alpha_1\,\mathrm{MSE}_1 + \alpha_2\,\mathrm{DFA}_2 + \dots + \alpha_M\,\mathrm{MSE}_M$
Applying the ensemble to a training sample x with disease XXX:
each weak learner outputs +1 (normal) or −1 (abnormal)
EDL vector:
$D(\mathbf{x}) = [\alpha_1\cdot(+1),\ \alpha_2\cdot(-1),\ \dots,\ \alpha_M\cdot(-1)]$
71. EDL: testing
Input y
Applying the ensemble gives the EDL vector
$D(\mathbf{y}) = [\alpha_1\cdot(-1),\ \alpha_2\cdot(+1),\ \dots,\ \alpha_M\cdot(+1)]$
Is $D(\mathbf{x}) \approx D(\mathbf{y})$?
yes: y has disease XXX
no: y does not have disease XXX
In a multi-class classification problem the class of y is the class of the training example
with the closest EDL vector:
$C(\mathbf{y}) = C(\mathbf{x}_k)$, where $\|D(\mathbf{x}_k) - D(\mathbf{y})\| = \min_i \|D(\mathbf{x}_i) - D(\mathbf{y})\|$
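A minimal nearest-neighbour rule over EDL vectors, matching the formula above; the choice of Euclidean distance is an assumption:

```python
import numpy as np

def classify_by_edl(edl_train, train_classes, edl_query):
    """Assign the query the class of the training example whose EDL vector
    is closest (here: Euclidean distance)."""
    dists = np.linalg.norm(np.asarray(edl_train) - np.asarray(edl_query), axis=1)
    return train_classes[int(np.argmin(dists))]

# Hypothetical usage:
# edl_train = [edl_vector(ada, x) for x in training_signals]
# predicted = classify_by_edl(edl_train, training_labels, edl_vector(ada, new_signal))
```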