The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Nataliya Portman
Postdoctoral Fellow, Faculty of Science, UOIT, Oshawa, ON, Canada
PhD in Applied Mathematics
Past: Postdoctoral research on brain MRI segmentation
Current: Applied machine learning in materials science
“AI with the best” online conference, September 24, 2016

Overview
• Statistical versus machine learning:
- Principles
- Goals
- Applications in biomedical sciences
• Automatic brain tissue classification of infant brain MRI (Montreal Neurological Institute)
- Challenges of automated segmentation
- Combined solution: Kernel-based classifier + perceptual image quality model
• Conclusion

Statistical Learning
• Learning is a process of probabilistic inference
• Instance space X (quantities of interest, e.g., wind)
• Hypothesis space H (e.g., h1=strong, h2=weak)
• Training samples D (observed data, N recordings of wind)
Bayes' theorem:
$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$
• The posterior P(h | D): the probability that hypothesis h is true given the evidence D.
• The likelihood P(D | h): the probability of getting the evidence D if the hypothesis h were true.
• The prior P(h): the probability of h being true, before gathering evidence.
• The marginal P(D): the probability of the evidence D over all possible hypotheses.
Common statistical learning methods:
• Bayesian
• Maximum a posteriori (MAP)
• Maximum likelihood
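A minimal sketch of probabilistic inference for the wind example, using Bayes' theorem. The prior and likelihood values are illustrative assumptions, not numbers from the talk.

```python
def posterior(priors, likelihoods):
    """Return P(h | D) for every hypothesis h via Bayes' theorem."""
    # Joint probability P(D | h) * P(h) for each hypothesis
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Marginal probability of the evidence, P(D), summed over all hypotheses
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers: equal priors, data four times more likely under "strong"
priors = {"strong": 0.5, "weak": 0.5}
likelihoods = {"strong": 0.8, "weak": 0.2}
post = posterior(priors, likelihoods)
print(post)
```

With equal priors the posterior is simply the normalized likelihood; an unequal prior would shift it toward the hypothesis favored a priori.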

Bayesian Learning
• An unknown quantity is a random variable
• Requires the hypothesis prior P(h_i)
• Combines prior probabilities with observed data
• Predictions are made by using all the hypotheses weighted by their probabilities
Usually, a hypothesis determines a probability distribution over the unknown quantity of interest X (e.g., parameters of the Gaussian distribution).
The posterior:
$$P(h_i \mid D) = \frac{P(D \mid h_i)\, P(h_i)}{P(D)}$$
The predictive probability:
$$P(X \mid D) = \sum_i P(X \mid h_i)\, P(h_i \mid D)$$
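The weighting of hypotheses by their posterior probabilities can be sketched as follows. This is an illustration under stated assumptions: each hypothesis h_i is reduced to a single number `thetas[i]`, the probability it assigns to a binary observation X = 1, and observations are treated as independent given the hypothesis. The hypothesis values and data are hypothetical.

```python
def bayes_predict(thetas, priors, data):
    """Return (posterior over hypotheses, predictive probability that X = 1)."""
    weights = []
    for theta, prior in zip(thetas, priors):
        likelihood = 1.0
        for x in data:  # i.i.d. binary observations
            likelihood *= theta if x == 1 else 1.0 - theta
        weights.append(prior * likelihood)
    total = sum(weights)  # P(D)
    post = [w / total for w in weights]
    # Every hypothesis contributes to the prediction, weighted by P(h_i | D)
    predictive = sum(t * p for t, p in zip(thetas, post))
    return post, predictive


thetas = [0.2, 0.5, 0.8]         # hypothetical P(X = 1 | h_i)
priors = [1 / 3, 1 / 3, 1 / 3]   # hypothetical uniform prior
data = [1, 1, 0, 1]              # hypothetical recordings
post, pred = bayes_predict(thetas, priors, data)
print(post, pred)
```

The prediction is a mixture of all hypotheses rather than the output of a single best one, which is what separates full Bayesian learning from MAP and maximum likelihood learning.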

Bayesian Learning?

MAP Learning
• For each hypothesis h in H, calculate the posterior probability P(h | D)
• Output the hypothesis h_MAP with the highest posterior probability
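In symbols, the MAP rule can be written as follows. The second equality uses Bayes' theorem and drops the evidence term P(D), which does not depend on h.

```latex
h_{\mathrm{MAP}} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)
```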

Maximum Likelihood Learning
• Assumes the prior P(h) is uniform over the hypothesis space H
• Chooses the h that maximizes the likelihood P(D | h)
• A reasonable approach when there is no reason to prefer one hypothesis over another a priori
• A good approximation to MAP and Bayesian learning when the dataset is large
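A small sketch of why maximum likelihood approximates MAP on large datasets. Hypotheses are again reduced to a probability `theta` of a binary observation being 1; the prior, hypothesis values and data are hypothetical choices made for illustration.

```python
import math


def log_likelihood(theta, data):
    """log P(D | h) for i.i.d. binary data under P(X = 1 | h) = theta."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in data)


def ml_hypothesis(thetas, data):
    # Maximum likelihood: the prior is ignored (equivalently, assumed uniform)
    return max(thetas, key=lambda t: log_likelihood(t, data))


def map_hypothesis(thetas, priors, data):
    # MAP: maximize log P(D | h) + log P(h)
    scored = zip(thetas, priors)
    return max(scored, key=lambda tp: math.log(tp[1]) + log_likelihood(tp[0], data))[0]


thetas = [0.3, 0.7]
priors = [0.9, 0.1]       # strong prior preference for theta = 0.3
small = [1, 1, 0]         # 3 observations
large = small * 20        # 60 observations with the same proportions

print(ml_hypothesis(thetas, small), map_hypothesis(thetas, priors, small))
print(ml_hypothesis(thetas, large), map_hypothesis(thetas, priors, large))
```

On the small dataset the prior decides the MAP answer; on the large one the likelihood term dominates and both methods select the same hypothesis.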

MAP Learning implementation
Distribution of grey-level intensities of 3D adult brain MRI
• Training dataset D: 3D brain MR images
• Hypothesis space per voxel: {h1, h2, h3} with h1 = WM (white matter), h2 = GM (grey matter), h3 = CSF (cerebrospinal fluid)
• Probability models of each tissue type: Gaussian intensity distributions
• Tissue class priors: P(WM), P(GM), P(CSF)
Output: posterior probabilities (“soft” segmentation)
[Figure: tissue intensity distributions with decision boundaries]
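A sketch of voxel-wise MAP classification, assuming one Gaussian intensity model per tissue class. The means, standard deviations and priors are made-up numbers chosen for illustration, not values estimated from MRI data.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tissue_posteriors(intensity, models, priors):
    """Soft segmentation of one voxel: P(tissue | intensity) for every class."""
    joint = {t: priors[t] * gaussian_pdf(intensity, *models[t]) for t in models}
    total = sum(joint.values())
    return {t: v / total for t, v in joint.items()}


# Hypothetical (mean, std) of grey-level intensity per tissue class
models = {"CSF": (30.0, 10.0), "GM": (80.0, 10.0), "WM": (120.0, 10.0)}
priors = {"CSF": 0.20, "GM": 0.45, "WM": 0.35}

soft = tissue_posteriors(85.0, models, priors)  # "soft" segmentation
hard = max(soft, key=soft.get)                  # MAP label for this voxel
print(soft, hard)
```

The decision boundaries are the intensities at which two classes have equal posterior probability; taking the arg max per voxel turns the soft segmentation into a hard one.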

MAP Learning: Expectation-Maximization Algorithm
We estimate the initial tissue class priors:
• Interactively select representative voxels for each tissue type from each individual scan in the training dataset (and fit the Gaussians)
• For each tissue class, compute the ratio of its representative voxels to all the representative voxels in the training data.

MAP Learning: Expectation-Maximization Algorithm
Expectation step, mth iteration: Compute the posterior probability of each tissue class at every voxel from the current Gaussian parameters and class priors.
Maximization step: Update the Gaussian parameters to fit the new posterior distribution obtained at the expectation step.
If D is the training dataset, then P(h | D) is a probabilistic brain atlas.
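The two steps can be sketched with the textbook EM algorithm for a one-dimensional Gaussian mixture. This is an assumption-laden illustration rather than the exact update rules used in the talk: it also re-estimates the class priors at each iteration, and the intensities and initial parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_step(data, priors, means, stds):
    classes = range(len(priors))
    # Expectation step: posterior probability of each class for every sample
    resp = []
    for x in data:
        joint = [priors[k] * gaussian_pdf(x, means[k], stds[k]) for k in classes]
        total = sum(joint)
        resp.append([v / total for v in joint])
    # Maximization step: refit priors and Gaussian parameters to the posteriors
    new_priors, new_means, new_stds = [], [], []
    for k in classes:
        weight = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / weight
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / weight
        new_priors.append(weight / len(data))
        new_means.append(mean)
        new_stds.append(max(math.sqrt(var), 1e-6))  # guard against zero variance
    return new_priors, new_means, new_stds


# Hypothetical intensities drawn from two tissue classes
intensities = [28.0, 30.0, 32.0, 29.0, 31.0, 78.0, 80.0, 82.0, 79.0, 81.0]
priors, means, stds = [0.5, 0.5], [20.0, 90.0], [15.0, 15.0]
for _ in range(30):
    priors, means, stds = em_step(intensities, priors, means, stds)
print(priors, means, stds)
```

Starting from deliberately poor initial means, the iterations settle on the two intensity clusters; in practice the initialization would come from the representative voxels selected for each tissue type.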

Clinical applications
• Statistical learning is used in diagnostic classification.
• Example: Diagnostics in oncology (e.g., the diagnosis of a tumor as being “benign” or “malignant”).
• Relies on a logistic regression model of the conditional probability
$$P(h = 1 \mid x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right)}$$
• Regression coefficients are estimated from a sample of N individuals with known covariate values $x^{(n)} = (x_1^{(n)}, x_2^{(n)}, \ldots, x_p^{(n)})$ and known class $h^{(n)} \in \{0, 1\}$ via the minimization of a distance measure.
G. Schwarzer et al., Statistics in Medicine, 2000
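A minimal sketch of fitting such a logistic regression model. It assumes the distance measure is the average cross-entropy between known classes and predicted probabilities, minimized by plain gradient descent; the single covariate and the eight "patients" are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(beta, x):
    """P(h = 1 | x); beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(xs, hs, lr=0.1, epochs=5000):
    """Estimate coefficients by gradient descent on the average cross-entropy."""
    n, p = len(xs), len(xs[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, h in zip(xs, hs):
            err = predict_proba(beta, x) - h
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical covariate (e.g., a size measurement) and known class (1 = malignant)
xs = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
hs = [0, 0, 0, 1, 0, 1, 1, 1]
beta = fit_logistic(xs, hs)
print(beta, predict_proba(beta, [4.5]), predict_proba(beta, [0.5]))
```

Each fitted coefficient has a direct reading as a change in the log-odds per unit of its covariate, which is the kind of interpretability the slide attributes to statistical models.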

Clinical applications (Machine Learning?)
[Figure: feed-forward neural network with inputs X1, X2, X3, X4, weights w_ij and W_i, and output h]
P( h=1 | x ) = f( x, w, W )
A neural network is another approach to modeling the conditional probability, using a logistic transfer function.
• Lacks an easy interpretation of NN model parameters
• Can generate implausible functions
G. Schwarzer et al., Statistics in Medicine, 2000.
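A sketch of how a small network could compute P(h=1 | x) = f(x, w, W). It assumes one hidden layer with logistic units, with w holding the input-to-hidden weights and W the hidden-to-output weights; the bias terms and all numeric values are hypothetical additions for the example.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def nn_probability(x, w, W, hidden_bias, output_bias):
    """P(h = 1 | x) for a one-hidden-layer network with logistic transfer functions.

    w[j][i] weights input i into hidden unit j; W[j] weights hidden unit j
    into the single output unit.
    """
    hidden = [
        logistic(b + sum(w_ji * x_i for w_ji, x_i in zip(row, x)))
        for row, b in zip(w, hidden_bias)
    ]
    return logistic(output_bias + sum(W_j * h_j for W_j, h_j in zip(W, hidden)))


x = [0.5, -1.0, 2.0, 0.0]                           # four inputs X1..X4
w = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2]]  # two hidden units
W = [1.5, -0.7]
p = nn_probability(x, w, W, [0.0, 0.0], 0.0)
print(p)
```

Unlike the logistic regression coefficients, the individual entries of w and W have no direct reading in terms of a covariate's effect on the outcome.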

Machine learning
Given the training dataset of N observations of a K-dimensional feature vector X and the corresponding outcomes Y, learn a mapping f(X) that minimizes the loss L(Y, f(X)).
[Diagram: X, unknown algorithm, Y]
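A minimal instance of this setup, assuming a one-dimensional feature (K = 1), a linear mapping f(X) = aX + b and a squared loss. The training and held-out values are invented for illustration.

```python
def fit_least_squares(xs, ys):
    """Fit f(x) = a * x + b by minimizing the mean squared loss on the training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_loss(a, b, xs, ys):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_least_squares(train_x, train_y)

# Samples outside the training dataset measure how well the mapping transfers
test_x = [5.0, 6.0]
test_y = [11.1, 12.9]
test_loss = mean_squared_loss(a, b, test_x, test_y)
print(a, b, test_loss)
```

The loss on held-out samples, rather than the fit to the training data, is what this algorithmic view of modeling is judged by.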

Machine learning
Modeling reduces to a problem of function optimization
Machine learning = algorithmic modeling
Target: find an algorithm that predicts the outcome for new samples
outside of the training dataset
Algorithms:
• Support Vector Machines
• Artificial Neural Networks
• Convolutional Neural Networks
• Random Forests
• Boosting
• Decision Trees

Brain tissue classification of infant brain MRI
McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University
Postdoctoral fellow

The NIH (National Institutes of Health) pediatric “Objective-2” MRI database is the largest demographically diverse U.S. sample, consisting of 69 subjects aged 10 days to 4.5 years.


Sources of intensity variation:
• Adult: noise, intensity non-uniformity, Partial Volume Effect, natural tissue intensity variation
• Child = Adult + greater intensity variation due to myelination of WM
Challenges with existing software:
• The CIVET pipeline (developed at MNI) fails to perform accurate automatic classification into GM, WM and CSF
• General anatomical image processing pipelines such as FSL (Smith et al., 2004) and SPM (Ashburner, 1997) poorly detect major tissue classes in the NIH “Objective-2” dataset.

Three major segmentation frameworks (supervised):
• Expectation-Maximization: [Van Leemput et al., 1999], [Tohka et al., 2004], [Prastawa and Gerig, 2004], [Xue et al., 2007], [Murgasova et al., 2007]
• Registration-based: [Collins et al., 1999], [Murgasova et al., 2007]
• Label Fusion: [Weisenfeld and Warfield, 2009]

Methodological limitations
• Global estimation of tissue intensity distributions (EM, Label fusion). Due to biological intensity variation and the Partial Volume Effect (PVE), the tissue intensity distribution in infant MRI can differ from the Gaussian (EM).
• Supervised (atlas-dependent) approach that assumes small deviations from average brain anatomy (EM, Registration-based).

Imagine… that we have built an intelligent machine (software) that effectively identifies brain structures with the same accuracy as our Human Visual System.
[Diagram: Human Visual System (HVS), information extraction, Computer Vision]

Classification machine requirements:
• Does not depend on a probabilistic brain atlas
• Does not assume global models of tissue intensity distributions
• Objectively evaluates the quality of classification as perceived by
the Human Visual System
• Multichannel
• Flexible, can be extended to multiclass classification
Impact:
• Alleviates the agonizing pain of probabilistic atlas construction and manual segmentation
• Improves the accuracy of segmentation of child brain MRI
• Accelerates research in the field of early brain development
• Revolutionizes the field of MRI segmentation

Birth of a “Visionary”
“Visionary” is MATLAB software that performs the challenging task of brain tissue classification in child brain MRI.
Perceptual image quality model: In the absence of “ground truth”, it tries to mimic human perception of the quality of classification using the Structural SIMilarity (SSIM) index.
The philosophy underlying the SSIM approach: the Human Visual System is highly adapted to extract structural information from images.
How is “Visionary” built?

$\mu_x, \mu_y$ - the local means of the corresponding image patches x and y,
$\sigma_x, \sigma_y$ - the local standard deviations (respectively),
$C_i$ - the small positive constants to stabilize each term.
MSSIM quantifies the degree of structural similarity between input and classified images.
[Figure: Visionary classified image and T1w template (08-11 months), MSSIM = 0.8614]
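A sketch of the SSIM computation using the commonly cited two-constant form of the index, with C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for dynamic range L. Two simplifications are assumptions of this example and not necessarily what “Visionary” does: MSSIM is averaged over non-overlapping square patches instead of a sliding weighted window, and the tiny test images are made up.

```python
def ssim_patch(x, y, dynamic_range=255.0):
    """SSIM of two equally sized patches given as flat lists of intensities."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mssim(img_a, img_b, size=2):
    """Mean SSIM over non-overlapping size x size patches of two 2D images."""
    scores = []
    for r in range(0, len(img_a) - size + 1, size):
        for c in range(0, len(img_a[0]) - size + 1, size):
            pa = [img_a[r + i][c + j] for i in range(size) for j in range(size)]
            pb = [img_b[r + i][c + j] for i in range(size) for j in range(size)]
            scores.append(ssim_patch(pa, pb))
    return sum(scores) / len(scores)


image = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
inverted = [[255 - v for v in row] for row in image]
print(mssim(image, image), mssim(image, inverted))
```

Identical images score 1, and the score drops as local means, contrasts or structure diverge, which is what allows MSSIM to rank candidate classifications against a reference image without manual ground truth.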

The choice of the reference depends on the age of the subject.
T1w serves as a reference for MR brain data for ages 8 months and later.
[Figure: reference image for ages 02-05 months, shown as a difference of two images]

KFDA: Kernel Fisher Discriminant Analysis

Kernel Fisher Discriminant Analysis
• Feature selection method (tissue intensities, morphological measurements, etc.) in machine learning
• The KFDA separability criterion measures the ability of a feature or a subset of features to discriminate between classes.
• The power of KFDA lies in its generality (it does not assume multivariate probability models of the classes) and its closed-form (algebraic) solution.
Modified KFDA criterion: includes a spatial regularization term in the feature space; K and H are the kernel and negative Laplacian matrices, and M and N are the between-class and within-class covariance matrices.
[Figure: input and KFDA-classified data in stereotaxic and intensity spaces]
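A sketch of the standard two-class kernel Fisher discriminant, to illustrate the closed-form solution. It does not include the spatial regularization term of the modified criterion. The RBF kernel, the ridge term `eps` added to the within-class matrix, and the toy 2D points are assumptions of this example.

```python
import math


def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def kfda_fit(X, labels, gamma=1.0, eps=1e-3):
    """Return expansion coefficients alpha = (N + eps I)^-1 (M_1 - M_0)."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    N = [[eps if i == j else 0.0 for j in range(n)] for i in range(n)]
    means = {}
    for cls in (0, 1):
        idx = [j for j in range(n) if labels[j] == cls]
        m = [sum(K[i][j] for j in idx) / len(idx) for i in range(n)]
        means[cls] = m
        for j in idx:  # within-class scatter of the kernel columns
            d = [K[i][j] - m[i] for i in range(n)]
            for a in range(n):
                for b in range(n):
                    N[a][b] += d[a] * d[b]
    diff = [means[1][i] - means[0][i] for i in range(n)]
    return solve(N, diff)


def kfda_project(x, X, alpha, gamma=1.0):
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))


X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0], [2.2, 1.9], [1.9, 2.1]]
labels = [0, 0, 0, 1, 1, 1]
alpha = kfda_fit(X, labels)
proj0 = [kfda_project(x, X, alpha) for x, c in zip(X, labels) if c == 0]
proj1 = [kfda_project(x, X, alpha) for x, c in zip(X, labels) if c == 1]
print(proj0, proj1)
```

The discriminant comes from a single linear solve, with no iterative fitting and no class-conditional probability model, which is the generality and closed-form property noted above.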

Results for the brain template, ages 08 to 11 months

WM, GM and CSF detection in the brain MRI template for ages 08 to 11 months.
[Figure panels: T1w, PVE, Visionary; MSSIM = 0.8234 and MSSIM = 0.8537]
Myelinated WM detection in the brain MRI template for ages 02 to 05 months.

Objective-2 template, age range: 02-05 months
[Figure panels: T2w, PVE, Visionary unmyelinated WM]

Objective-2 template, age range: 44-60 months
[Figure panels: Reference, Initialization (label transfer from an older brain), Visionary]

…. still unpublished.

Conclusion
• Machine learning (ML) methods provide algorithmic models for an unknown mapping between predictor and outcome variables
• ML techniques are differently motivated: the goal is to forecast the outcome with acceptable accuracy and to be transferable to new datasets
• Statistical learning methods are focused on estimation of the probability distribution over the hypothesis space
• In biomedical applications, models that explain the data are preferable because they can reveal statistically significant influences of some covariates on the outcome
• In order to devise an appropriate method for data processing and analysis, one has to understand the data, namely, the sources of noise and signal variation, and the mathematical assumptions of the inference methodology
