2. Overview
• Introduction to MRI
• Introduction to Image Segmentation
• Brain Ventricle Segmentation
• Our Machine Learning Problem
• DCNN Architectures for Image Segmentation
• U-Net, V-Net
• Dice Coefficient and Dice Loss
• Results
• Discussion
3. A Brief Introduction to MRI
• MRI = (nuclear) Magnetic resonance imaging
• MRI scanners use strong magnetic fields and
radio waves to generate images of the organs in
the body.
• Certain atomic nuclei are able to absorb and emit
radio frequency energy when placed in an
external magnetic field.
• Hydrogen atoms are most often used to generate
a detectable radio-frequency signal that is
received by antennas in close proximity to the
anatomy being examined.
www.healthcare.siemens.ch
https://en.wikipedia.org/wiki/Magnetic_resonance_imaging
4. A Brief Introduction to MRI
• MRI can be divided into Excitation, Relaxation, Acquisition, Computing
and Display
Excitation
• Protons align their magnetic fields (spin axes) parallel
or anti-parallel to the outer magnetic field
• The imbalance between parallel and anti-parallel spins
leads to a NET magnetization
• This alignment can be perturbed by a radio-frequency
signal. This process is called excitation.
MRI Physics: For Anyone Who Does Not Have a Degree in Physics (Evert J. Blink)
5. A Brief Introduction to MRI
• T1 images measure the tissue-specific relaxation time of the induced
magnetization along the Z-axis after excitation
• During relaxation the hydrogen
atoms emit energy in the form
of radio-waves, which are
measured by nearby sensors.
[Figure: excitation followed by T1 relaxation with radio-wave emission; the T1 relaxation time is tissue specific!]
6. A Brief Introduction to MRI
• T2 images measure the loss of phase coherence in the XY-plane. The process of
getting from a totally in-phase situation to a totally out-of-phase
situation is called T2 relaxation.
• After excitation the spins of the hydrogen atoms
are in phase. The phase is lost over time (A-E).
• The T2 relaxation time is tissue specific!
7. T1 vs T2 Images
• T1 and T2 relaxation are two independent processes, which happen
simultaneously.
• T1 happens along the Z-axis
• T2 happens in the X-Y plane
• Different tissues appear
bright/dark in T1/T2 images
• T1: water = dark
• T2: water = bright
[Figure: example T1 and T2 images side by side]
8. A Brief Introduction to Image Segmentation
• In image segmentation we assign a class to every input pixel (or voxel)
Figure: Pelt, Sethian (PNAS 2018)
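As a toy illustration of the idea above (my own numpy example, not from the talk): given a per-pixel probability map over classes, segmentation just picks the most probable class at every pixel.

```python
import numpy as np

# Toy example: a 4 x 4 image with per-pixel probabilities for 3 classes,
# shape (height, width, n_classes).
rng = np.random.default_rng(0)
probs = rng.random((4, 4, 3))
probs /= probs.sum(axis=-1, keepdims=True)  # normalize to valid probabilities

# Segmentation = assign each pixel the class with the highest probability.
labels = probs.argmax(axis=-1)

print(labels.shape)  # one class label per pixel: (4, 4)
```

The same operation applies unchanged to voxels; the array simply gains a depth axis.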
9. Brain MRI Segmentation – Ventricles
• The ventricular system is a set of four
interconnected cavities (ventricles) in
the brain, where the cerebrospinal
fluid (CSF) is produced.
• Hypothesis:
➢ Ventricles increase in size before / during
Multiple Sclerosis episodes
➢ "Brain inflammation" is measurable by
looking at the ventricles
By Polygon data were generated by Life Science Databases(LSDB). - Polygon data are from BodyParts3D., CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=26756796
10. Brain MRI Segmentation – Ventricles
[Figure: T2 images in axial (512 x 512), sagittal (512 x 44) and coronal (512 x 44) views]
Voxel sizes: x: 0.5 mm, y: 0.5 mm, z: 3.0 mm
Jason Millward
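A quick sanity check on these numbers (my own arithmetic, not from the slides): the physical field of view follows from matrix size times voxel size along each axis.

```python
# Matrix sizes and voxel sizes as stated for the T2 scans above.
matrix = {"x": 512, "y": 512, "z": 44}
voxel_mm = {"x": 0.5, "y": 0.5, "z": 3.0}

# Field of view per axis = number of voxels * voxel size.
fov_mm = {axis: matrix[axis] * voxel_mm[axis] for axis in matrix}
print(fov_mm)  # {'x': 256.0, 'y': 256.0, 'z': 132.0}
```

Note the strong anisotropy: in-plane resolution is six times finer than the slice spacing.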
11. Machine Learning Problem
• Learn to predict the segmentation (drawn using T2 images) from T1 images
• Nscans > 220, resolution 512 x 512 x 44 (0.5 x 0.5 x 3 mm)
• T1 and T2 images + segmentation (Jason Millward)
• "MS dataset"
• The model should produce satisfactory results on a different dataset:
"Day2Day" (Filevich et al., 2017)
• T1 images of healthy subjects measured over many time points
• Resolution 192 x 256 x 256 (1 x 1 x 1 mm)
13. DCNN Architectures for Image Segmentation
• “Standard” Deep Convolutional Neural Net (DCNN) architectures that rely on
pooling in order to aggregate spatial context and deconvolution/upscaling to
restore the original image dimensions are not the best choice for image
segmentation
• The reason for this is the loss of spatial information associated with pooling:
[Figure: encoder-decoder DCNN, shapes given as nrow x ncol x nchannel.
Input 256 x 256 x 1 → convolution (256 x 256 x c1) → max pooling (128 x 128 x c2)
→ max pooling (64 x 64 x c3) → max pooling (32 x 32 x c4) → up-scaling
→ desired output 256 x 256 x nclasses; channel counts c1 < c2 < c3 < c4]
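The spatial side of that bottleneck can be sketched in a few lines of numpy (channel counts are placeholders; a real encoder would also grow the channels via convolutions). Each 2 x 2 max-pool halves the resolution, and that discarded spatial detail is exactly what up-scaling alone cannot restore.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max pooling over (rows, cols, channels)."""
    r, c, ch = x.shape
    return x.reshape(r // 2, 2, c // 2, 2, ch).max(axis=(1, 3))

# Follow the encoder shapes from the figure (8 channels as a stand-in).
x = np.zeros((256, 256, 8))
shapes = [x.shape]
for _ in range(3):
    x = max_pool_2x2(x)
    shapes.append(x.shape)

print(shapes)  # [(256, 256, 8), (128, 128, 8), (64, 64, 8), (32, 32, 8)]
```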
14. DCNN Architectures for Image Segmentation
• There are multiple ways to tackle the problem of loss of spatial
information:
• Skip connections
• 2D: U-Net (Ronneberger 2015)
• 3D: V-Net (Milletari 2016)
• Dense architectures
• Huang 2016
➢ Original publication
• MSD-Net (Pelt 2017)
• HyperDense-Net (Dolz 2018)
➢ Recent examples…
17. 2D Models: U-Net
• 2 convolutions without padding
at every "level" of the network
• Skip connections propagate
information from early to later
layers (after cropping)
[Figure: U-Net architecture with levels 0-4]
18. 3D Models: V-Net
• Residual blocks of varying depth
at every "level" of the network
• Skip connections propagate
information from early to later
layers
• "Down conv" and "up conv"
layers between levels (instead of
MaxPool and UpScale)
• Can be used with 2D input too!
• U-Net and V-Net make use of
the same idea: skip connections
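That shared idea can be sketched in numpy (shapes are illustrative, U-Net-style: the encoder map is center-cropped to the decoder's spatial size and concatenated along the channel axis, re-injecting fine spatial detail):

```python
import numpy as np

def center_crop(feat, target_hw):
    """Center-crop a (rows, cols, channels) feature map to target_hw."""
    h, w = feat.shape[:2]
    th, tw = target_hw
    dh, dw = (h - th) // 2, (w - tw) // 2
    return feat[dh:dh + th, dw:dw + tw, :]

encoder_feat = np.zeros((68, 68, 64))   # from the contracting path
decoder_feat = np.zeros((64, 64, 64))   # after up-convolution

skip = center_crop(encoder_feat, decoder_feat.shape[:2])
merged = np.concatenate([skip, decoder_feat], axis=-1)
print(merged.shape)  # (64, 64, 128)
```

The V-Net variant is the same pattern in 3D, without the crop (its convolutions preserve spatial size).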
20. U-Net vs V-Net
• One disadvantage of 3D architectures is the large memory footprint
• This is especially true if we want to feed the entire scan (i.e. all slices) to the network
at once
• V-Net: one batch contains (batch-size * nslices) 2D images
• U-Net: the batch size directly determines the number of 2D images in one batch
• In other words:
• For the U-Net, each slice is one training example
• For the V-Net, each scan is one training example
• Nvnet < Nunet
• Is it harder to learn 3D kernels?
➢ Convolutions are not rotation-invariant
➢ 3D adds an additional rotational axis
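The example counts above are easy to make concrete (my own arithmetic, assuming ~220 scans with 44 slices each, as stated earlier):

```python
# Number of training examples for a 2D (per-slice) vs 3D (per-scan) model.
n_scans, n_slices = 220, 44

n_unet_examples = n_scans * n_slices  # U-Net: every slice is one example
n_vnet_examples = n_scans             # V-Net: every scan is one example

print(n_unet_examples, n_vnet_examples)  # 9680 220
```

So the 3D model sees roughly 44 times fewer training examples, which is one way to read Nvnet < Nunet.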
21. Quality of Predicted Segmentation
• Quantitative vs Ground Truth
• (weighted) cross-entropy
• Dice Coefficient
• …
• Qualitative
• Do the predicted contours look as if
they were produced by an expert?
• Radiology Turing Test:
➢ If there is an attending physician on one
side of a wall (A), and a computer or a
radiologist on the other, can the attending
physician tell the difference?
ISMRM April 2017 Computer Aided Diagnosis
22. The Dice Coefficient
… similar to Intersection over Union
Let R be the reference segmentation (gold standard) with voxel values rn for the foreground class at
voxel n over N image elements. Let P with values pn be the corresponding predicted probabilistic map.

DC = (2 · Σn rn pn + ε) / (Σn (rn + pn) + ε)

(ε is a small smoothing constant)
Dice Coefficient is between 0 (no overlap) and 1 (perfect overlap).
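A direct numpy translation of the formula above (with ε as the smoothing term):

```python
import numpy as np

def dice_coefficient(r, p, eps=1e-6):
    """Soft Dice between a binary reference r and a probabilistic map p."""
    r = r.ravel().astype(float)
    p = p.ravel().astype(float)
    return (2.0 * np.sum(r * p) + eps) / (np.sum(r + p) + eps)

r = np.array([1, 1, 0, 0])
print(dice_coefficient(r, r))      # 1.0 for perfect overlap
print(dice_coefficient(r, 1 - r))  # ~0.0 for no overlap
```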
23. Generalized Dice Loss
Let R be the reference segmentation (gold standard) with voxel value rln for class l at voxel n over N
image elements. Let P with values pln be the corresponding predicted probabilistic map.

GDL = 1 − 2 · (Σl wl Σn rln pln) / (Σl wl Σn (rln + pln)), with class weights wl = 1 / (Σn rln)²

(the inner sums run over all voxels, the outer sums over all classes)

Sudre 2017
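A minimal numpy sketch of this loss, following Sudre 2017 (class weights wl = 1 / (Σn rln)² rebalance the contribution of small foreground classes against the large background class):

```python
import numpy as np

def generalized_dice_loss(r, p, eps=1e-6):
    """r, p: arrays of shape (n_classes, n_voxels); r one-hot, p probabilistic."""
    w = 1.0 / (np.sum(r, axis=1) ** 2 + eps)       # per-class weights
    numerator = np.sum(w * np.sum(r * p, axis=1))
    denominator = np.sum(w * np.sum(r + p, axis=1))
    return 1.0 - 2.0 * numerator / (denominator + eps)

# Two classes (e.g. background, ventricle) over 6 voxels, perfect prediction:
r = np.array([[1, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]], dtype=float)
print(generalized_dice_loss(r, r))  # ~0.0 for a perfect segmentation
```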
24. V-Net: Our Implementation on T1 Scans

                                     Milletari 2016          Our Architecture
Input shape (x, y, z, nchannel)      128, 128, 64, 1         256, 256, 32, 1
Kernel size (x, y, z)                5, 5, 5                 3, 3, 2
Strides for up- and down-conv        2, 2, 2                 2, 2, 1
nfilters @ 1 … nlevels               16, 32, 64, 128, 256    32, 64, 128, 256 (i.e. one level less)
Residual block depth @ 1 … nlevels   1, 2, 2, 2, 3           1, 2, 2, 3
Loss                                 "Dice-based loss" (?)   Generalized Dice Loss
25. V-Net: Our Implementation on T1 + T2 Scans
• Input has 2 channels (T1, T2)
• The first layer finds "useful" combinations of the T1 & T2 channels
➢ Conv3D(filters=32, kernel_size=(1,1,1))
➢ This only works if the subject hasn't moved…
[Figure: input (256, 256, 32, 2) → 32 · Conv(1,1,1) → (256, 256, 32, 32) → V-Net]
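Why a 1 x 1 x 1 convolution can mix modalities: it is just a per-voxel linear map over the channel axis. A numpy sketch (random weights as placeholders for learned ones; shapes shrunk from the real (256, 256, 32, 2) input to keep the example small):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8, 4, 2))   # stand-in for the (256, 256, 32, 2) T1+T2 input
w = rng.random((2, 32))        # (in_channels, out_channels), here 2 → 32

# A 1 x 1 x 1 convolution applies the same channel-mixing matrix at
# every voxel, which numpy's matmul does over the trailing axes:
y = x @ w
print(y.shape)  # (8, 8, 4, 32)
```

Because every output voxel only sees the T1 and T2 values at the same position, the two modalities must be spatially aligned, hence the motion caveat above.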
26. Results
• Currently, validation is performed on all scans of a single subject (12 time points)
• None of the scans from that subject are part of the training set

Model    Input      # Param       avg DC ± sd DC
U-Net*   T1         56,442,132    0.824 ± 0.036
V-Net    T1          8,074,338    0.855 ± 0.030
V-Net    T1 + T2     8,092,322    0.869 ± 0.019

* Our implementation of the U-Net uses residual blocks and up- and down-convolutions, just like the V-Net
30. Discussion
• 3D deep convolutional neural networks provide state-of-the-art
performance for image segmentation
• A 3D model with comparable architecture but just 1/8 of the
parameters of a 2D model outperforms the 2D model
• The performance gain comes at the cost of higher memory needs
• Our V-Net does not easily generalize to the Day2Day data
• Data Augmentation?
• Transfer Learning?
32. References
• Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image
segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention.
Springer, Cham, 2015.
• Milletari, Fausto, Nassir Navab, and Seyed-Ahmad Ahmadi. "V-Net: Fully convolutional neural networks for
volumetric medical image segmentation." 3D Vision (3DV), 2016 Fourth International Conference on. IEEE,
2016.
• Filevich, Elisa, et al. "Day2day: investigating daily variability of magnetic resonance imaging measures over
half a year." BMC Neuroscience 18.1 (2017): 65.
• Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. "Densely connected convolutional networks." arXiv
preprint (2016): 1-12.
• Dolz, Jose, et al. "HyperDense-Net: A hyper-densely connected CNN for multi-modal image
segmentation." arXiv preprint arXiv:1804.02967 (2018).
• Sudre, Carole H., et al. "Generalised Dice overlap as a deep learning loss function for highly unbalanced
segmentations." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision
Support. Springer, Cham, 2017. 240-248.
• Pelt, Daniël M., and James A. Sethian. "A mixed-scale dense convolutional neural network for image
analysis." Proceedings of the National Academy of Sciences (2017): 201715832.