Erickson Nascimento, Federal University of Minas Gerais - "On the development of a Visual-Temporal-awareness Rheumatic Heart Disease classifier for Echocardiographic Videos"
1. • Rheumatic Heart Disease (RHD) is a heart condition caused by an abnormal immune
response to streptococcal infection.
• Streptococcus: a bacterium commonly associated with poor sanitation and
hygiene conditions.
• The burden of RHD is concentrated in low-income countries,
• where health resources are scarce.
• Echocardiographic (echo) screening is the gold standard for diagnosis of latent
RHD,
• but personnel shortages limit broad implementation.
• To address this issue, we aimed to develop a machine-learning model for automatic
identification of RHD, to be used in later steps of our RHD screening solution to
prioritize follow-up.
2. Preprocessing phase
• Videos clipped to 16 frames
• Rotation and resizing to 128x171 pixels (required by the chosen DNN)
• Whitening (subtracts from the pixels of each video the mean of the
videos in the original training data)
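The clipping, resizing, and whitening steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the (frames, height, width, channels) layout, the reading of 128x171 as height x width, the nearest-neighbour resize, and the padding of short videos are all assumptions, and the rotation step is omitted because the slides do not specify it.

```python
import numpy as np

CLIP_LEN = 16              # frames per clip, as stated on the slide
HEIGHT, WIDTH = 128, 171   # assumed to mean height x width


def clip_frames(video, clip_len=CLIP_LEN):
    """Keep the first clip_len frames; pad short videos by repeating the last frame (assumption)."""
    if video.shape[0] >= clip_len:
        return video[:clip_len]
    pad = np.repeat(video[-1:], clip_len - video.shape[0], axis=0)
    return np.concatenate([video, pad], axis=0)


def resize_nearest(video, height=HEIGHT, width=WIDTH):
    """Nearest-neighbour resize of every frame (a stand-in for a real image resizer)."""
    rows = np.arange(height) * video.shape[1] // height
    cols = np.arange(width) * video.shape[2] // width
    return video[:, rows][:, :, cols]


def whiten(video, train_mean):
    """Subtract the mean computed on the original training data."""
    return video.astype(np.float32) - train_mean


def preprocess(video, train_mean):
    """Clip, resize, then mean-subtract one video of shape (T, H, W, C)."""
    return whiten(resize_nearest(clip_frames(video)), train_mean)
```

Here `train_mean` is assumed to be an array that broadcasts against a (16, 128, 171, C) clip, for example a per-pixel mean of the same shape.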
Figure (Video pre-processing): a frame of a video with Doppler and a frame of a video without Doppler, each shown before and after whitening.
3. Methodology
• Videos with and without Doppler were considered separately.
• Undersampling according to the borderline-RHD class
• Exams are classified directly, i.e., there is no view-classification step
• Use of the C3D neural network proposed by Tran et al. [2015], originally
trained on the Sports-1M dataset
• Changed the classification layer to match the problem formulation
adopted
• Fine-tuned the parameters on the training set
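One possible reading of "undersampling according to the borderline-RHD class" is that every larger class is randomly reduced to the size of the borderline class. The sketch below illustrates that reading only; the function name, the label strings, and the fixed seed are hypothetical, and the slides do not state the exact procedure.

```python
import random
from collections import defaultdict


def undersample(samples, reference_label, seed=0):
    """Randomly reduce every class to at most the size of the reference class.

    samples is a list of (item, label) pairs; reference_label is the class
    whose size sets the cap (assumed here to be the borderline-RHD class).
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    if reference_label not in by_label:
        raise ValueError(f"no samples with label {reference_label!r}")

    target = len(by_label[reference_label])
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        if len(group) > target:
            group = rng.sample(group, target)  # draw without replacement
        balanced.extend(group)
    rng.shuffle(balanced)
    return balanced
```

Classes smaller than the reference class are left untouched in this sketch.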
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning Spatiotemporal Features with 3D Convolutional Networks," ICCV 2015.
4. Modified version of the C3D architecture (as shown below)
• Input: 16 frames from a video of an exam;
• 50 epochs with early stopping;
• Batch size of 16;
• Learning rate of 0.001 and a random crop strategy.
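The training settings listed above can be collected in a small configuration object, together with a generic early-stopping rule. This is a sketch under assumptions: the slides do not give the stopping criterion, so the validation-loss criterion and the `patience` value are hypothetical, and `run` only simulates a training loop over precomputed validation losses.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    clip_len: int = 16            # frames per input clip (from the slide)
    max_epochs: int = 50          # from the slide
    batch_size: int = 16          # from the slide
    learning_rate: float = 0.001  # from the slide
    patience: int = 5             # assumed; not stated in the slides


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_losses, config):
    """Return the number of epochs executed for a given sequence of validation losses."""
    stopper = EarlyStopping(config.patience)
    epochs_run = 0
    for loss in val_losses[: config.max_epochs]:
        epochs_run += 1
        if stopper.step(loss):
            break
    return epochs_run
```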
Figure (Network architecture): visual feature extraction followed by a classifier that outputs Normal or RHD positive.
5. Preliminary experiments to understand the capability of the network to extract visual
features and separate the 2 classes of interest.
We biased the training to maximize accuracy on the borderline class.
Per-video results from the confusion matrix, considering two classes, RHD positive and
negative:
• accuracy: 0.628 (95% CI, 0.573 – 0.682)
• specificity: 0.615 (95% CI, 0.435 – 0.795)
• sensitivity: 0.641 (95% CI, 0.432 – 0.850)
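For reference, accuracy, specificity, and sensitivity can be derived from a two-class confusion matrix as sketched below. The slides do not say how the 95% confidence intervals were obtained, so the normal-approximation (Wald) interval used here is an assumption, and the function names are hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation 95% interval, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def binary_metrics(tp, fp, tn, fn):
    """Metrics for RHD positive (positive class) versus negative.

    Each value is a (estimate, lower, upper) tuple.
    """
    return {
        "accuracy": wald_interval(tp + tn, tp + fp + tn + fn),
        "specificity": wald_interval(tn, tn + fp),  # true-negative rate
        "sensitivity": wald_interval(tp, tp + fn),  # true-positive rate
    }
```

Other interval methods (for example, intervals computed across cross-validation folds) would give different bounds for the same counts.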
Results per video and 2 classes
6. • Hyperparameter tuning (Hyperband);
• Take advantage of visual features from the Doppler images;
• Analyze the visual features the networks use to classify the exams (interpretability)
and compare them with those used by doctors;
• Build a network architecture with 2 arms (see figure below), considering both
Doppler images and raw images from the exams.
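The two-arm idea in the last bullet can be illustrated with a toy late-fusion model: each arm turns its input clip into a feature vector, the two vectors are concatenated, and a shared classifier produces the Normal / RHD positive probabilities. Everything below is a hypothetical stand-in: the arms are simple pooling-plus-projection functions with random weights rather than C3D networks, and concatenation is only one possible way to fuse the arms.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_arm(in_channels, feat_dim):
    """Toy feature extractor: global average pooling, linear projection, ReLU."""
    weights = rng.normal(scale=0.1, size=(in_channels, feat_dim))

    def arm(clip):
        pooled = clip.mean(axis=(0, 1, 2))        # (T, H, W, C) -> (C,)
        return np.maximum(pooled @ weights, 0.0)  # (feat_dim,)

    return arm


def make_two_arm_classifier(feat_dim=8, num_classes=2):
    """Build a predictor that fuses a Doppler arm and a raw-image arm."""
    doppler_arm = make_arm(3, feat_dim)
    raw_arm = make_arm(3, feat_dim)
    head = rng.normal(scale=0.1, size=(2 * feat_dim, num_classes))

    def predict(doppler_clip, raw_clip):
        fused = np.concatenate([doppler_arm(doppler_clip), raw_arm(raw_clip)])
        logits = fused @ head
        exp = np.exp(logits - logits.max())       # numerically stable softmax
        return exp / exp.sum()

    return predict
```

In a real model each arm would be a trained video network, and the fusion point (early, late, or intermediate) would be a design choice to evaluate.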
Figure (Doing): two-arm network that takes the Doppler image and the raw image as inputs and outputs Normal or RHD positive.