Madalina Fiterau - Hybrid Machine Learning Methods for the Interpretation and Integration of Heterogeneous Multimodal Data


  1. Hybrid Machine Learning Methods for the Interpretation and Integration of Heterogeneous Multimodal Data. Madalina Fiterau, University of Massachusetts Amherst. Advisors: Artur Dubrawski (CMU, Auton Lab), Christopher Ré (Stanford CS), Scott Delp (Stanford Bioengineering). mfiterau; e-mail: mfiterau@cs.umass.edu. New York, March 29th 2019.
  2. Hybrid Models for Heterogeneous and Multimodal Data: Motivation. Data sources: vital signs, gait kinematics, longitudinal data, accelerometer data, X-rays, MRIs, stereo recordings (video), structured information, and notes.
  5. Motivation (continued): the same data sources, with two goals: integrate and interpret.
  6. Motivation. Aim: build hybrid systems that integrate multimodal, multisource data and learn models that aid users in interpreting the data.
  7. Hybrid systems that integrate and interpret: VIPR (Visualizations for Informative Projection Recovery); DNDF (Deep Neural Decision Forests); ShortFuse (Learning Representations from Time Series and Structured Information).
  8. Further topics: Weak Supervision for Cardiac MRI Classification; Future Research Directions.
  9. VIPR: Visualizations for Informative Projection Recovery. Collaborators: Artur Dubrawski (CMU SCS); Donghan (Jarod) Wang (CMU, Auton Lab); Dr. Gilles Clermont, Dr. Marilyn Hravnak, and Dr. Michael R. Pinsky (University of Pittsburgh). GitHub: https://github.com/inafiterau/VIPR
  10. Application: Alert Classification. Health alerts are triggered by: heart rate <40 or >140; respiratory rate <8 or >36; systolic blood pressure <80 or >200; diastolic blood pressure >110; SpO2 <85%. Some alerts are artifacts, not true alerts. [Timeline: a window of 4 minutes preceding alert onset, followed by the alert duration.] Features computed from the time series include common statistics of each vital sign (VS): mean, stdev, min, max, range of values, duty cycle, ...
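The alert criteria and per-window statistics above can be sketched in Python as follows. This is an illustration only, not the VIPR code: the function names `is_alert` and `window_features` are hypothetical, missing readings are assumed to be `None`, the standard deviation is taken as the population one, and "duty cycle" is assumed to mean the fraction of non-missing readings in the window, since the slide does not define it.

```python
import statistics


def is_alert(hr, rr, sbp, dbp, spo2):
    """True if any vital sign crosses the alert thresholds listed on the slide."""
    return (
        hr < 40 or hr > 140        # heart rate
        or rr < 8 or rr > 36       # respiratory rate
        or sbp < 80 or sbp > 200   # systolic blood pressure
        or dbp > 110               # diastolic blood pressure
        or spo2 < 85               # oxygen saturation, %
    )


def window_features(samples):
    """Summary statistics of one vital sign over the pre-alert window.

    `samples` holds readings with None for missing values (hypothetical format).
    ASSUMPTIONS: population standard deviation; 'duty_cycle' = fraction of
    readings that are present.
    """
    present = [s for s in samples if s is not None]
    return {
        "mean": statistics.mean(present),
        "stdev": statistics.pstdev(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "duty_cycle": len(present) / len(samples),
    }


print(is_alert(hr=150, rr=20, sbp=120, dbp=80, spo2=97))  # True: HR > 140
print(window_features([60, None, 80, 70])["duty_cycle"])  # 0.75
```

In a pipeline like the one described, `window_features` would be applied to each vital sign separately and the resulting dictionaries concatenated into one feature vector per alert.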
  11. Defining interpretability. [Figure: artifacts vs. true alerts in two projections. Mean heart rate vs. mean SpO2 (value-HR-mean, value-SPO2-mean): imperfect separation. Heart rate density* vs. oxygen saturation density (value-HR-data--den, value-SPO2-data--den): clear separation, i.e. an informative projection. Further panel labels: respiratory rate; respiratory rate increase.] *Density = average / typical values. Related work on structured sparsity: Guillaume Obozinski, Ben Taskar, and Michael I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, April 2010.
  12. Feature Selection, with a Twist. [Figure: heart rate density (value-HR-data--den) vs. respiratory rate density (value-RR-data--den); blood pressure density. Noisy samples are handled differently.]
  13. Sparse Predictive Structures. [Figure: data in features X and Y.]
  14. VIPR – a quick overview. [Figure: data in features X, Y, and Z.]
  15. VIPR – a quick overview. [Figure labels: Z; split on Y; split on X, Y; split on X.]
  16. Selecting Informative Projections. [Figure: loss matrix L with data points 1–7 as rows and candidate projections as columns, c_j marking projection j; entries range from minimal to low to high loss.] Candidate projections are axis-aligned, 1D, 2D, or 3D.
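One way to fill a loss matrix like L is sketched below. The per-point loss is an assumption made for illustration, since the slide does not specify it: here it is the fraction of a point's k nearest neighbours, measured only on the features of projection j, that carry a different label. Candidate projections are all axis-aligned feature subsets of dimension 1 to `max_dim`; all function names are hypothetical.

```python
from itertools import combinations


def candidate_projections(n_features, max_dim=2):
    """All axis-aligned projections (feature-index tuples) of dimension 1..max_dim."""
    return [c for d in range(1, max_dim + 1)
            for c in combinations(range(n_features), d)]


def knn_loss(X, y, i, proj, k=3):
    """ASSUMED per-point loss: share of point i's k nearest neighbours
    (distance computed on the features in `proj` only) whose label differs."""
    def sq_dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in proj)

    ranked = sorted((sq_dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    neighbours = [j for _, j in ranked[:k]]
    return sum(y[j] != y[i] for j in neighbours) / k


def loss_matrix(X, y, max_dim=2, k=3):
    """Rows = data points, columns = candidate projections, as on the slide."""
    projs = candidate_projections(len(X[0]), max_dim)
    L = [[knn_loss(X, y, i, p, k) for p in projs] for i in range(len(X))]
    return L, projs


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
y = [0, 0, 0, 1, 1, 1]
L, projs = loss_matrix(X, y, max_dim=2, k=2)
print(projs)                  # [(0,), (1,), (0, 1)]
print([row[0] for row in L])  # projection (0,) has zero loss for every point
```

On this toy data the column for projection `(0,)` is all zeros, which is the kind of low-loss column the selection step looks for.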
  17. Selecting Informative Projections (continued). A penalty limits the number of projections used. [Figure: loss matrix L, data points 1–7 × projections, with c_j; minimal, low, and high loss entries.]
  18. The Combinatorial Problem. A penalty limits the number of projections. B is a binary selection matrix (data points 1–7 × projections), where b_ij = 1 if projection j is used to solve point i, and 0 otherwise.
  19. The Combinatorial Problem (continued). Because the number of projections is limited, some points use suboptimal projections. Learning the selection matrix B is NP-hard.
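  20. Integer Linear Program. Over the selection matrix B (data points 1–7 × projections), the ILP minimizes the total loss, with row constraints (each row of B sums to 1) and column constraints (at most k non-zero columns). Writing $\ell_{ij}$ for the loss of projection j on point i (an entry of L) and $p_j$ for the indicator that projection j is used:
      $$\max_{B,\,p}\; -\sum_{i=1}^{n} \sum_{j=1}^{m} b_{ij}\,\ell_{ij} \quad \text{subject to} \quad 0 \le b_{ij} \le p_j \le 1 \text{ integer}, \quad \sum_{j=1}^{m} b_{ij} = 1 \;\; \forall i \in \{1,\dots,n\}, \quad \sum_{j=1}^{m} p_j \le k.$$
      This yields the best k sub-models for the training data: $M_k^* = \arg\min_{M_k \in \{(C, H, g_{\min}) \,:\, |H| \le k\}} L(M_k, X)$.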
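For a handful of candidate projections, the selection problem can be solved exactly by enumeration, which also makes its combinatorial nature visible: once a set S of at most k projections is fixed, the best B assigns each point to its lowest-loss projection in S, so only the sets S need to be searched. The sketch below (hypothetical function name `best_k_projections`) does exactly that. Its cost grows combinatorially with the number of projections, so it is only meant for tiny instances, not as a replacement for the ILP or the convex procedure.

```python
from itertools import combinations


def best_k_projections(L, k):
    """Exhaustively choose at most k columns of the loss matrix L and assign
    every row (data point) to one chosen column, minimizing total loss.

    Returns (total_loss, chosen_columns, assignment), where assignment[i] is
    the column j with b_ij = 1.
    """
    n, m = len(L), len(L[0])
    best = (float("inf"), None, None)
    for size in range(1, min(k, m) + 1):
        for S in combinations(range(m), size):
            # Given S, the optimal B picks each point's cheapest projection in S.
            assignment = [min(S, key=lambda j: L[i][j]) for i in range(n)]
            total = sum(L[i][assignment[i]] for i in range(n))
            if total < best[0]:
                best = (total, S, assignment)
    return best


L = [[0, 1, 1],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [1, 1, 0]]
print(best_k_projections(L, k=2))  # (1, (0, 1), [0, 0, 1, 1, 0])
```

With k = 2, point 5 has to fall back on a suboptimal projection, which is exactly the situation described for the combinatorial problem.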
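  21. Iterative Convex Procedure. [Figure: loss matrix L and target loss T over data points 1–7 × projections.] The target loss of point i is its lowest achievable loss, $T_i = \min_j L_{ij}$. Because B is binary, the convex program works with a continuous B and fits the loss incurred under B to the target: $\min_B \lVert T - (L * B)\,\mathbf{1} \rVert_1 + \lambda\,\Omega(B)$, where $(L * B)_{ij} = L_{ij} B_{ij}$ is the element-wise product and Ω(B) is the penalty that limits the number of projections. Madalina Fiterau and Artur Dubrawski. Projection Retrieval for Classification. In Advances in Neural Information Processing Systems 25, pages 3032–3040, NIPS 2012.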
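The target loss and the data-fit part of the convex objective can be computed as below. This is a sketch under assumptions: the data-fit term is taken as the L1 distance $\lVert T - (L * B)\mathbf{1} \rVert_1$ with $*$ the element-wise product, the penalty term and the iterative optimization itself are omitted, and the function names are hypothetical.

```python
def target_loss(L):
    """T_i = min_j L_ij: the lowest loss point i can reach with any projection."""
    return [min(row) for row in L]


def fit_term(L, B):
    """ASSUMED data-fit term ||T - (L * B) 1||_1.

    (L * B) is the element-wise product; multiplying by the all-ones vector 1
    sums each row, giving the loss each point incurs under B.
    B may be relaxed to real values in [0, 1].
    """
    T = target_loss(L)
    incurred = [sum(l * b for l, b in zip(L_row, B_row)) for L_row, B_row in zip(L, B)]
    return sum(abs(t - s) for t, s in zip(T, incurred))


L = [[0.1, 0.5],
     [0.4, 0.2]]
print(target_loss(L))                 # [0.1, 0.2]
print(fit_term(L, [[1, 0], [0, 1]]))  # 0.0: each point uses its best projection
```

A relaxed B such as `[[0.5, 0.5], [0.5, 0.5]]` gives a positive fit term, so minimizing it pushes each row of B toward that point's best projection, while the penalty (not shown) pushes toward few columns.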
  22. VIPR – a quick overview. [Figure.]
  23. Alert Classification with VIPR. [Figure: minimum respiratory rate vs. heart rate data density; artifacts vs. true alerts.] VIPR finds 2 informative projections, and each test point is handled by one of them. Accuracy: 0.91, precision: 0.93, recall: 0.945; this accuracy is better than that of random forests and SVM (<0.9). Fiterau M, Dubrawski A, Chen L, Hravnak M, Clermont G, Pinsky MR. Automatic identification of artifacts in monitoring critically ill patients. Annual Congress of the European Society of Intensive Care Medicine 2014.
  24. Alert Classification with VIPR (continued). [Figure: heart rate density vs. oxygen saturation density; artifacts vs. true alerts; images of a finger plethysmograph and a noninvasive ECG.] Low density values indicate that the probe fell off. Interpretability and performance are NOT at odds.
  25. More Research on Informative Projections. Informative projection retrieval for regression and clustering: Madalina Fiterau and Artur Dubrawski. Informative projection recovery for semi-supervised classification, clustering and regression. In International Conference on Machine Learning and Applications, volume 12, ICMLA 2013. Finding informative projections with active learning: Madalina Fiterau and Artur Dubrawski. Active learning for Informative Projection Recovery. In the Conference of the Association for the Advancement of Artificial Intelligence, volume 29, AAAI 2015. Studies on usability by domain experts: Fiterau M, Wang J, Dubrawski A, Clermont G, Hravnak M, Pinsky MR. Using expert review to calibrate semi-automated adjudication of vital sign alerts in step-down units. Society of Critical Care Medicine Annual Congress 2016 (Star Research Award); Fiterau M, Dubrawski A, Chen L, Hravnak M, Bose E, Clermont G, Pinsky MR. Archetyping artifacts in monitored noninvasive vital signs data. Society of Critical Care Medicine Annual Congress 2015 (oral presentation). Theoretical guarantees: PhD thesis, Ch. 2.5 (VC dimension and risk consistency); under review: compression scheme and sample complexity. Related work on interpretability: Bing Liu, Minqing Hu, and Wynne Hsu. Intuitive representation of decision trees using general rules and exceptions. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000). Now: Zachary C. Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490 (2016); Interpretable ML Symposium, NIPS 2017.
  26. Deep Neural Decision Forests. This research was partially completed during an internship at MSR Cambridge, UK. Collaborators: Peter Kontschieder (Microsoft Research), Antonio Criminisi (Microsoft Research), Samuel Rota-Bulò (Fondazione Bruno Kessler).
  27. Hybrid Models. [Figure: dataset (tabular), feature engineering, classifier (random forests), hybrid model.] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR 2015.
  28. Deep Learning + Accurate Classifier: an end-to-end deep learning architecture with decision tree "layers". Challenge: this needs a differentiable objective.
  29. Back-propagation Trees. The random forest structure is adapted to allow back propagation. [Figure: parameters θ.] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
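  30. Back-propagation Trees (continued). Samples are routed softly: split node n sends x to its left subtree with probability $d_n(x;\Theta)$ and to its right subtree with probability $1 - d_n(x;\Theta)$. Each leaf ℓ holds a class distribution $\pi^\ell = (\pi^\ell_1, \dots, \pi^\ell_c)$, which is optimal given a routing. The likelihood term, a weighted sum over the set of all leaves L, forms the objective.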
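  31. Modeling Node Splits. Each split node uses a sigmoid function, $d_n(x;\Theta) = \sigma(\theta_n^\top x)$. A sample is routed hierarchically along the path $\Phi_\ell$ to leaf ℓ with probability
      $$\mu_\ell(x;\Theta) = \prod_{n \in \Phi_\ell} d_n(x;\Theta)^{\mathbb{1}_{\ell \swarrow n}} \bigl(1 - d_n(x;\Theta)\bigr)^{\mathbb{1}_{n \searrow \ell}},$$
      where $\mathbb{1}_{\ell \swarrow n} = 1$ if ℓ belongs to the left subtree of n, and $\mathbb{1}_{n \searrow \ell} = 1$ if ℓ belongs to the right subtree of n. For example, in the tree with split nodes d1–d7, $\Phi_{\ell_4} = \{n_1, n_2, n_5\}$ and $\mu_{\ell_4}(x;\Theta) = \sigma(\theta_1^\top x)\,\bigl(1 - \sigma(\theta_2^\top x)\bigr)\,\bigl(1 - \sigma(\theta_5^\top x)\bigr)$. Image by Samuel Rota-Bulò.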
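The routing probability above, combined with the leaf class distributions and the likelihood as a weighted sum over all leaves, can be sketched for a single complete tree as follows. Assumptions for this sketch: split nodes are heap-indexed (node n has children 2n and 2n+1, so node 5 is the right child of node 2, matching the figure), each split is $d_n(x) = \sigma(\theta_n^\top x)$ on the raw input rather than on DeepNet outputs, and the prediction is $P(y \mid x) = \sum_\ell \pi^\ell_y\,\mu_\ell(x)$. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_probabilities(x, thetas, depth):
    """mu_l(x; Theta) for every leaf of a complete binary tree, left to right
    (list index l corresponds to leaf l+1 in the figure).

    thetas[n] is the weight vector of split node n in heap order,
    n = 1 .. 2**depth - 1 (thetas[0] is unused).
    d_n = sigmoid(theta_n . x) is the probability of going LEFT at node n.
    """
    mu = [1.0]  # probability of reaching the root
    for level in range(depth):
        first = 2 ** level  # heap index of the leftmost node on this level
        children = []
        for offset, reach in enumerate(mu):
            d = sigmoid(sum(t * xi for t, xi in zip(thetas[first + offset], x)))
            children += [reach * d, reach * (1.0 - d)]  # left child, right child
        mu = children
    return mu


def predict(x, thetas, pi, depth):
    """P(y | x) = sum over leaves l of pi[l][y] * mu_l(x)."""
    mu = leaf_probabilities(x, thetas, depth)
    return [sum(pi[l][y] * mu[l] for l in range(len(mu))) for y in range(len(pi[0]))]


depth, x = 3, [1.0, 2.0]
thetas = [None] + [[0.3 * n, -0.1] for n in range(1, 2 ** depth)]
mu = leaf_probabilities(x, thetas, depth)


def d(n):
    return sigmoid(sum(t * xi for t, xi in zip(thetas[n], x)))


print(round(sum(mu), 12))  # 1.0: the leaves share all of the routing probability
# Leaf 4 (index 3): sigma(theta_1 x) (1 - sigma(theta_2 x)) (1 - sigma(theta_5 x))
print(abs(mu[3] - d(1) * (1 - d(2)) * (1 - d(5))) < 1e-12)  # True
```

Because every step is a product of sigmoids, the whole prediction is differentiable in the θ parameters, which is what allows the forest to be trained by back-propagation.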
  32. Merging Decision Forests to Networks. Each output of the DeepNet becomes a feature for the back-propagation forest (BPF). [Figure: a deep CNN with parameters Θ and a fully connected (FC) layer whose outputs f1–f14 feed the split nodes d1–d14 of two trees, with leaf distributions π1–π16. Image credit: Samuel Rota-Bulò.]
  33. ImageNet Experiment: millions of images; 1000 synsets (classes); a modified GoogLeNet* in which the softmax layers are replaced with BPFs. Top-5 error: GoogLeNet, 10.07%; modified model, 1 model, 1 crop: 7.84%; 1 model, 10 crops: 7.08%; 7 models, 1 crop: 6.38%. Other covariates can now be introduced into the model via the BPF. *C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi and Samuel Rota-Bulò. Deep Neural Decision Forests. International Conference on Computer Vision, ICCV 2015.
  34. ShortFuse: Learning Time Series Representations in the Presence of Structured Information. This work was supported in part by the Mobilize Center, a National Institutes of Health Big Data to Knowledge (BD2K) Center of Excellence supported through Grant U54EB020405. Collaborators: Suvrat Bhooshan (Stanford CS), Jason Fries (Stanford CS), Charles Bournhonesque (Stanford ICME), Jennifer Hicks (Stanford Bioengineering), Eni Halilaj (Stanford Bioengineering), Chris Ré (Stanford CS), Scott Delp (Stanford Bioengineering).
  35. Biomedical Time Series Representations in the Presence of Structured Information. [Figure: time series and structured information (demographics, clinical tests, medical history) enter ShortFuse, which learns representations used for prediction.] Related work: N. Razavian and D. Sontag. Temporal convolutional neural networks for diagnosis from lab tests. 2015. A. Borovykh, S. Bohte, and C. W. Oosterlee. Conditional time series forecasting with CNNs. 2017. Z. Cui, W. Chen, and Y. Chen. Multi-scale convolutional neural networks for time series classification. 2016.
  36. Osteoarthritis Progression. Knee osteoarthritis causes cartilage degeneration. Activity influences progression, as do other factors: gender, nutrition, age, physical exam, symptoms. Can we predict osteoarthritis progression? [Figure: joint space narrowing (source: Wikipedia); activity counts.]
  37. Osteoarthritis Progression: Effect of Structured Information. [Figure: activity counts of an obese subject, a deep net, peak intensity, and the resulting feature f_obese.]
  38. Osteoarthritis Progression: Effect of Structured Information (continued). [Figure: the same for an obese subject (peak intensity, f_obese) and a normal-weight subject (mean, f_normal).]
  39. Hybrid CNN. Hybrid convolutions introduce the covariates in the representation learning process: each filter uses a different set of covariates. Inputs: X, n sequences of t time points; S, a vector of d covariates (e.g. age 12, gender M, height 154, weight 77).
  40. Hybrid CNN (continued). [Figure: a kernel applied to X, together with S, a vector of d covariates (age 12, gender M, height 154, weight 77); n sequences, t time points.]
  41. Hybrid CNN (continued). [Figure: the kernel output is combined with the covariates (+, ⊗) and passed to a deep network; the representation contains terms of the type ….]
  42. Hybrid CNN. In the CNN used for the biomedical applications, the convolutional layers are replaced with hybrid convolutions, adding parameters that correspond to the covariates; the LSTM receives an equivalent modification. [Figure: joint motion waveforms and covariates (age 12, gender M, height 154, mass 77) pass through convolution and pooling layers, a fully connected layer, and the output.] Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré and Scott Delp. ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information. 3rd Conference on Machine Learning for Healthcare, MLHC 2017.
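A minimal sketch of a single hybrid convolution filter follows. The exact ShortFuse parameterization is not given in this transcript; the sketch assumes one plausible reading of the "+ ⊗" diagram, in which a filter's effective kernel is its base kernel w plus a covariate-weighted sum of extra kernels, $w + \sum_{c \in C_f} s_c U_c$, over that filter's own covariate subset $C_f$. Convolving with it yields the plain convolution plus bilinear terms of the form $s_c\,x_t$. Covariates are assumed to be numeric (categorical ones such as gender encoded beforehand), and all names are hypothetical.

```python
def conv1d(x, kernel):
    """'Valid' 1-D cross-correlation, as computed by a CNN convolution layer."""
    k = len(kernel)
    return [sum(kernel[i] * x[t + i] for i in range(k)) for t in range(len(x) - k + 1)]


def hybrid_filter(x, s, w, U, covariate_idx):
    """ASSUMED hybrid convolution for one filter.

    x: one time series; s: numeric covariate vector; w: base kernel;
    U[c]: extra kernel for covariate c; covariate_idx: the covariates this
    filter uses (each filter may use a different subset).
    """
    effective = list(w)
    for c in covariate_idx:
        for i in range(len(w)):
            effective[i] += s[c] * U[c][i]
    return conv1d(x, effective)


x = [1.0, 2.0, 3.0, 4.0]
s = [2.0, 5.0]                 # two numerically encoded covariates
w = [1.0, 0.0]
U = [[0.0, 1.0], [1.0, 1.0]]
print(hybrid_filter(x, s, w, U, covariate_idx=[]))   # [1.0, 2.0, 3.0]: plain convolution
print(hybrid_filter(x, s, w, U, covariate_idx=[0]))  # [5.0, 8.0, 11.0]
```

Giving each filter its own `covariate_idx` mirrors the slide's point that each filter uses a different set of covariates; with an empty subset the filter reduces to an ordinary convolution.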
  43. Osteoarthritis Progression Results. Osteoarthritis Initiative dataset (OAI): 1926 subjects. The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health (NIH). Task: predict whether subjects are at risk for OA progression. Output: joint space narrowing (JSN) > 0.7 mm. Structured information: joint symptoms/function, medical history, nutrition, physical exam and measurements, subject characteristics and risk factors; out of 650 covariates, we selected 50. Time series: 7-day activity counts from accelerometer data.
  44. Osteoarthritis Progression Results (continued). Binary classification: fast/slow progression. Accuracy: state of the art (engineered features, appended covariates), 67%; best representation learning without covariates, 71%; best representation learning with appended covariates, 72%; ShortFuse, 74%.
  45. Cerebral Palsy: a birth-acquired condition that affects mobility.
  46. Gait Kinematics. Time series: joint angles obtained during the subject's gait cycle from motion capture using markers. [Figure: hip, knee, and ankle flexion angle curves over the gait cycle (x-axis 0 to 100); Gait Deviation Index. Source: Gillette Children's Specialty Care.]
  47. Cerebral Palsy Treatment. Surgical treatment (skeletal, muscular) is invasive, and results vary greatly, which makes treatment planning difficult. Psoas lengthening surgery [figure: psoas major, iliacus, iliopsoas]. Positive outcome: post-surgical Gait Deviation Index (GDI) > 90; > 5 points improvement in the Pelvis and Hip Deviation Index (PHiDI).
  48. Cerebral Palsy Treatment (results). Binary classification: good/bad surgical outcome. Accuracy: state of the art (engineered features, appended covariates), 78%; best representation learning without covariates, 74%; best representation learning with appended covariates, 76%; ShortFuse, 78%.
  49. Weak Supervision for the Classification of Aortic Valve Malformations from Cardiac MRIs. To appear in Nature Communications. We acknowledge support from the NIH (U54 EB020405), DARPA under No. FA87501720095 (D3M), and ONR under No. N000141712266 and No. N000141410102. Principal investigator: James Priest (Stanford Med). Paper lead author: Jason Fries (Stanford CS). Other collaborators: Paroma Varma (Stanford CS), Jared Dunmon (Stanford CS), Ke Xiao (Stanford Medicine), Helio Tejeda (Stanford Medicine), Scott Delp (Stanford BioX), Chris Ré (Stanford CS).
  50. Bicuspid Aortic Valve (BAV) Disease. A congenital malformation with an incidence of 0.5–2%; associated with poor health outcomes; diagnosed following cardiovascular issues; may require surgical replacement of the valve. Need: link genetic information to cardiac morphology. Limitations: variable date of diagnosis; absence of large imaging datasets specifically targeting subjects with BAV. Source: www.umcvc.org.
  51. UK Biobank: > 500,000 subjects in total; for 100,000 of them, medical imaging and genotyping. Phase-contrast MRI (initial release: 14,328 subjects): measures blood flow; multi-view; "sliced" view; 4-D tensors, 3 planes. No BAV labels.
  52. Gold Standard Labels: 412 patients; 12,360 individual MRI frames. Development set: 100 controls and 6 BAV patients, selected via chart review of disease codes related to BAV and annotated by one cardiologist. Validation set: 208 controls and 8 BAV patients, drawn by random uniform sampling so that it captures the class distribution expected at test time, and annotated by one cardiologist. Held-out test set: 88 controls and 3 BAV patients, drawn by random uniform sampling, annotated by 3 cardiologists plus a vote (agreement kappa = 0.354), and only used for the final evaluation.
  53. Weak Supervision for MRI Classification. [Pipeline: MRI sequences are preprocessed into processed segments; domain heuristics become labeling functions λ1–λ5 that produce weak labels; a generative model over λ1–λ5 and the label y turns them into probabilistic labels, which train a deep net that outputs the final MRI labels.] Data programming paradigm in Chris Ré's group: Snorkel, Coral.
  54. MRI Preprocessing. [Figure. Image credit: Jason Fries.]
  55. Labeling Heuristics. [Figure: example BAV and TAV (tricuspid aortic valve) images.] Primitive, observation, and the labeling function (LF) built from it: area, $A_{BAV} > A_{TAV}$ (λ1); eccentricity, $E_{BAV} > E_{TAV}$ (λ2); perimeter, $P_{BAV} > P_{TAV}$ (λ3); intensity, $I_{BAV} < I_{TAV}$ (λ4); $A/P^2$ differs (λ5).
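Labeling functions built from the observations above can be sketched as small threshold rules that vote BAV, TAV, or abstain. The thresholds are made-up placeholders (a real LF would derive them from data), λ5 is left out because the slide only says that A/P² "differs" without a direction, and the votes are combined by a simple majority vote. That majority vote is a swapped-in baseline, not the generative model used in this work. All names are hypothetical.

```python
BAV, TAV, ABSTAIN = 1, -1, 0


def make_lf(primitive, threshold, bav_if_greater=True):
    """Hypothetical LF: vote BAV if the primitive is on the BAV side of the
    threshold, TAV otherwise, and abstain if the primitive is missing."""
    def lf(frame):
        value = frame.get(primitive)
        if value is None:
            return ABSTAIN
        bav_side = value > threshold if bav_if_greater else value < threshold
        return BAV if bav_side else TAV
    return lf


# Directions follow the slide; threshold values are placeholders.
LFS = [
    make_lf("area", 1.0),                             # lambda_1: A_BAV > A_TAV
    make_lf("eccentricity", 0.5),                     # lambda_2: E_BAV > E_TAV
    make_lf("perimeter", 4.0),                        # lambda_3: P_BAV > P_TAV
    make_lf("intensity", 0.3, bav_if_greater=False),  # lambda_4: I_BAV < I_TAV
]


def majority_vote(frame, lfs=LFS):
    """Baseline combiner (NOT the generative model): sign of the summed votes."""
    total = sum(lf(frame) for lf in lfs)
    return BAV if total > 0 else TAV if total < 0 else ABSTAIN


frame = {"area": 2.0, "eccentricity": 0.9, "perimeter": 5.0, "intensity": 0.1}
print([lf(frame) for lf in LFS], majority_vote(frame))  # [1, 1, 1, 1] 1
```

Unlike this baseline, a generative label model estimates how accurate (and how correlated) each LF is and weights the votes accordingly, returning a probability instead of a hard vote.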
  56. Generative Model. [Figure: labeling functions λ1–λ5 and the label y; the generative model turns the LF outputs into probabilistic training labels.] Research on data programming: [SNORKEL] Ratner, A. J., De Sa, C. M., Wu, S., Selsam, D. & Ré, C. Data programming: Creating large training sets, quickly. NIPS 2016. [GENERATIVE MODEL] Bach, S. H., He, B., Ratner, A. & Ré, C. Learning the structure of generative models without labeled data. ICML 2017. [CORAL] Varma, P. et al. Inferring generative model structure with static analysis. NIPS 2017.
  57. Discriminative Model. Input: the MAG aortic valve box plus probabilistic labels. Architecture: a DenseNet 40-12 frame encoder followed by an attention BiLSTM sequence encoder, which outputs BAV/TAV. DenseNet40-12 outperformed VGG16 and ResNet-50. Data augmentation: crops, affine transformations.
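Training the discriminative model on probabilistic labels rather than hard ones amounts to minimizing the expected loss under the label distribution produced by the generative model, as in data programming. For a binary BAV/TAV output with cross-entropy, that expectation has the closed form below. This sketch covers only the loss; the DenseNet and BiLSTM encoders are not reproduced, and the function names are hypothetical.

```python
import math


def noise_aware_bce(p_model, p_label, eps=1e-12):
    """Expected binary cross-entropy E_{y ~ Bernoulli(p_label)}[-log p_model(y)].

    p_label: probability of BAV assigned by the generative model (soft label).
    p_model: the discriminative model's predicted probability of BAV.
    eps guards against log(0).
    """
    return -(p_label * math.log(p_model + eps)
             + (1.0 - p_label) * math.log(1.0 - p_model + eps))


def batch_loss(preds, soft_labels):
    """Mean noise-aware loss over a batch of frames or sequences."""
    return sum(noise_aware_bce(p, q) for p, q in zip(preds, soft_labels)) / len(preds)


# With a hard label (p_label = 1) this reduces to ordinary cross-entropy -log p.
print(round(noise_aware_bce(0.8, 1.0), 4))  # 0.2231
```

For a fixed soft label the loss is smallest when the model's prediction matches it, so uncertain labels from the generative model pull predictions toward calibrated probabilities instead of forcing a hard 0/1 target.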
  58. Classification Performance. [Figure. Credit: Jason Fries.]
  59. Survival Analysis. [Figure. Credit: Jason Fries.] Outcome: major adverse cardiac event (from ICD-9, ICD-10, OPCS-4 codes); N = 9,230.
  60. Future Research. [Diagram: research articles and notes; domain insights; related multimodal datasets; hybrid system; analysis + transferable models.]
  61. Integrating Specialized Tools in Hybrid Systems: use video for gait lab patients; for the osteoarthritis study, use the MRIs and X-rays as well; text mining approaches. Sources: Gillette Children's Specialty Care; Delp Lab.
  62. Weakly-supervised Transfer of Models and Representations. A model trained on healthy adults goes through weakly supervised adaptation to become a model specialized for children or a model specialized for injured subjects. Image sources: Delp Lab, Gillette Children's Specialty Care, CAMERA project, MedicalExpo.
  63. Online Adaptive Policies for Feature Selection and Representation Learning. Optimize data collection (sources, sensor arrays) with respect to cost (acquisition, invasiveness). Leverage user-engineered features in the representation learning pipeline. [Figure: a CNN with convolution, pooling, fully connected, and output layers.] Image sources: BioPac, Medical Express, Research Gate, Journal of Circulation.
  64. Starting up at UMass Amherst: Fusion of Multi-resolution Irregularly Sampled Time Series (students: Iman Deznabi, Bhanu Pratap Singh); Multimodal Deep Learning to Forecast Disease Progression, using MRIs and X-rays for OA progression and combining DL with feature engineering (students: Joie Wu, Surya Teja); Transfer Learning across Thermal Imaging Datasets, for person detection, face segmentation, and body temperature estimation (students: Debasmita Ghose, Sneha Bhattacharya, Shasvat Desai); Incorporating Domain Knowledge in Bayesian Deep Learning (student: Aritra Gosh); Deep Causality (student: Purva Purty).
  65. Conclusion. Hybrid systems: VIPR (Visualizations for Informative Projection Recovery); DNDF (Deep Neural Decision Forests); ShortFuse (Learning Representations from Time Series and Structured Information); Weak Supervision for Cardiac MRI Classification. Future directions: optimize feature selection and learning; weakly supervised transfer; incorporating data-specific techniques.
  66. Thanks!
