Patient-Focused Data Science: Machine Learning for Complex Diseases (AIM203-S) - AWS re:Invent 2018

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Patient-Focused Data Science at Takeda: Insights About Complex Disease States with Machine Learning
AIM203-S

Speaking Today
Dan Housman, Chief Technology Officer, ConvergeHEALTH, a division of Deloitte Consulting LLP
Valerie Strezsak, PhD, Epidemiology, Takeda Pharmaceuticals
Jennifer Drahos, PhD, MPH, Global Outcomes Research, Takeda Pharmaceuticals

About Takeda
Takeda is a patient-focused,
innovation-driven global
pharmaceutical company that builds
on a distinguished 236-year history,
aspiring to bring better health and a
brighter future for people worldwide.
We are focused on addressing critical
unmet needs in our core therapeutic
areas—oncology, gastroenterology
(GI) and neuroscience plus vaccines.

What We Do
Health economics and outcomes research (HEOR) and epidemiology teams conduct rigorous scientific
studies that identify clinical, economic, and unmet patient needs, communicate product
value to regulators, HTA/payers, health care providers and patients, and inform decision-making
for R&D teams
GOR TOOLKIT: THREE PILLARS OF OUTCOMES RESEARCH
• INNOVATIVE OUTCOMES RESEARCH
• REAL WORLD EVIDENCE
• PATIENT-CENTRIC RESEARCH
EXTERNAL COMMUNICATIONS
Demonstrate unmet need to key
decision makers meaningfully, for
example:
• Peer-reviewed publications
• Global value dossier
• Regulatory submissions
• Payer submissions
INTERNAL COLLABORATION
Characterize unmet needs and product
value to enhance development strategy
and decision making for R&D teams

The Deep Miner experiment began in September 2017 as a pilot aimed at
combining open source methods, proprietary data transformation, machine
learning and neural network algorithms to generate insight from real world data
Key Areas of Focus:
1. Document deep learning use cases of high value to Takeda
2. Demonstrate feasibility of scalable and reproducible analytical approach using Real World
Data in the OMOP Common Data Model
3. Create transformation processes using Deep Miner accelerator combined with custom
data transformation pipelines to convert Takeda data sets within AWS into deep learning
models
Designing Deep Learning Models for Scale
Figure 1. General Experiment Framework: Problem Definition → Feature Engineering / Model Development → Results Evaluation

What is Real World Data?
Real world data (RWD) are the data relating to
patient health status and/or the delivery of health
services occurring outside of traditional randomized
clinical trials (RCTs). RWD can come from a number
of sources, for example:
• Electronic health records
• Insurance claims
• Product and disease registries
• Patient-generated/reported data
• Wearables/mobile devices
Real world evidence (RWE) is the clinical evidence
regarding the usage and potential benefits or risks of
a medical product derived from analysis of RWD

What Do We Do with RWD?
• Drug performance
• Predictive diagnostics /
phenotype
• Risk-based contracting
• Clinical trial cohort
feasibility
• Adherence Patterns
• Identify patient unmet
needs
• Determine healthcare
resource utilization and
costs
• Demonstrate burden of
disease
• Natural history of disease
• Etiology of disease
• Risk prediction
Preclinical
Exploratory
(Phase I/II)
Confirmatory
(Phase II/III)
Lifecycle
Management
• Influences which drug targets to develop or in-license
• Provides insights on burden of disease and patient unmet needs
• Increases medication adherence through targeted interventions
• Enhances ability to effectively target eligible patients
• Demonstrates real-world drug efficacy and safety
RWD is used to address questions across the lifecycle of drug R&D
How Does This Impact Our Commitment to the Patient?

What is Claims data?
Claims data are the administrative data generated for
each healthcare encounter (e.g., physician or hospital
visit) used to bill your insurance company. Claims are
coded using a universal system called the International
Classification of Diseases (ICD).
Using ICD-9/10 codes, rich, patient-level data is
captured including:
• Diagnoses
• Procedures
• Filled prescription medications
• Labs ordered

RWD in Takeda’s Data Hub
Truven Health MarketScan and Optum
Clinformatics are licensed, third-party,
de-identified, individual-level commercial
claims data
What is OMOP?
A standard observational health data
model that provides a consistent data
structure and ontologies, specifically
optimized for large-scale analytics
Takeda’s Data Hub is large, but not enormous – making it a perfect playing ground for a
proof-of-concept machine learning project
Overview of Takeda Data Hub
                                  Truven OMOP          Optum OMOP
Years of Coverage                 2000-2017 (17 Yrs)   2000-2017 (17 Yrs)
Total Number of Unique Patients   128,859,057          152,910,998
Drug Exposures                    87,707,616           1,688,515,381
Conditions                        672,684,453          1,027,582,860
Procedures                        331,188,830          259,052,007
Measurements                      Not available        511,162,836

Real World Problem: Life with Treatment Resistant Depression (TRD)
16.2 million adults in the United States have had at least one major depressive episode [1]
10.3 million adults in the United States had at least one major depressive episode with severe impairment [1]
Up to 30% of patients with major depressive disorder (MDD) do not respond to typical antidepressant medications [2]
1. National Institute of Mental Health. 2018. Accessed from: https://www.nimh.nih.gov/health/statistics/major-depression.shtml
2. Souery D, Amsterdam J, deMontigny C, et al. Treatment resistant depression: methodological overview and operational criteria. Eur Neuropsychopharmacol. 1999;9:83–91
I am crying. My husband took off work — again — because I
am crying and cannot stop. I've been crying for two days
straight…
Psychiatrists mess with my medication: dial one up and the
other down, then add another. My husband takes Family
Medical Leave to care for our children.
This can be what it’s like to live with treatment-resistant
depression. When it’s bad, it’s bad, and the kids suffer for it.
A day later, I end up in an outpatient mental health center…
I had been "withdrawing from an anti-psychotic," the
doctors discovered, and that’s why I couldn’t stop crying.
(Source: https://www.romper.com/p/i-have-treatment-resistant-depression-this-is-what-its-like-14465)

Spotlighted Experiment: Modeling TRD
Among patients with depression, can we predict which patients
will switch from a selected antidepressant drug (Escitalopram) or
drug class (SSRIs) to another drug?
CHALLENGES
• Patient journey analyses are descriptive by nature.
• Treatment is not one-size-fits-all. The most common first-line therapy
varies by location.1
Source: 1. Hripcsak et al, 2016

Epidemiology Driven Approach
Research Scientist
1. Identify a Diagnosis of Interest: Define a target diagnosis, including clinical code sets
2. Define At-Risk Population & Case Definition: Define the phenotype at-risk/positive indicators of disease
3. Define a Control Definition: Define the counterfactual ideal/control phenotype
4. Create Feature Vectors: Transform sample data into feature vectors
5. Create Testing & Training Set: Subset available data to create a testing and training set
6. Run Model: Leverage testing and training sets to run model
7. Evaluate Model: Explore and analyze outputs of model
8. Further Analysis: Re-evaluate modeling parameters and consider new hypothesis
9. Share Results: Share findings with other collaborators

Three general phases: design, run, and re-evaluate

Phase 1 (standard) – Define cases and controls

Phase 2 (using new methods) – Create features, train, and run model

Phase 3 (standard) – Evaluate model, refine, consider new hypotheses

Studying Depression Treatment Switches
Our experiment consists of 2 sub-experiments as defined below
Experiment 1A: Depression Treatment Switch - switch between any antidepressant drug (Escitalopram)
Experiment 1B: Depression Treatment Switch - switch between any antidepressant drug class (SSRI)
Case Selection
• Diagnosis Criteria: Patients who have at least 1 diagnosis of depression
• Drug Criteria: Patients who had at least 1 exposure to an antidepressant medication
• Treatment Criteria: Patients with 1 or more treatment switches after the index date
Control Selection
• Diagnosis Criteria: Patients who have at least 1 diagnosis of depression
• Drug Criteria: Patients who had at least 1 exposure to an antidepressant medication
• Treatment Criteria: Patients who never had any treatment switches after the index date
Matching
• Matched cases and controls in 1:1 ratio with the KNN algorithm on age, gender, and time censoring
Legend: highlighted items indicate an improvement added during experimentation
Assumptions:
1. Index date is determined by the first exposure to an antidepressant drug
2. Any patient who had a diagnosis of events related to pregnancy, schizophrenia, etc.
(based on exclusion concept ids from the PNAS paper) within the period from 1 year prior
to 3 years post the index date is excluded
3. Patients must have 1 year of data prior to and 3 years of data post the index date
4. Antidepressant medication list constructed from the PNAS paper
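The selection rules and assumptions above can be sketched for a single patient as follows. This is a minimal illustration, not the Deep Miner implementation: the function name, the record layout and the abbreviated drug list are all hypothetical, and the diagnosis criteria and exclusion concepts are left out for brevity.

```python
from datetime import date, timedelta

# Hypothetical, abbreviated list; the deck builds its list from the PNAS paper.
ANTIDEPRESSANTS = {"escitalopram", "sertraline", "fluoxetine", "venlafaxine"}

def label_patient(exposures, obs_start, obs_end):
    """Return 'case', 'control', or None (excluded) for one patient.

    exposures: list of (date, drug_name) tuples, in any order.
    obs_start / obs_end: first and last dates with data for the patient.
    """
    ad = sorted((d, drug) for d, drug in exposures if drug in ANTIDEPRESSANTS)
    if not ad:
        return None  # no antidepressant exposure at all
    # Assumption 1: the index date is the first antidepressant exposure.
    index_date, index_drug = ad[0]
    # Assumption 3: 1 year of data before and 3 years after the index date.
    if obs_start > index_date - timedelta(days=365):
        return None
    if obs_end < index_date + timedelta(days=3 * 365):
        return None
    # A switch is any later exposure to a different antidepressant.
    switched = any(drug != index_drug for _, drug in ad[1:])
    return "case" if switched else "control"

switcher = label_patient(
    [(date(2010, 1, 1), "escitalopram"), (date(2010, 6, 1), "sertraline")],
    date(2008, 1, 1), date(2014, 1, 1),
)
stayer = label_patient(
    [(date(2010, 1, 1), "escitalopram"), (date(2010, 6, 1), "escitalopram")],
    date(2008, 1, 1), date(2014, 1, 1),
)
too_short = label_patient(
    [(date(2010, 1, 1), "escitalopram")],
    date(2009, 6, 1), date(2014, 1, 1),
)
```

Here `switcher` is labeled a case, `stayer` a control, and `too_short` is excluded because less than a year of data precedes the index date.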

Overall Analytical Approach
• Execute on RWD sets in the OMOP CDM
• Use logistic regression and Random Forest classifiers to identify top
features
• Use the significant feature set to build and train models across a variety of
modeling approaches (logistic regression, Random Forest, Recurrent Neural
Nets [RNNs], Long Short-Term Memory [LSTMs])
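As a rough illustration of the "identify top features" step, the sketch below ranks features by the size of their weights in an L1-penalised logistic regression. It is a standard-library stand-in written for this example (proximal gradient descent), not the scikit-learn models the deck used; the feature names, toy data and hyperparameters are made up.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal step for the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def l1_logistic_weights(X, y, lam=0.05, lr=0.5, steps=500):
    """Fit L1-penalised logistic regression by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errs = [p - t for p, t in zip(preds, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b  # the intercept is not penalised
    return w

def top_features(weights, names, k):
    """Return up to k feature names with the largest non-zero |weight|."""
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return [name for name, wt in ranked[:k] if wt != 0.0]

# Toy data: concept_a tracks the label exactly; the other two are balanced noise.
names = ["concept_a", "concept_b", "concept_c"]
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = l1_logistic_weights(X, y)
selected = top_features(weights, names, k=1)
```

The L1 penalty pushes uninformative weights toward zero, which is why this family of models is convenient for cutting tens of thousands of concepts down to a short list.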

Technical Environment
• Analytical approach segmented into a series of Jupyter notebooks
• Underlying technologies include:
  • Apache Spark
  • Python plus scikit-learn
  • TensorFlow
Figure 2. Deep Miner Reference Architecture
Components shown: VPC, data lake, HDFS cluster, EC2 CPU instances, Amazon Elastic Block Store volumes, ML AMI (CPU-based classical ML)

Example Infrastructure Specifications
Specifications based on conducting experiment using real world data asset with 10.6 million patient lives:
1. PySpark data processing cluster extracts features with Jupyter notebooks
2. Python model training cluster does feature engineering and linear model training with scikit-learn
3. GPU deep learning cluster speeds up neural network model training with TensorFlow
(* = depending on patient cohort size)
Type 1: PySpark data processing cluster
  Cluster specs: Driver: r3.xlarge (*r3.2xlarge); Worker: r3.xlarge (*r3.2xlarge), n=1; Spot Bid Price: 20-25% of on-demand price
  Quantity: 1
  CPU / Memory: 2 x 30.5 GB Memory, 4 Cores, 1 DBU (*2 x 61 GB Memory, 8 Cores, 2 DBU)
  EC2/EMR price per hour, US West: 2 x $0.371 / $0.090 (*$0.741 / $0.180) (Northern California), https://aws.amazon.com/emr/pricing/
Type 2: Python model training cluster
  Cluster specs: Driver Type: r3.2xlarge (*r3.8xlarge); Worker Type: none, n=0
  Quantity: 1
  CPU / Memory: 61 GB Memory, 8 Cores, 2 DBU (*244.0 GB Memory, 32 Cores, 8 DBU)
  EC2/EMR price per hour, US West: $0.741 / $0.180 (Northern California) (*$2.964 / $0.270)
Type 3: GPU deep learning cluster
  Cluster specs: Driver Type: p2.xlarge; Worker Type: none (n=0)
  Quantity: 1
  CPU / Memory: 244.0 GB Memory, 32 Cores, 8 DBU
  EC2/EMR price per hour, US West: $0.900 (Oregon), https://aws.amazon.com/ec2/instance-types/p2/
Common specs
  Cluster Type: Serverless Pool (beta, Python/SQL)
  On-demand/Spot Composition: 0 All Spot
  Auto Termination: Terminate after 30 minutes of inactivity
  Availability Zone: us-west-2a
  IAM Role: s3-deepminerpoc

Making a Patient’s Individual Treatment Pathway Machine Consumable

Feature Engineering: OMOP Concept Maps
From the cohort definition, we joined multiple elements of the patient journey
(e.g. condition occurrence, drug exposure, observations), leveraging the
interconnectivity created by the OMOP CDM concept mappings
Figure 3. Feature Vector Conversion Overview (inputs: CDM v5 Data Model; source: OHDSI)
The Result: Modeling thousands of parameters about the patient simultaneously
Extracted clinical features: 56,590,271
• Conditions: 4,915,475
• Drug: 18,848,074
• Observations: 32,826,722
Concepts: 28,458
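A minimal sketch of the join described on this slide: rows from three OMOP CDM v5 clinical tables are gathered into one time-ordered event list per patient. The table and column names follow the CDM; the function name, the in-memory row format and the condition concept id are assumptions made for illustration (the deck ran this step on Spark).

```python
from collections import defaultdict

# (table name, concept column, date column) for three OMOP CDM v5 clinical tables.
DOMAINS = [
    ("condition_occurrence", "condition_concept_id", "condition_start_date"),
    ("drug_exposure", "drug_concept_id", "drug_exposure_start_date"),
    ("observation", "observation_concept_id", "observation_date"),
]

def patient_events(tables):
    """Map person_id -> list of (date, table, concept_id), sorted by date.

    tables: dict of table name -> list of row dicts; dates are ISO strings,
    so plain string sorting gives chronological order.
    """
    events = defaultdict(list)
    for table, concept_col, date_col in DOMAINS:
        for row in tables.get(table, []):
            events[row["person_id"]].append((row[date_col], table, row[concept_col]))
    return {pid: sorted(evts) for pid, evts in events.items()}

# Toy rows; 1001 is a made-up condition concept id.
tables = {
    "condition_occurrence": [
        {"person_id": 1, "condition_concept_id": 1001, "condition_start_date": "2015-03-01"},
    ],
    "drug_exposure": [
        {"person_id": 1, "drug_concept_id": 704946, "drug_exposure_start_date": "2015-01-15"},
    ],
    "observation": [],
}
journey = patient_events(tables)
```

Each patient's list is the raw material for the feature vectors on the following slides.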

Feature Engineering: Bag of Concepts
Per patient demographics and concepts flattened into a single vector with
a TRUE/FALSE label
Figure 4. Bag of Concepts Feature Vector
Vector segments: Demographics, Condition, Observation, Drug, Visit
Example: Drug Concept 704946 (ASA 325 MG / Methocarbamol 400 MG Oral Tablet) maps to
Ancestor Concept 21604095 (Methocarbamol, ATC-5) at Bit Position 12,093
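The example on this slide can be sketched as a multi-hot vector. The concept ids and the bit position come from the slide; everything else is an assumption: the two lookup dictionaries stand in for the OMOP ancestor mapping, the vector length is taken to be the 28,458 concepts reported earlier, and the bit position is treated as a 0-based index. Demographics are omitted.

```python
# Hypothetical lookups; in OMOP the roll-up would come from the concept hierarchy.
ANCESTOR_OF = {704946: 21604095}    # drug concept -> ATC-5 ancestor (from the slide)
BIT_POSITION = {21604095: 12093}    # ancestor concept -> position (from the slide)
VECTOR_LENGTH = 28458               # assumed: one bit per concept

def bag_of_concepts(concept_ids):
    """Flatten a patient's concepts into one 0/1 vector, ignoring order and counts."""
    vec = [0] * VECTOR_LENGTH
    for cid in concept_ids:
        ancestor = ANCESTOR_OF.get(cid, cid)   # roll up to the ancestor if one is known
        pos = BIT_POSITION.get(ancestor)
        if pos is not None:                    # concepts without a position are skipped
            vec[pos] = 1
    return vec

# 999 is a made-up concept with no assigned position.
vector = bag_of_concepts([704946, 999])
```

Rolling drug products up to an ancestor class keeps many formulations of the same ingredient on a single bit, which reduces sparsity.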

Feature Engineering: Sequence of Concepts
Per patient demographics and concept(s) per visit form a matrix with a
TRUE/FALSE label
Figure 5. Sequence of Concepts Feature Vector
Segments: Demographics, Condition, Observation, Drug, Visit
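A sketch of the per-visit matrix, under assumptions the slide does not state: visits are ordered by date, only the most recent `max_visits` are kept, and shorter histories are padded at the front with all-zero rows so every patient has the same shape. The function and parameter names are hypothetical.

```python
def sequence_of_concepts(visits, vocab, max_visits):
    """Build a max_visits x len(vocab) 0/1 matrix, one row per visit.

    visits: list of (visit_date, [concept_id, ...]) with ISO date strings.
    vocab: ordered list of concept ids that defines the columns.
    """
    index = {cid: i for i, cid in enumerate(vocab)}
    ordered = sorted(visits, key=lambda v: v[0])[-max_visits:]  # most recent visits
    matrix = []
    for _, concepts in ordered:
        row = [0] * len(vocab)
        for cid in concepts:
            if cid in index:
                row[index[cid]] = 1
        matrix.append(row)
    # Front-pad with empty visits so all patients share one shape.
    padding = [[0] * len(vocab) for _ in range(max_visits - len(matrix))]
    return padding + matrix

# Toy vocabulary and visits with made-up concept ids.
matrix = sequence_of_concepts(
    [("2015-02-01", [20]), ("2015-01-01", [10, 30])],
    vocab=[10, 20, 30],
    max_visits=3,
)
```

Unlike the bag of concepts, this layout preserves visit order, which is what lets RNN and LSTM models learn from sequences.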

Creating a Temporal Analysis that Scales for Deep Learning Experiments
1. Build feature vector for data with all concepts
2. Train machine learning models (logistic regression, Random Forest)
3. Report and extract top features from models
4. Build temporal feature vector with top features only
5. Train deep neural network (RNN/LSTM) models
6. Evaluate performance
Linear models were used to reduce the features included in the temporal data sets fed to the deep neural
network models. This produces intermediate, understandable models and makes cost-effective use of compute
on complex temporal information.

Testing and Training Set Creation
• Bag-of-concepts vectors are loaded in
• Data sets were split into 10% test / 90% train sets and
fed into the selected models
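A minimal sketch of a 90/10 split, assuming a simple seeded shuffle; the deck does not say whether the split was stratified or how it was seeded, and the names below are hypothetical.

```python
import random

def split_90_10(vectors, labels, seed=0):
    """Shuffle, then hold out about 10% as a test set.

    Returns (train_x, train_y, test_x, test_y).
    """
    order = list(range(len(vectors)))
    random.Random(seed).shuffle(order)      # seeded so the split is reproducible
    n_test = max(1, len(order) // 10)       # at least one test example
    test, train = order[:n_test], order[n_test:]
    return (
        [vectors[i] for i in train], [labels[i] for i in train],
        [vectors[i] for i in test], [labels[i] for i in test],
    )

# Toy data: 20 tiny vectors with TRUE/FALSE labels.
vectors = [[i % 2, i % 3] for i in range(20)]
labels = [i % 2 == 0 for i in range(20)]
train_x, train_y, test_x, test_y = split_90_10(vectors, labels)
```

With 20 examples this yields 18 for training and 2 for testing.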

Making Unbalanced Comparisons
The Problem: When Lasso Classifiers and Random Forest models were asked to pick out the top features
contributing to a switch, the machine differentially pulled artifacts attributable to the case cohort being
older than the comparator cohort
Top Features on Optum
Code      Vocabulary  Weight  Name
N06AX21   ATC         2       duloxetine
N06AX16   ATC         2       venlafaxine
85025     CPT4        2       Blood count; complete (CBC), automated (Hgb, Hct, RBC, WBC and platelet count) and automated differential WBC count
N06AB06   ATC         2       sertraline
80061     CPT4        2       Lipid panel This panel must include the following: Cholesterol, serum, total (82465) Lipoprotein, direct measurement, high density cholesterol (HDL cholesterol) (83718) Triglycerides (84478)
N06AB03   ATC         2       fluoxetine
N06AA09   ATC         2       amitriptyline
1751-7    LOINC       1       Albumin serum/plasma
C03AA03   ATC         1       hydrochlorothiazide
84443     CPT4        1       Thyroid stimulating hormone (TSH)
90657     CPT4        1       Influenza virus vaccine, trivalent, split virus, when administered to children 6-35 months of age, for intramuscular use
80053     CPT4        1       Comprehensive metabolic panel This panel must include the following: Albumin (82040) Bilirubin, total (82247) Calcium, total (82310) Carbon dioxide (bicarbonate) (82374) Chloride (82435) Creatinine (82565) Glucose (82947) Phosphatase, alkaline (84075) Pot
89240     CPT4        1       Unlisted miscellaneous pathology test
81003     CPT4        1       Urinalysis, by dip stick or tablet reagent for bilirubin, glucose, hemoglobin, ketones, leukocytes, nitrite, pH, protein, specific gravity, urobilinogen, any number of these constituents; automated, without microscopy
93010     CPT4        1       Electrocardiogram, routine ECG with at least 12 leads; interpretation and report only
Top Features on Truven
Code      Vocabulary  Weight  Name
300.4     ICD9CM      82      Dysthymic disorder
530.81    ICD9CM      6       Esophageal reflux
401.1     ICD9CM      5       Benign essential hypertension
244.9     ICD9CM      4       Unspecified acquired hypothyroidism
401.9     ICD9CM      3       Unspecified essential hypertension
780.6     ICD9CM      2       Fever, unspecified
272.4     ICD9CM      2       Other and unspecified hyperlipidemia
723.1     ICD9CM      2       Cervicalgia
493.9     ICD9CM      2       Asthma, unspecified type, unspecified
722.1     ICD9CM      1       Displacement of lumbar intervertebral disc without myelopathy
272.2     ICD9CM      1       Mixed hyperlipidemia
300       ICD9CM      1       Anxiety state, unspecified
305.1     ICD9CM      1       Tobacco use disorder
276.8     ICD9CM      1       Hypopotassemia
462       ICD9CM      1       Acute pharyngitis
486       ICD9CM      1       Pneumonia, organism unspecified
920       ICD9CM      1       Contusion of face, scalp, and neck except eye(s)
719.46    ICD9CM      1       Pain in joint, lower leg
729.5     ICD9CM      1       Pain in limb
786.5     ICD9CM      1       Chest pain, unspecified
218.9     ICD9CM      1       Leiomyoma of uterus, unspecified
786.05    ICD9CM      1       Shortness of breath
562.1     ICD9CM      1       Diverticulosis of colon (without mention of hemorrhage)
I10       ICD10       1       Essential (primary) hypertension

Balancing Imbalance: K-Nearest Neighbor Matching
Experiment 1a
Cases: Patients who had a switch from antidepressant drug (Escitalopram)
Controls: Patients who didn’t switch the antidepressant drug (Escitalopram)
Case timeline: 1 year prior to Dx | Qualifying Depression Dx (Index Date) | Antidepressant exposures 1, 2, 3 (>=60 days) | At least 1 year post-index
Control timeline: 1 year prior to Dx | Qualifying Depression Dx (Index Date) | Antidepressant exposure 1 | At least 1 year post-index
1:1 K-nearest neighbor (KNN) match on:
• Age
• Gender
• Time censoring
Who falls out of experiment:
• Patients with fewer than 2 drug exposures
• Patients who don’t meet the minimum observation period
• Patients who don’t have age or gender captured
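The 1:1 matching can be sketched as a greedy nearest-neighbour search without replacement. This is an illustration only: the field names, the use of follow-up days as the "time censoring" variable, and the distance weights are all assumptions, and the deck does not describe its KNN configuration.

```python
def match_controls(cases, controls):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    cases / controls: lists of dicts with 'id', 'age', 'gender', 'followup_days'.
    Returns a list of (case_id, control_id) pairs.
    """
    def distance(a, b):
        return (
            abs(a["age"] - b["age"]) / 10.0                 # 10 years = 1 unit
            + (0.0 if a["gender"] == b["gender"] else 5.0)  # heavy penalty for a mismatch
            + abs(a["followup_days"] - b["followup_days"]) / 365.0
        )

    available = list(controls)
    pairs = []
    for case in cases:
        if not available:
            break  # more cases than controls: the rest stay unmatched
        best = min(available, key=lambda c: distance(case, c))
        available.remove(best)  # each control is used at most once
        pairs.append((case["id"], best["id"]))
    return pairs

# Toy patients.
cases = [
    {"id": "c1", "age": 40, "gender": "F", "followup_days": 1100},
    {"id": "c2", "age": 65, "gender": "M", "followup_days": 1500},
]
controls = [
    {"id": "k1", "age": 64, "gender": "M", "followup_days": 1490},
    {"id": "k2", "age": 41, "gender": "F", "followup_days": 1095},
    {"id": "k3", "age": 30, "gender": "M", "followup_days": 400},
]
pairs = match_controls(cases, controls)
```

Greedy matching depends on the order of the cases; it is used here only because it is short and easy to follow.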

Making Unbalanced Comparisons
The Impact: Failing to match cases to controls can bias results because of significant differences
in baseline characteristics. Cohorts need to be time censored and matched on age, gender and
duration of medical observation.

The Challenge with a Simple Case Definition
The Problem: The initial experiment was designed to focus on Major Depressive Disorder only. It did not
include related disease terms in the case definition.
Initial Case Definition
• Patients with a diagnosis of Major Depressive Disorder (MDD)
• Patients who are on antidepressants
• Patients with a minimum of 2 TRD treatment failures (Exclude patients with <=2 treatments)
TRD is when one class of drug is failed (Treatment failure is when the switch
occurred 60 days or more after the first drug)
LIST OF ANTIDEPRESSANTS
Selective Serotonin Reuptake Inhibitors (SSRI)
• Citalopram
• Escitalopram
• Fluoxetine
• Fluvoxamine
• Paroxetine
• Sertraline
• Vilazodone
Serotonin-Norepinephrine Reuptake Inhibitors (SNRI)
• Desvenlafaxine
• Duloxetine
• Levomilnacipran
• Milnacipran
• Venlafaxine
Tricyclic Antidepressants (TCA)
• Amitriptyline
• Amoxapine
• Desipramine
• Doxepin
• Imipramine
• Nortriptyline
• Protriptyline
• Trimipramine

Therefore, Machines Will Do What They’re Told
The Problem: When Lasso Classifiers and Random Forest models were asked to pick out the top features
contributing to a switch, the machine essentially “cheated”: it prioritized disease synonyms above all else
because of their clinical relatedness to the disease of interest, not because they contributed to a switch.
In the Top Features on Truven list shown earlier, Dysthymic disorder (ICD9CM 300.4) carries a weight of 82,
against 6 for the next feature.
Dysthymia, also called persistent depressive disorder, is a
continuous long-term (chronic) form of depression. (Mayo Clinic
2018)

Expanding the Case Definition
The Solution: The case definition needed to be more exhaustive. It was updated to include the
124 clinical concepts from the PNAS treatment pathways paper.
Caveat: The expanded definition includes additional depression concepts that may not be
considered major depressive disorder. The study design needed to be updated to reflect this
change in case specificity.

Assessing Top Features Contributing to a Drug Switch
Experiment 1a
Table 1. Experiment 1A: Summary Statistics
                  Optum    Truven
# Cases           38,353   6,824
# Controls        38,353   6,824
# All features    18,831   21,445
# Top features    24       109
Tables 2 & 3. Top Feature Lists Derived from Logistic Regression & Random Forest (Truven, Optum)
Our Finding: You need a lot of information to understand the relative importance of features

Comparing Top Features Across Data Sets
Alphabetical List of Top Features by Data Set
Optum
• Acute pharyngitis
• Acute sinusitis
• Acute upper respiratory infection
• Alanine aminotransferase serum/plasma
• Anxiety disorder
• Benign essential hypertension
• Carbon dioxide serum/plasma
• Cholesterol [Mass/volume] in Serum or Plasma
• Cholesterol in LDL [Mass/volume] in Serum or Plasma by calculation
• Cough
• Dizziness and giddiness
• Essential hypertension
• Gastroesophageal reflux disease
• Generalized anxiety disorder
• Headache
• Insomnia
• Low back pain
• Malaise and fatigue
• Neck pain
• Pain in limb
• Protein serum/plasma
• Pure hypercholesterolemia
• Shoulder joint pain
• Triglyceride [Mass/volume] in Serum or Plasma
• Urinary tract infectious disease
Truven
• Abdominal pain
• Acute bronchitis
• Acute pharyngitis
• Acute sinusitis, unspecified
• Acute upper respiratory infections of unspecified site
• Allergic rhinitis, cause unspecified
• Anxiety state, unspecified
• Atrial fibrillation
• Benign essential hypertension
• Chest pain
• Chest pain, unspecified
• End stage renal disease
• Essential hypertension
• Gastroesophageal reflux disease
• Headache
• Hyperlipidemia
• Lumbago
• Malaise and fatigue
• Other and unspecified hyperlipidemia
• Pure hypercholesterolemia
• Pure hypercholesterolemia
• Tobacco dependence syndrome
• Type 2 diabetes mellitus
• Unspecified essential hypertension
• Urinary tract infection, site not specified
Highlighted Findings:
• Both data sets aligned against similar key features and multiple synonyms
• Feature lists reflect artifacts of the data domain density
• Optum data included more procedures vs. diagnosis data but identified synonyms for the same factors

Clinical Explanations for Top Features
Experiment 1a
                  Optum    Truven
# Cases           38,353   6,824
# Controls        38,353   6,824
# All features    18,831   21,445
Top Feature Lists Derived from Logistic Regression & Random Forest – Bucketed by Explanation
Factors Related to Contact w/ Healthcare Providers:
• Acute sinusitis
• Acute upper respiratory infections
• Allergic rhinitis
• Acute bronchitis
• Acute pharyngitis
• Urinary tract infectious disease
• Cough
• Alanine transaminase serum/plasma
• Triglyceride in serum
• Cholesterol in serum
• Protein serum/plasma
• Carbon dioxide serum/plasma
Factors Related to Potential Continuing Symptoms:
• Malaise and fatigue
• Lumbago / low back pain
• Generalized anxiety disorder
• Chest pain
• Headache
• Insomnia
• Shoulder joint pain
• Neck pain
• Type II diabetes
• Atrial fibrillation
Factors Related to Potential Side Effects:
• Abdominal pain
• Essential hypertension (unspecified & benign)
• Headache
• End stage renal disease
• Pure hypercholesterolemia
• Gastroesophageal reflux disease
• Hyperlipidemia
• Pain in limb
• Anxiety
• Dizziness and giddiness
• Carbon dioxide serum/plasma
Other Factors:
• Tobacco dependence syndrome
Our Finding: Machine chosen features reflect the clinical story associated with depression

Understanding the Relative Importance of Features
Experiment 1a
Model meta parameters         Truven Counts
# of selected cases           6,824
# of selected controls        6,824
# of All features             21,445
# of filtered Top features    109
Our Finding: Some machine-chosen factors may be protective indicating a prevention of a switch

Overall Model Performance
Experiment 1a
                  Optum    Truven
# Cases           38,353   6,824
# Controls        38,353   6,824
# All features    18,831   21,445
Table 2. Experiment 1A: Summary of Traditional Model Performance, Accuracy Score (%)
Models used to filter significant features     Optum   Truven
Random Forest                                  62.0    51.3
Logistic Regression                            57.2    51.9
Models trained on significant features         Optum   Truven
Random Forest                                  58.7    50.0
Logistic Regression                            53.5    52.3
Grid Search (LASSO, Ridge, Elastic net)        51.9    50.2
Gradient Boosting Machines                     56.1    52.7
Table 3. Experiment 1A: Summary of Deep Learning Models Trained on Significant Features, Accuracy Score (%)
# of Epochs Run   Model Learning Iteration               Optum   Truven
50                RNN AVG w/ no Temporal Aggregation     71.7    80
50                RNN Temp Agg – Run 1                   54.3    54.6
50                RNN Temp Agg – Run 2                   54.3    55.2
50                RNN Temp Agg – Run 3                   54.9    55.1
50                RNN Temp Agg – AVERAGE                 54.5    55.0
50                LSTM Avg w/ no Temporal Aggregation    75.3    77.6
50                LSTM Temp Agg – Run 1                  57.6    59.7
50                LSTM Temp Agg – AVERAGE                58.1    58.8
Our Finding: RNNs and LSTMs are viable methods to model
treatment switching but require significant care in feature vector
preparation to ensure methods do not overestimate accuracy

Depression Treatment Drug Switch Summary
Highlighted Findings
• Out of >18,000 combinations of drugs, procedures and diagnoses, a machine-driven approach finds
that <100 of these features had a significant impact on potential for treatment switching
• Factors commonly contributing to treatment switches include: malaise and fatigue, hypertension, low
back pain, headache, anxiety disorder, high cholesterol
• Compared with Random Forest, Logistic Regression, Grid Search and Gradient Boosting Machines
(Average Accuracy: 55.1%), RNNs & LSTMs may predict events more accurately (Accuracy: 58-60%),
but they are susceptible to bias based on feature vector construction and temporal spacing.
• Sequence-driven models such as RNN/LSTM require additional tools to establish explainability. Techniques
such as a generative adversarial network approach, with clinical adjudication, can provide
an understanding of which clinical sequences are driving model scoring.
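The deck does not say which runs went into the 55.1% average for the classical models. One reading that comes close is the mean of the four Optum accuracies for "models trained on significant features" in Table 2, worked through below; treat this as an interpretation, not a statement of how the authors computed it.

```python
# Accuracy (%) of models trained on significant features, copied from Table 2.
# Order: Random Forest, Logistic Regression, Grid Search, Gradient Boosting Machines.
optum = [58.7, 53.5, 51.9, 56.1]
truven = [50.0, 52.3, 50.2, 52.7]

def mean(values):
    return sum(values) / len(values)

optum_mean = mean(optum)    # about 55.05
truven_mean = mean(truven)  # about 51.3
```

On the same tables, the temporally aggregated LSTM runs range from 57.6% to 59.7%, which lines up with the 58-60% quoted for the sequence models.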

Real World Problem: Life with Non-Alcoholic Steatohepatitis (NASH)
9 to 18 million (2-5) Americans nationwide have NASH [1]
Up to 16% of liver transplants in the U.S. are due to NASH [2]
By 2020, NASH is projected to overtake Hepatitis C as the leading cause of liver transplants in the U.S. [2]
The Average Patient
• Young to middle-aged (average age at diagnosis is 46 years)
• Commonly also has: obesity, high blood pressure, high cholesterol, type-2 diabetes
• Often unable to work due to health issues
• Anxious and afraid of disease progressing into liver cancer
1. National Institutes of Health. 2018. Accessed from: https://www.niddk.nih.gov/health-information/liver-disease/nafld-nash
2. Michael Charlton, MD. Cirrhosis and Liver Failure in NAFLD: Molehill or Mountain?

EXPERIMENT 2: Understanding Onset of NASH + Non-Alcoholic Fatty Liver Disease (NAFLD)
Even with limited data, can we identify what medications,
diagnoses or procedures indicate a patient who will develop
NASH?
Deep Miner Supervised Model Training and Validation
Experiment 2a
Cases: NASH
Controls: No NAFL and No NASH
Experiment 2b**
Cases: NAFL patients who become NASH in future
Controls: NAFL patients who never become NASH patients
(**Tested twice with two distinct sets of control logic)
Data Sets for Experiment 2A
                  TRUVEN   OPTUM
# Cases           21,229   15,986
# Controls        20,025   15,308
# All features    61,375   35,164
Average Accuracy Score (%) for Models Trained on Top Features for Experiment 2A
                                 TRUVEN   OPTUM
Classical Models*                65.7%    74.4%
Recurrent Neural Nets (RNN)      69.7%    79.0%
Long Short-Term Memory (LSTM)    75.3%    74.3%
Highlighted Findings
• Potential factors contributing to developing NASH include: esophageal varices without mention
of bleeding, esophageal reflux, cervicalgia (neck pain), obstructive sleep apnea, use of
hydrocodone, use of hydrochlorothiazide
• Even with limited samples and sparse matrices of observational information, we can still
develop a model with high accuracy (RNN Accuracy: 79%) to predict cases that will become NASH

Using Generative Adversarial Approach to Identify High Scoring Sequences
Figure: Generative model → synthetic data/sequences → LSTM/RNN model → scored sequences → feedback iterations → high scoring sequences of interest
Lessons Learned
• Designing the right classification model is just as important as predicting an occurrence of a disease
• Scalable models depend on agreed-upon common analytic plans and common outputs
• Deep learning models can look at large, sparse matrices to identify potential switching patterns
and classify better than traditional approaches
• Differences in real world data sets lead to differences in results, which is both an advantage
and a disadvantage depending on the question

“The So What”: Long Term Impact to Drug Discovery
can arm researchers with new insights on the complexities of
the patient journey that can proactively inform drug development and clinical trial planning

The Future of Scalable Models
• Externally Validate Prediction Models on Additional RWD
• Extend Analysis Framework to RWD with Linked-Genetics & Genomics Data
• Perform Empirical Calibration on Experiments to Evaluate Potential Bias

Thank you!
Dan Housman
dhousman@deloitte.com
