1. Data Supporting Precision Oncology
Warren A. Kibbe, Ph.D.
Professor, Biostatistics & Bioinformatics
Director of Biomedical Informatics, Duke CTSI
Chief Data Officer, Duke Cancer Institute
warren.kibbe@duke.edu
@wakibbe
#DataSharing
#LearningHealthSystem
#DataHarmonization
2. The World is Changing
Pace of Commercialization
Reach of Markets
Role of Data
Change in Healthcare
Change in Computing
Societal Changes
3. Understanding Cancer
Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and
clinical presentation and direct effective, evidence-
based prevention and treatment.
Ramifications across many
aspects of health care
4. Changes in Oncology
• Cancer is a grand challenge
• Anatomic vs molecular classification
• Health vs Disease
5. Our ability to generate biomedical
data continues to grow in terms of
variety and volume
Current sources of data
molecular · genome · pathology · imaging · labs · notes · sensors
14. Convergence
Machine Learning and Deep
Learning approaches are
enabled by advances in
instrumentation, digitization,
computation, and technology.
15. Best Practices
• For software: behavior-driven
development, true DevOps (security
first), fully traceable software
development (tests, deployment, and a
testing and validation harness),
open APIs, and validation of
algorithms, including learning models,
are not just feasible but critical to
overall validation.
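The "testing and validation harness" idea above can be sketched in a few lines. This is a minimal, illustrative example (the model, cases, and accuracy threshold are all hypothetical): it scores a model against a fixed set of test cases and emits a traceable record, including a hash of the cases, so that any change in behavior or test scope is detectable.

```python
import hashlib
import json

def validate_model(predict, cases, min_accuracy=0.9):
    """Run a simple validation harness: score a model on fixed test
    cases and return a traceable record, including a hash of the
    cases so the exact validation inputs are auditable."""
    results = [(x, predict(x), y) for x, y in cases]
    accuracy = sum(p == y for _, p, y in results) / len(results)
    return {
        "cases_hash": hashlib.sha256(
            json.dumps(cases, sort_keys=True).encode()).hexdigest(),
        "accuracy": accuracy,
        "passed": accuracy >= min_accuracy,
    }

# Toy model: flag a lab value as abnormal above a cutoff (illustrative).
cases = [[7.1, "abnormal"], [5.0, "normal"], [9.3, "abnormal"], [4.2, "normal"]]
report = validate_model(lambda v: "abnormal" if v > 6.0 else "normal", cases)
```

Because the record hashes the test cases, changing the software's behavior without widening the test scope (the failure mode on the next slide) leaves an audit trail.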
16. Boeing 737 MAX
Example of where good software
practices were undermined by
changing the behavior of the software
without changing the scope of testing
21. “Science, informatics, incentives, and
culture are aligned for continuous
improvement and innovation, with best
practices seamlessly embedded in the
delivery process and new knowledge
captured as an integral by-product of the
delivery experience.”
—Institute of Medicine
LEARNING HEALTH SYSTEMS
22. Another imperative is that such systems
do their work:
• Transparently (how does one learn
without well documented processes?)
• Reproducibly (good practices must
always be repeatable at scale and
scientifically reproducible)
• Only with the above can the science in
“data science” be done with sufficient
rigor
LEARNING HEALTH SYSTEMS
32. Blue Ribbon Panel Report
Cancer Moonshot℠ Blue Ribbon Panel
“The Cancer Moonshot Task Force was
directed to consult with external experts
from relevant scientific sectors, including
the presidentially appointed National
Cancer Advisory Board (NCAB).
A Blue Ribbon Panel of scientific experts
was created to advise the NCAB.”
33. Vision:
Enable the creation of a Learning Healthcare System for
Cancer, where as a nation we learn from the contributed
knowledge and experience of every cancer patient. As
part of the Cancer Moonshot, we want to unleash the
power of data to enhance, improve, and inform the journey
of every cancer patient from the point of diagnosis
through survivorship.
34. Data Sharing and the FAIR Principles
FAIR –
Making data
Findable,
Accessible,
Attributable,
Interoperable,
Reusable,
and providing Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
35. NCI Cancer Research Data Commons (CRDC) - Concept
NCI Scope: “Create a data
science infrastructure necessary
to connect repositories, analytical
tools, and knowledge bases”
Data commons co-locate data,
storage and computing
infrastructure with commonly
used services, tools & apps for
analyzing and sharing data to
create an interoperable resource
for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, and Walt Wells,
"A Case for Data Commons: Toward Data Science as a Service," IEEE Computing in
Science & Engineering, 2016. Image source: the CDIS, GDC, & OCC data commons
infrastructure at the University of Chicago Kenwood Data Center.
36. Data Commons Framework
[Diagram: the NCI Cancer Research Data Commons connects domain nodes
(Genomics, Proteomics, Imaging, Clinical, Immuno-oncology, Animal Models,
Cancer Biomarkers), including the Clinical Proteomics Tumor Analysis
Consortium* and The Cancer Imaging Archive* (TCIA), with cloud resources
(SBG CGC, Broad FireCloud, ISB CGC) through shared services: elastic
compute, query, visualization, tool deployment, tool repositories, a web
interface, APIs, data submission, authentication & authorization, data
models & dictionaries, computational workspaces, metadata validation,
and analysis, serving data contributors and consumers.]
Courtesy NCI-CBIIT
39. Data Harmonization
• The process of semantically and
syntactically mapping data to a set of
definitions, predefined data
elements, and a common data model.
• Validation and harmonization of
primary and secondary data are crucial
to enabling analysis and reuse.
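The two mappings named above, syntactic (local field name to common data element) and semantic (local coded value to harmonized value), can be sketched as a small harness. All field names, codes, and permissible values here are illustrative, not drawn from any real site or CDE dictionary:

```python
# Hypothetical common data elements (CDEs) with permissible values.
CDE = {
    "sex": {"female", "male", "unknown"},
    "smoking_status": {"current", "former", "never", "unknown"},
}

# Syntactic mapping: local field name -> CDE name (illustrative).
FIELD_MAP = {"SEX_CD": "sex", "SMOKE": "smoking_status"}

# Semantic mapping: (CDE, local coded value) -> harmonized value.
VALUE_MAP = {
    ("sex", "F"): "female", ("sex", "M"): "male",
    ("smoking_status", "1"): "current", ("smoking_status", "0"): "never",
}

def harmonize(record):
    """Map one source record onto the CDEs, validating every value
    against the predefined permissible values."""
    out, errors = {}, []
    for field, value in record.items():
        cde = FIELD_MAP.get(field)
        if cde is None:
            errors.append(f"unmapped field: {field}")
            continue
        mapped = VALUE_MAP.get((cde, value), value)
        if mapped not in CDE[cde]:
            errors.append(f"invalid value for {cde}: {value!r}")
            continue
        out[cde] = mapped
    return out, errors

harmonized, errors = harmonize({"SEX_CD": "F", "SMOKE": "1"})
```

Validation is inseparable from the mapping here: anything that cannot be mapped, or maps to a value outside the CDE's permissible set, is surfaced as an error rather than silently passed through.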
40. Spanning the Semantic Chasm of Despair
Building a Translational Bridge
CD2H
Thanks to Melissa Haendel
41. Project Highlight: Harmonizing clinical data models
The Problem:
▪ Different networks use different data models
(Sentinel, i2b2/ACT, OMOP, PCORnet), just as
different countries use different "outlets".
▪ There is a need for travel adapters.
The Solution:
▪ Use a converter between the various adapters.
▪ Allow researchers to ask a question once and
receive results from many different sources.
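The "ask once, answer everywhere" pattern can be sketched as a set of adapters that each render one canonical question in a different model's dialect. The table and column names below echo the real models (OMOP, PCORnet, i2b2) but are deliberately simplified; treat the SQL as illustrative, not as working queries against any actual schema:

```python
# One adapter per data model: render a canonical question
# ("count patients with this diagnosis code") in that model's dialect.
ADAPTERS = {
    "OMOP": lambda code: (
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        f"WHERE condition_source_value = '{code}'"),
    "PCORnet": lambda code: (
        "SELECT COUNT(DISTINCT patid) FROM diagnosis "
        f"WHERE dx = '{code}'"),
    "i2b2": lambda code: (
        "SELECT COUNT(DISTINCT patient_num) FROM observation_fact "
        f"WHERE concept_cd = 'ICD10:{code}'"),
}

def ask_everywhere(code):
    """Translate one canonical question into every model's dialect,
    so a researcher asks once and queries all sources."""
    return {model: render(code) for model, render in ADAPTERS.items()}

queries = ask_everywhere("C50.9")
```

The researcher writes the question once; the converter owns the per-model translation, which is exactly the "travel adapter" role.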
42. Project Highlight: LOINC2HPO
Goal: develop a tool for converting LOINC laboratory codes and values into
more phenotypically meaningful language (the Human Phenotype Ontology) to
allow for translational interoperability and new analytics
Steps:
◆ Develop a software tool to map
LOINC codes to HPO terms
◆ Develop software to convert
EHR observations into HPO
terms for use in clinical
research
LOINC | Test | Outcome
2657-5 | Nitrite [Mass/volume] in Urine | Numeric
20407-3 | Nitrite [Mass/volume] in Urine by Test strip | Numeric
32710-6 | Nitrite [Presence] in Urine | Positive/Negative
5802-4 | Nitrite [Presence] in Urine by Test strip | Positive/Negative
50558-6 | Nitrite [Presence] in Urine by Automated test strip | Positive/Negative
All of the above map to HPO: Nitrituria
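The core of such a conversion is an annotation table keyed by LOINC code and interpreted outcome. A minimal sketch, using the presence-type nitrite tests from the slide (HPO term labels only; the real LOINC2HPO tool records HPO identifiers and handles many more outcome types):

```python
# Illustrative subset of a LOINC-to-HPO annotation table:
# a positive result on any presence-type urine nitrite test
# implies the phenotype Nitrituria.
LOINC2HPO = {
    "32710-6": {"positive": "Nitrituria"},
    "5802-4": {"positive": "Nitrituria"},
    "50558-6": {"positive": "Nitrituria"},
}

def observation_to_hpo(loinc_code, outcome):
    """Convert one EHR observation (LOINC code plus interpreted
    outcome) into an HPO term label, or None if no annotation
    applies."""
    return LOINC2HPO.get(loinc_code, {}).get(outcome)

term = observation_to_hpo("5802-4", "positive")
```

Three different LOINC codes collapse onto one phenotype term, which is what makes the converted observations more analytically useful than the raw codes.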
43. [Screenshot: CDE Browser]
• CIBMTR
• Center for Cancer Research
• Over 35 NCI Programs, plus Cancer Centers and Consortia
• GDC
44. Data Sharing Index
• We need metrics for data, software, and
algorithm use, usability, and conformance
• Data sharing stimulates science,
innovation, commercialization
• Providing recognition and attribution
to data providers and software &
algorithm builders is critical for a
robust data sharing ecosystem
• Support and measure FAIRness!