1. The Vision for Data @ the NIH
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
National Institutes of Health
Bio-IT World, Boston
April 21, 2015
2. Office of Biomedical
Data Science
Mission Statement
To foster an open ecosystem that
enables biomedical research to be
conducted as a digital enterprise that
enhances health, lengthens life and
reduces illness and disability & to
train the next generation of data
scientists
Goals expanded from recommendations in the June 2012 DIWG and
BRWWG reports.
4. 1. We are at a Point of Deception …
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
5. 1. We Are At a Point of Deception
The 6D Exponential Framework
Digitization of Basic &
Clinical Research & EHR’s
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
6. 2. Democratization Will Follow
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_
Phil_Bourne
Stephen Friend
9. “And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
10. Precision Medicine Initiative
Vision: Build a broad research program to encourage
creative approaches to precision medicine, test them
rigorously, and, ultimately, use them to build the
evidence base needed to guide clinical practice.
Near Term: apply the tenets of precision medicine to a
major health threat – cancer
Longer Term: generate the knowledge base necessary
to move precision medicine into virtually all areas of
health and disease
11. Precision Medicine Initiative
National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
Participants will be centrally involved in design and
implementation of the cohort
They will be able to share genomic data, lifestyle
information, biological samples – all linked to their
electronic health records
12. National Research Cohort:
What Early Success Might Look Like
A real test of pharmacogenomics—right drug at the right
dose for the right patient
New therapeutic targets by identifying loss-of-function
mutations protective against common diseases
– PCSK9 for cardiovascular disease
– SLC30A8 for type 2 diabetes
Resilience – finding individuals who should be ill but aren’t
New ways to evaluate mHealth technologies for
prevention/management of chronic diseases
13. Precision Medicine:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Now
– Though woman’s glucose control has been suboptimal,
doctor renews her prescription for drug often used for type 2
diabetes
– Continues to monitor blood glucose with fingersticks and
glucometer, despite dissatisfaction with these methods
14. Precision Medicine:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 2 years
– Volunteers for new national research network
• Sample of her DNA, along with her health information,
sent to researchers for sequencing/analysis
• Can view her health/research data via smartphone
– Agrees to researchers’ request to track her glucose levels
via tiny implantable chip that sends wireless signals to her
watch, researchers’ computers
• Using these data, she changes diet,
medicine dose schedule
15. Other Diseases:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 5 years
– Receives word from her doctor about a new drug based
upon improved molecular understanding of type 2 diabetes
– When she enters drug’s name into her smartphone’s Rx
app, her genomic data show she’ll metabolize the drug
slowly
• Her doctor alters the dose accordingly
16. Other Diseases:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 10 years
– Celebrates her 60th birthday and reflects with her family
about how proud she is to be part of cohort study
– Her glucose levels remain well controlled; she’s suffered no
diabetes-related complications
– Her children decide to volunteer for cohort study
18. The BD2K Program is Central
to the Mission
Planned – Black; Available- Green
19. Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
20. Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
Virtuous
Research
Cycle
22. Big Data: The study involved
MRI images & GWAS data
from over 30,000 people
Collaboration: Data came
from many different sights
affiliated with the ENIGMA
consortium
Methods: To homogenize
data from different sites, the
group designed standardized
protocols for image analysis,
quality assessment, genetic
imputation, and association
Found five novel genetic
variants
Results provided insight into
the variability of brain
development, and may be
applied to study of
neuropsychiatric dysfunction
23. Community – Enigma, BD2K
Policy
– Improved consent methods
– Cloud accessibility for human subjects data
– Trusted partners
– Data sharing
Infrastructure
– Standards, compute resources, software
24. Communities: Thus Far
Visioning workshop convened 9/3/14
Launched BD2K ($32M)
– 12 Centers of data excellence
– Data Discovery Index Coordination Consortium
(DDICC)
– Training awards
First successful consortia meeting 11/3-4
Workshops to inform future funding
– Software indexing and discoverability
– Gaming
25. Communities: 2015 Activities
New FOAs with outreach to new
communities – math, stats, comp science etc.
Work with e.g GA4GH, RDA, FORCE11,
NDS ….
IDEAS lab with NSF
Competition with international funders
Software carpentry, hackathons, Pi Day
26. Communities: Questions?
Societies of the modern age?
How to enable these groups?
How to marry the funding of individuals with
the funding of communities?
27. Policies: Now & Forthcoming
Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
28. Policies - Forthcoming
Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH bib
sketch, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
dbGaP in the cloud (done!)
31. The Commons: Compute Platforms
The Commons
Conceptual Framework
Public Cloud
Platforms
Super Computing
(HPC) Platforms
Other
Platforms ?
Google, AWS (Amazon)
Microsoft (Azure), IBM,
other?
In house compute
solutions
Private clouds, HPC
– Pharma
– The Broad
– Bionimbus
Traditionally low access
by NIH
33. Infrastructure: Standards
2013 Workshop on Frameworks for Community-
Based Standards
August 2014 Input on Information Resources for
Data-Related Standards Widely Used in Biomedical
Science – 30 responses
Feb 2015 Workshop Community-based Data and
Metadata Standards
Internal CDE Registry project
34. Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
35. Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
37. Problem: Lack of
Biomedical Data Science Specialists
BD2K T32/T15
BD2K K01
Career path workshops – eg AAU
Challenge
model of
funding
38. Problem: Limited Access to
Data Science Training
BD2K R25
Metadata
for training
materials
Community-sourced
cataloging and indexing
of training opportunities
Measure utility
NIH Workforce Development Center
RFA-ES-15-004
39. I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
40. Associate Director for Data Science
Commons BD2K Efficiency
Sustainability Education Innovation Process
• Cloud – Data &
Compute
• Search
• Security
• Reproducibility
Standards
• App Store
• Coordinate
• Hands-on
• Syllabus
• MOOCs
• Community
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis
• Data
Resource
Support
• Metrics
• Best
Practices
• Evaluation
• Portfolio
Analysis
The Biomedical Research Digital Enterprise
Partnerships
Collaboration
rogrammatic Theme
Deliverable
Example Features • IC’s
• Researchers
• Federal
Agencies
• International
Partners
• Computer
Scientists
Scientific Data Council External Advisory Board
Training
Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124
http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
Photos: FC tweet; RK screen grab
Draft vision statement taken from Collins/Varmus NEJM article, pg 1, colum2, last sentence
Images that reflect both cancer and cohort study
Images of people from Infographic (NOTE: Image is just a placeholder—Jill will tweak)
Detailed Notes:
National Research Cohort <<OR name of study>>
>1 million U.S. volunteers committed to participating in research
Will combine a number of existing cohorts
Will include Dept of Veterans Affairs Million Veteran Program—note Veteran is singular per http://www.research.va.gov/MVP/
Images--pharmacogenomics—helix on pill, bottle, prescription pad, Drug targets—DNA helix, whatever we might have used for PCSK9, unidentifiable pills/capsules; smart phone, the “fall prevention” watch from gerontology meeting, other mobile health images
Molecular structure structure of PCSK9 from NIBIB database.
Images: Overweight woman, someone testing blood sugar, pills for metformin or other diabetes drug?
Images: DNA sequencing, readouts, pills, maybe the DNA on pill bottle that we’ve used for pharmacogenomics.