Secure Data Sharing and Related Matters – An NIH View
1. Secure Data Sharing and Related
Matters – An NIH View
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
National Institutes of Health
October 26, 2015
2. Disclaimer…
I am not a cybersecurity expert and,
as an informatician who previously
worked primarily in the pre-clinical
space, I am not an expert in security
associated with human subjects
3. Conversation Cards
What is happening that makes a
discussion of security important?
What is secure anyway?
How is the NIH responding in this
changing landscape?
5. Big Data in the Life Sciences …
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
7. The History of Computational
Biomedicine According to Bourne
1980s | 1990s | 2000s | 2010s | 2020
Discipline:
Unknown | Expt. Driven | Emergent | Over-sold | A Service | A Partner | A Driver
The Raw Material:
Non-existent | Limited/Poor | More/Ontologies | Big Data/Siloed | Open/Integrated
The People:
No name | Technicians | Industry recognition | data scientists | Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
9. We are at a Point of Deception …
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
11. We Are At a Point of Deception
The 6D Exponential Framework
Digitization of Basic &
Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
12. What Are Some General Implications
of Such a Future?
Open collaborative science becomes increasingly
important
Data and associated analytics become of increasing
value to scholarship
Opportunities exist to improve the efficiency of the
research enterprise and hence fund more research
Cooperation between funders will be needed to
sustain the emergent digital enterprise
Current training content and modalities will not match
supply to demand
Balancing accessibility vs security becomes more
important yet more complex
13. An Example of That Promise:
Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al 2014 Nat Comm 5:4022
14. “And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
15. Precision Medicine Initiative
National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
Participants will be centrally involved in design and
implementation of the cohort
They will be able to share genomic data, lifestyle
information, biological samples – all linked to their
electronic health records
16. Conversation Cards
What is happening that makes a
discussion of security important?
What is secure anyway?
How is the NIH responding in this
changing landscape?
17. For the Purposes of this NIH-Centric
Digital Discussion:
What is Secure Anyway?
Digital research objects are accessed
only when, how, and by whom access
is authorized, in accordance with the
wishes of the owner and/or the laws
and policies that define accessibility
18. Some of the Complexities
Research objects
– Narrative
– Data – preclinical and
clinical
– Software
– Publications
Owner
– Individual
– Institution
– Funding agency
– Third party
Governance
– Federal
– Funding agency
– Institutional
– Third party
19. Conversation Cards
What is happening that makes a
discussion of security important?
What is secure anyway?
How is the NIH responding in this
changing landscape?
23. “The HGP changed the norms around data sharing
in biomedical research.”
24. Data Sharing Goes Global: GA4GH
Global Alliance for Genomics and
Health
Accelerating the potential of genomic medicine to
advance human health, by:
– Establishing common framework of approaches to enable
effective, responsible sharing of genomic and clinical data
– Catalyzing data sharing projects that drive and demonstrate
value of data sharing
Alliance*: >350 leading institutions (healthcare, research,
advocacy, life science, IT) representing 35 countries
Working groups (Clinical, Data, Security, Regulatory &
Ethics) assess, prioritize needs
– Form task teams to produce tools, solutions, demonstration
projects
*Statistics as of October 5, 2015
26. A Culture of Sharing
Timeline of sharing policies and initiatives:
– 1999: Research Tools Policy
– 2003: NIH Data Sharing Policy
– 2004: Model Organism Policy
– 2007: Genome-wide Association (GWAS) Policy
– 2008: NIH Public Access Policy (Publications)
– 2012: Big Data to Knowledge (BD2K) Initiative
– 2013: White House Initiative (“Holdren Memo”)
– 2014: Genomic Data Sharing (GDS) Policy; Modernization of NIH Clinical Trials
27. Guiding Principle of NIH GWAS Policy
The greatest public benefit will be
realized if data from GWAS are made
available, under terms and conditions
consistent with the informed consent
provided by individual participants, in a
timely manner to the largest possible
number of investigators.
NIH expectation that data would be shared in the
NIH database of Genotype and Phenotype (dbGaP)
30. NIH Public Access Policy for Publications
Ensures public access to published results of all
research funded by NIH since 2008
– Recipients of NIH funds required to submit final peer-
reviewed journal manuscripts to PubMed Central (PMC)
upon acceptance for publication
– Papers must be accessible to the public on PMC no later
than 12 months after publication
32. Harnessing Data to Improve Health:
BD2K (Big Data to Knowledge)
NIH’s 6-year initiative to use data science to foster an
open digital ecosystem that will accelerate efficient,
cost-effective biomedical research to enhance health,
lengthen life, and reduce illness and disability
Programs and activities:
Advance discovery for biomedical research
Facilitate use and re-use of biomedical data
Develop analytical methods and software
Enhance biomedical data science training
34. NIH Genomic Data Sharing (GDS)
Policy
Purpose
– Sets forth expectations, responsibilities that ensure broad,
responsible sharing of genomic research data in a timely
manner
Scope
– All NIH-funded research generating large-scale human or
non-human genomic data – and their use for subsequent
research
• Data to be submitted to NIH-designated data repositories
(e.g., dbGaP, GEO, GenBank, WormBase, FlyBase, Rat
Genome Database)
– Applies to all funding mechanisms (grants, contracts,
intramural support) with no minimum threshold for cost
Released August 2014; effective January 25, 2015
gds.nih.gov
36. Modernizing NIH Clinical Trials
Activities:
The Need
NIH-Funded trials published within 100 months of
completion
Less than 50% published within 30 months of completion
BMJ 2012;344:d7292
38. Increasing Clinical Trial Transparency
Proposed November 2014; Final Spring 2016 (est.)
Notice of Proposed Rulemaking: Clinical Trials
Registration and Results Submission (FDAAA, Section
801)
– Further implements statutory requirements on private and
public sponsors to register; report results on phase 2, 3,
and 4 trials
– Includes drugs, biologics, and devices (except small
feasibility)
Draft NIH Policy on Clinical Trial Information
Dissemination
– Extends Section 801 requirements to all NIH-funded clinical
trials
– Includes phase 1 trials and trials of non-FDA regulated
interventions such as behavioral trials
42. The Commons
Digital Object Compliance: FAIR
Attributes of digital objects in the Commons
Initial Phase
• Unique digital object identifiers of some type
• A minimal set of searchable metadata
• Physically available in a cloud based Commons provider
• Clear access rules (especially important for human subjects data)
• An entry (with metadata) in one or more indices
Future Phases
• Standard, community based unique digital object identifiers
• Conform to community approved standard metadata for enhanced
searching
• Digital objects accessible via open standard APIs
• Physically and logically available to the Commons
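As a rough illustration only, the initial-phase attributes listed above could be checked against a simple record, as in the Python sketch below. The DigitalObject class, its field names, and the initial_phase_gaps function are hypothetical names invented for this sketch; they are not part of any Commons specification or NIH API.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for a digital object; field names are assumptions."""
    identifier: str = ""          # unique digital object identifier of some type
    metadata: dict = field(default_factory=dict)   # minimal searchable metadata
    cloud_location: str = ""      # where the object lives at a cloud-based provider
    access_rules: str = ""        # clear access rules (key for human subjects data)
    indices: list = field(default_factory=list)    # indices holding an entry for it


def initial_phase_gaps(obj: DigitalObject) -> list:
    """Return the initial-phase attributes that the record is still missing."""
    checks = [
        ("unique identifier", bool(obj.identifier)),
        ("searchable metadata", bool(obj.metadata)),
        ("cloud availability", bool(obj.cloud_location)),
        ("access rules", bool(obj.access_rules)),
        ("index entry", bool(obj.indices)),
    ]
    # Keep only the names of attributes whose check failed, in slide order.
    return [name for name, present in checks if not present]
```

An empty list from initial_phase_gaps would mean the record carries all five initial-phase attributes; the future-phase attributes (community standards, open APIs) are deliberately left out of this sketch.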
43. BD2K Targeted Software Topics
Supports innovative analytical methods and software tools
that address critical current and emerging needs of the
biomedical research community
2015 Topics (18 awards, U01s)
– Data Compression
– Data Provenance
– Data Visualization
– Data Wrangling
2016 Topics (U01s, under review)
– Data Privacy
– Data Repurposing
– Applying Metadata
– 2016: Crowdsourcing and interactive Digital Media
(UH2)
44. I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/
Editor's notes
16 million hospital inpatient events (24.5% of total), 35 million outpatient clinic events (53.6% of total) and 14 million emergency
department events (21.9% of total)
Photos: FC tweet; RK screen grab
Images of people from Infographic (NOTE: Image is just a placeholder—Jill will tweak)
Detailed Notes:
National Research Cohort <<OR name of study>>
>1 million U.S. volunteers committed to participating in research
Will combine a number of existing cohorts
Will include Dept of Veterans Affairs Million Veteran Program—note Veteran is singular per http://www.research.va.gov/MVP/
“As biology’s first large-scale project, the HGP paved the way for numerous consortium-based research ventures. The NHGRI alone has been involved in launching more than 25 such projects since 2000. These have presented new challenges to biomedical research — demanding, for instance, that diverse groups from different countries and disciplines come together to share and analyse vast data sets.”
“The HGP changed the norms around data sharing in biomedical research.”
2013 White House Initiative: “Increasing Access to the Results of Federally Funded Scientific Research”
Updated to include numbers through September 2015.
From Dina Paltoo [10/6/15]: “The data in the first slide is for all of dbGaP 2007-2014. The information came from a version of what is on the GDS website (https://gds.nih.gov/19dataaccesscommitteereview_dbGaP.html) and in a Nature Genetics paper (http://www.nature.com/ng/journal/v46/n9/full/ng.3062.html), but results from information that we receive from NCBI.”
The NIH Public Access Policy implements Division F Section 217 of PL 111-8 (Omnibus Appropriations Act, 2009).
http://publicaccess.nih.gov/policy.htm
OSP’s summary:
The NIH Public Access Policy for publications has been a requirement for all recipients of NIH funds since 2008. It implements Division G, Title II, Section 218 of PL 110-161 (Consolidated Appropriations Act, 2008). The NIH Public Access Policy ensures that the public has access to the published results of NIH-funded research. It requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to the digital archive PubMed Central (PMC) upon acceptance for publication. Scientists can also deposit papers through partnerships NIH has established with publishers. To help advance science and improve human health, the Policy requires that NIH-supported papers be accessible to the public on PMC no later than 12 months after publication.
Updated by ADDS group 8/25/15
Figure 2. Cumulative percentage of studies published in a peer reviewed biomedical journal indexed by Medline during 100 months after trial completion among all NIH funded clinical trials registered within ClinicalTrials.gov
Public benefits to clinical trials data-sharing (OSP):
Inform future research and research funding decisions
Mitigate bias (e.g., non publication of results, especially negative results)
Prevent duplication of unsafe trials
Meet ethical obligation to human subjects (i.e., that results inform science)
Increase access to data about marketed products
All contribute to public trust in clinical research
Source: Ross JS, Tse T, Zarin DA, Xu H, Zhou L, Krumholz HM. Publication of NIH funded trials registered in ClinicalTrials.gov: cross-sectional analysis. BMJ 2012;344:d7292.
Text updated by Sarah Carr [10/7/2015] – also changed order to feature NPRM before Draft NIH Policy.
Nearly 900 comments received on NPRM, many simply stating broad support
Final Rule expected Spring 2016
Section 801 of the Food and Drug Administration Amendments Act (FDAAA)