Data at the NIH: Some Early Thoughts
Philip E. Bourne Ph.D.
Associate Director for Data Science
National Institutes of Health
http://www.slideshare.net/pebourne/
Background
 Research in computational biology…
 Co-directed the RCSB Protein Data Bank (1999-2014)
 Co-founded PLOS Computational Biology; First EIC (2005 – 2012)
 With NIAID:
– Collaborator on the IEDB Project (Sette)
Disclaimer: I only started March 3, 2014
…but I had been thinking about this prior to my
appointment
http://pebourne.wordpress.com/2013/12/
Motivation for Change:
PDB Growth in Numbers and Complexity
[Figure: number of released entries by year. From the RCSB Protein Data Bank]
Reminder of Why the ADDS
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Motivation for Change:
We Are at the Beginning
Motivation:
We Are at an Inflexion Point for Change
 Evidence:
– Google car
– 3D printers
– Waze
– Robotics
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
Much Useful Groundwork Has Been Done
NIH Data & Informatics Working Group
In response to the growth of large biomedical datasets, the Director of NIH established a special Data and Informatics Working Group (DIWG).
Big Data to Knowledge (BD2K)
1. Facilitating Broad Use
2. Developing and Disseminating Analysis
Methods and Software
3. Enhancing Training
4. Establishing Centers of Excellence
http://bd2k.nih.gov
Currently…
 Data Discovery Index – under review
 Data Centers – under review
 Training grants – RFA’s issued; under review
 Software index – workshop in May
 Catalog of standards – FOA under development
This is just the beginning…
Some Early Observations
1. We don’t know enough about how existing data are
used
Structure Summary page activity for H1N1 Influenza related structures
[Figure: page activity over time, Jan. 2008 to Jul. 2010]
1RUZ: 1918 H1 Hemagglutinin
3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Andreas Prlic]
Consider What Might Be Possible
We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
2. We have focused on the why, but not the how
 The OSTP directive is the why
 The how is needed for:
– Any data that does not fit the existing data resource model
• Data generated by NIH cores
• Data accompanying publications
• Data associated with the long tail of science
3. We do not have an NIH-wide sustainability plan for
data (not heard of an IC-based plan either)
3. Sustainability
 Problems
– Maintaining a work force – lack of reward
– Too much data; too few dollars
– Resources
• In different stages of maturity but treated the same
• Funded by a few, used by many
– True as measured by IC
– True as measured by agency
– True as measured by country
• Reviews can be problematic
4. Training in biomedical data science is spotty
5. Reproducibility will need to be embraced
47/53 “landmark” publications could not be replicated
[Begley, Ellis Nature, 483, 2012] [Carole Goble]
Enough of the problems; what about some solutions?
Associate Director for Data Science
Scientific Data Council; External Advisory Board
The Biomedical Research Digital Enterprise
Programmatic Theme: Sustainability* (Deliverable: Commons)
Example Features:
• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store
Programmatic Theme: Education* (Deliverable: Training Center)
Example Features:
• Coordinate
• Hands-on
• Syllabus
• MOOCs
• Community
Programmatic Theme: Innovation* (Deliverable: BD2K)
Example Features:
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis
Programmatic Theme: Process (Deliverable: Modified Review)
Example Features:
• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis
Communication and Collaboration
• IC’s
• Researchers
• Federal Agencies
• International Partners
• Computer Scientists
* Hires made
Solution: The Power of the Commons
[Diagram; labels listed as extracted]
Data: The Long Tail; Core Facilities/HS Centers; Clinical/Patient
The Why: Data Sharing Plans
The Commons
The How: Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy
Other labels: Metrics/Standards; Software Standards Index; BD2K Centers; Cloud, Research Objects,
Participants: Government; NIH Awardees; Private Sector; Rest of Academia
Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment
The End Game: Knowledge
What Does the Commons Enable?
 Dropbox like storage
 The opportunity to apply quality metrics
 Bring compute to the data
 A place to collaborate
 A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
Commons Timeline
 Spring/Summer 2014: DS group are gathering
information about activities and needs from ICs (and
outside communities).
– Shared interests in developing cloud-based biomedical
commons.
– Investigating potential models of sustainability.
– Exploring metrics of usefulness and success.
 Fall 2014: Develop possible pilots to explore
options in addition to those already being
implemented by some ICs.
Solution: Process – Modified Review
 Possible Solutions
– Establish a central fund to support
– The 50% model
– New funding models, e.g. open submission and review
– Split innovation from core support and review separately
– Policies for uniform metric reporting
– Discuss with the private sector possible funding models
– More cooperation, less redundancy across agencies
– Bring foundations into the discussion
– Discuss with libraries, repositories their role
– Educate decision makers as to the changing landscape
Solution: Education
 Raise awareness among stakeholders, e.g. senior academic leadership
 Catalog existing intramural and extramural training
efforts
 Define a data science curriculum
 Consider one or more regional training centers (cf
Cold Spring Harbor)?
Solution: BD2K
 Make awards that bring out the best developments in
data science by the extramural community
 Provide a governance model such that these
extramural activities maximize the value of the
national infrastructure
 Encourage interagency – national and international
participation
 Up the ante on training the next generation of data
scientists
What will this look like if we are
successful?
The NIH as a Digital Enterprise
Components of the Academic Digital Enterprise
 Consists of digital assets
– E.g. datasets, papers, software, lab notes
 Each asset is uniquely identified and has provenance,
including access control
– E.g. publishing simply involves changing the access control
 Digital assets are interoperable across the enterprise
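The components above can be sketched in a few lines of Python. This is only an illustration, not a design from the talk: the class name `DigitalAsset`, the `Access` levels, and the use of a random UUID as the unique identifier are all assumptions made for the example. It shows an asset that carries an identifier, a provenance log, and an access-control setting, and a `publish` step that does nothing more than change the access control.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; the slides do not define any."""
    PRIVATE = "private"
    GROUP = "group"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """One digital asset: a dataset, paper, piece of software, or lab notes."""
    kind: str
    title: str
    owner: str
    access: Access = Access.PRIVATE
    # Assumption: a random UUID stands in for whatever identifier scheme is used.
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    provenance: list = field(default_factory=list)

    def __post_init__(self):
        # Record who created the asset as the first provenance entry.
        self.provenance.append(f"created by {self.owner}")

    def publish(self):
        # Publishing only changes the access control and logs that change.
        self.provenance.append(
            f"access changed from {self.access.value} to public"
        )
        self.access = Access.PUBLIC


notes = DigitalAsset(kind="lab notes", title="Rotation notebook", owner="jane")
paper = DigitalAsset(kind="paper", title="Draft manuscript", owner="jane")
notes.publish()
```

After `notes.publish()` the asset keeps the same identifier and gains one provenance entry; nothing is copied or moved, which is the point of treating publication as an access-control change.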
Life in the Academic Digital Enterprise
 Jane scores extremely well in parts of her graduate on-line neurology class.
Neurology professors, whose research profiles are on-line and well described, are
automatically notified of Jane’s potential based on a computer analysis of her scores
against the background interests of the neuroscience professors. Consequently,
professor Smith interviews Jane and offers her a research rotation. During the
rotation she enters details of her experiments related to understanding a widespread
neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line
research space – an institutional resource where stakeholders provide metadata,
including access rights and provenance beyond that available in a commercial
offering. According to Jane’s preferences, the underlying computer system may
automatically bring to Jane’s attention Jack, a graduate student in the chemistry
department whose notebook reveals he is working on using bacteria for purposes of
toxic waste cleanup. Why the connection? They reference the same gene a number
of times in their notes, which is of interest to two very different disciplines – neurology
and environmental sciences. In the analog academic health center they would never
have discovered each other, but thanks to the Digital Enterprise, pooled knowledge
can lead to a distinct advantage. The collaboration results in the discovery of a
homologous human gene product as a putative target in treating the
neurodegenerative disorder. A new chemical entity is developed and patented.
Accordingly, by automatically matching details of the innovation with biotech
companies worldwide that might have potential interest, a licensee is found. The
licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory,
and he hires another student using the revenue from the license. The research
continues and leads to a federal grant award. The students are employed, further
research is supported and in time societal benefit arises from the technology.
From What Big Data Means to Me JAMIA 2014 21:194
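The matching step in the scenario (two notebooks that reference the same gene a number of times) could be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the article: the function names, the `min_mentions` threshold, the placeholder gene symbols `GENE1` and `GENE2`, and the sample notebook text are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def gene_counts(text, gene_symbols):
    """Count how often each known gene symbol appears in a notebook's text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return Counter(t for t in tokens if t in gene_symbols)


def suggest_connections(notebooks, gene_symbols, min_mentions=2):
    """Pair up authors whose notebooks both mention a gene at least min_mentions times."""
    counts = {
        author: gene_counts(text, gene_symbols)
        for author, text in notebooks.items()
    }
    suggestions = []
    for a, b in combinations(sorted(counts), 2):
        shared = sorted(
            g for g in counts[a]
            if counts[a][g] >= min_mentions and counts[b][g] >= min_mentions
        )
        if shared:
            suggestions.append((a, b, shared))
    return suggestions


# Placeholder symbols and text, invented for the example.
known_genes = {"GENE1", "GENE2"}
notebooks = {
    "jane": "GENE1 expression rises in neurons. Knockdown of GENE1 and GENE2 tested.",
    "jack": "GENE1 homolog found in the bacterial strain. GENE1 mutant grows slowly.",
}
matches = suggest_connections(notebooks, known_genes)
```

A real system would also need the access rights and user preferences the passage mentions before surfacing a suggestion; the sketch covers only the counting and pairing.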
Life in the NIH Digital Enterprise
 Researcher x is made aware of researcher y through
commonalities in their data located in the commons.
Researcher x reviews the grants profile of researcher y and
publication history and impact from those grants in the past 5
years and decides to contact her. A fruitful collaboration ensues
and they generate papers, data sets and software. Metrics
automatically pushed to company z for all relevant NIH data and
software in a specific domain with utilization above a threshold
indicate that their data and software are heavily utilized and
respected by the community. An open source version remains,
but the company adds services on top of the software for the
novice user and revenue flows back to the labs of researchers x
and y which is used to develop new innovative software for
open distribution. Researchers x and y come to the NIH training
center periodically to provide hands-on advice in the use of their
new version and their course is offered as a MOOC. Course
attendees make breakthroughs that improve the health of the
nation.
To get to that end point we have to consider the complete research lifecycle
The Research Life Cycle will Persist
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools and Resources Will Continue To
Be Developed
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools; Lab Notebooks; Data Capture; Software; Analysis Tools; Visualization; Scholarly Communication
Those Elements of the Research Life Cycle will
Become More Interconnected Around a Common
Framework
New/Extended Support Structures Will
Emerge
Commercial & Public Tools; Git-like Resources By Discipline; Data Journals; Discipline-Based Metadata Standards; Community Portals; Institutional Repositories; New Reward Systems; Commercial Repositories; Training
We Have a Ways to Go
Thank You!
Questions?
philip.bourne@nih.gov
NIH…
Turning Discovery Into Health

Biomedical Data Science: We Are Not Alone
 
BIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchBIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in Research
 
AI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data ScienceAI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data Science
 
Novo Nordisk 080522.pptx
Novo Nordisk 080522.pptxNovo Nordisk 080522.pptx
Novo Nordisk 080522.pptx
 
Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)
 
COVID and Precision Education
COVID and Precision EducationCOVID and Precision Education
COVID and Precision Education
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?
 
Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?
 
Data to Advance Sustainability
Data to Advance SustainabilityData to Advance Sustainability
Data to Advance Sustainability
 
Frontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesFrontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular Scales
 
Social Responsibility in Research
Social Responsibility in ResearchSocial Responsibility in Research
Social Responsibility in Research
 
SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?SWOT Analysis - What Does it Tell Us?
SWOT Analysis - What Does it Tell Us?
 
The Analytics and Data Science Landscape
The Analytics and Data Science LandscapeThe Analytics and Data Science Landscape
The Analytics and Data Science Landscape
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
 

Dernier

ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 

Dernier (20)

ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 

Data at the NIH

  • 1. Data at the NIH: Some Early Thoughts Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health http://www.slideshare.net/pebourne/
  • 2. Background
      – Research in computational biology…
      – Co-directed the RCSB Protein Data Bank (1999-2014)
      – Co-founded PLOS Computational Biology; first EIC (2005-2012)
      – With NIAID: collaborator on the IEDB Project (Sette)
  • 3. Disclaimer: I only started March 3, 2014 …but I had been thinking about this prior to my appointment http://pebourne.wordpress.com/2013/12/
  • 4. Motivation for Change: PDB Growth in Numbers and Complexity [Figure: number of released entries by year; from the RCSB Protein Data Bank]
  • 5. Reminder of Why the ADDS Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
  • 6. Motivation for Change: We Are at the Beginning
  • 7. Motivation: We Are at an Inflexion Point for Change
      Evidence:
      – Google car
      – 3D printers
      – Waze
      – Robotics
  • 8. From the Second Machine Age From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
  • 9. Much Useful Groundwork Has Been Done
  • 10. NIH Data & Informatics Working Group: In response to the growth of large biomedical datasets, the Director of NIH established a special Data and Informatics Working Group (DIWG).
  • 11. Big Data to Knowledge (BD2K)
      1. Facilitating Broad Use
      2. Developing and Disseminating Analysis Methods and Software
      3. Enhancing Training
      4. Establishing Centers of Excellence
      http://bd2k.nih.gov
  • 12. Currently…
      – Data Discovery Index – under review
      – Data Centers – under review
      – Training grants – RFAs issued; under review
      – Software index – workshop in May
      – Catalog of standards – FOA under development
  • 13. This is just the beginning… Some Early Observations
  • 14. Some Early Observations 1. We don’t know enough about how existing data are used
  • 15. Consider What Might Be Possible [Figure: summary page activity for H1N1 influenza related structures, Jan. 2008 to Jul. 2010. Structures shown: 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir). Credit: Andreas Prlic. * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm]
  • 16. We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
  • 17. Some Early Observations 1. We don’t know enough about how existing data are used 2. We have focused on the why, but not the how
  • 18. 2. We have focused on the why, but not the how
      – The OSTP directive is the why
      – The how is needed for any data that does not fit the existing data resource model:
          • Data generated by NIH cores
          • Data accompanying publications
          • Data associated with the long tail of science
  • 19. Some Early Observations 1. We don’t know enough about how existing data are used 2. We have focused on the why, but not the how 3. We do not have an NIH-wide sustainability plan for data (not heard of an IC-based plan either)
  • 20. 3. Sustainability
      Problems:
      – Maintaining a work force – lack of reward
      – Too much data; too few dollars
      – Resources
          • In different stages of maturity but treated the same
          • Funded by a few, used by many
              – True as measured by IC
              – True as measured by agency
              – True as measured by country
          • Reviews can be problematic
  • 21. Some Early Observations 1. We don’t know enough about how existing data are used 2. We have focused on the why, but not the how 3. We do not have an NIH-wide sustainability plan for data (not heard of an IC-based plan either) 4. Training in biomedical data science is spotty
  • 22. Some Early Observations 1. We don’t know enough about how existing data are used 2. We have focused on the why, but not the how 3. We do not have an NIH-wide sustainability plan for data (not heard of an IC-based plan either) 4. Training in biomedical data science is spotty 5. Reproducibility will need to be embraced
  • 23. 47/53 “landmark” publications could not be replicated [Begley, Ellis Nature, 483, 2012] [Carole Goble]
  • 24. Enough of the problems; what about some solutions…
  • 25. The Biomedical Research Digital Enterprise [Organization diagram, labels in extraction order]
      Associate Director for Data Science
      Deliverables: Commons; Training Center; BD2K; Modified Review
      Programmatic themes: Sustainability*; Education*; Innovation*; Process
      Example features: Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store; Coordinate; Hands-on; Syllabus; MOOCs; Community; Centers; Training Grants; Catalogs; Standards; Analysis; Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis
      Communication and collaboration: ICs; Researchers; Federal Agencies; International Partners; Computer Scientists
      Scientific Data Council; External Advisory Board
      * Hires made
  • 26. Solution: The Power of the Commons [Diagram, labels in extraction order: Data; The Long Tail; Core Facilities/HS Centers; Clinical/Patient; The Why: Data Sharing Plans; The Commons; Government; The How: Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy; Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment; The End Game: Knowledge; NIH Awardees; Private Sector; Metrics/Standards; Rest of Academia; Software Standards Index; BD2K Centers; Cloud, Research Objects,]
  • 27. What Does the Commons Enable?
      – Dropbox-like storage
      – The opportunity to apply quality metrics
      – Bring compute to the data
      – A place to collaborate
      – A place to discover
      http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
  • 28. Commons Timeline
      – Spring/Summer 2014: The DS group is gathering information about activities and needs from ICs (and outside communities).
          • Shared interests in developing a cloud-based biomedical commons.
          • Investigating potential models of sustainability.
          • Exploring metrics of usefulness and success.
      – Fall 2014: Develop possible pilots to explore options in addition to those already being implemented by some ICs.
  • 29. Solution: Process – Modified Review
      Possible Solutions:
      – Establish a central fund to support
      – The 50% model
      – New funding models, e.g. open submission and review
      – Split innovation from core support and review separately
      – Policies for uniform metric reporting
      – Discuss with the private sector possible funding models
      – More cooperation, less redundancy across agencies
      – Bring foundations into the discussion
      – Discuss with libraries and repositories their role
      – Educate decision makers as to the changing landscape
  • 30. Solution: Education
      – Raise awareness among stakeholders, e.g. senior academic leadership
      – Catalog existing intramural and extramural training efforts
      – Define a data science curriculum
      – Consider one or more regional training centers (cf. Cold Spring Harbor)?
  • 31. Solution: BD2K
      – Make awards that bring out the best developments in data science by the extramural community
      – Provide a governance model such that these extramural activities maximize the value of the national infrastructure
      – Encourage interagency, national and international participation
      – Up the ante on training the next generation of data scientists
  • 32. What will this look like if we are successful? The NIH as a Digital Enterprise
  • 33. Components of The Academic Digital Enterprise
      – Consists of digital assets
          • E.g. datasets, papers, software, lab notes
      – Each asset is uniquely identified and has provenance, including access control
          • E.g. publishing simply involves changing the access control
      – Digital assets are interoperable across the enterprise
  • 34. Life in the Academic Digital Enterprise: Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, Professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. The students are employed, further research is supported and in time societal benefit arises from the technology. [From: What Big Data Means to Me, JAMIA 2014 21:194]
  • 35. Life in the NIH Digital Enterprise: Researcher x is made aware of researcher y through commonalities in their data located in the commons. Researcher x reviews the grants profile of researcher y and publication history and impact from those grants in the past 5 years and decides to contact her. A fruitful collaboration ensues and they generate papers, data sets and software. Metrics automatically pushed to company z for all relevant NIH data and software in a specific domain with utilization above a threshold indicate that their data and software are heavily utilized and respected by the community. An open source version remains, but the company adds services on top of the software for the novice user, and revenue flows back to the labs of researchers x and y, where it is used to develop new innovative software for open distribution. Researchers x and y come to the NIH training center periodically to provide hands-on advice in the use of their new version, and their course is offered as a MOOC. Course attendees make breakthroughs that improve the health of the nation.
  • 36. To get to that end point we have to consider the complete research lifecycle
  • 37. The Research Life Cycle will Persist IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
  • 38. Tools and Resources Will Continue To Be Developed IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
  • 39. Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
  • 40. New/Extended Support Structures Will Emerge IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline- Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
  • 41. We Have a Ways to Go IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline- Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
  • 43. NIH… Turning Discovery Into Health philip.bourne@nih.gov
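
The digital asset model on slide 33 and the notebook matching described on slide 34 can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in, not something the slides specify: the `DigitalAsset` class, the `Access` levels, the `genes` metadata field, and the `shared_gene_matches` function are assumptions made only to show the idea that publishing is a change of access control and that collaborators can be surfaced from overlapping references.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import combinations
import uuid


class Access(Enum):
    """Hypothetical access levels; the slides only say assets carry access control."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    kind: str                        # e.g. "dataset", "paper", "software", "lab notes"
    owner: str
    genes: frozenset = frozenset()   # assumed metadata: genes the asset references
    access: Access = Access.PRIVATE
    # Each asset gets a unique identifier and a provenance trail.
    asset_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    provenance: list = field(default_factory=list)

    def publish(self) -> None:
        """Publishing only changes the access control and records that change."""
        self.access = Access.PUBLIC
        self.provenance.append("access changed to public")


def shared_gene_matches(assets, min_shared=1):
    """Return (owner_a, owner_b, shared_genes) for assets of different owners
    that reference at least `min_shared` genes in common."""
    matches = []
    for a, b in combinations(assets, 2):
        if a.owner == b.owner:
            continue
        shared = a.genes & b.genes
        if len(shared) >= min_shared:
            matches.append((a.owner, b.owner, sorted(shared)))
    return matches


# Usage with made-up gene labels: two notebooks from different departments.
jane = DigitalAsset("lab notes", "Jane", genes=frozenset({"GENE_A", "GENE_B"}))
jack = DigitalAsset("lab notes", "Jack", genes=frozenset({"GENE_B", "GENE_C"}))
print(shared_gene_matches([jane, jack]))
jane.publish()
print(jane.access, jane.provenance)
```

A real system would also have to respect each owner's access rights and preferences before suggesting a match, as the Jane and Jack scenario notes; the sketch leaves that out for brevity.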

Editor's notes

  1. Augment (not replace) existing IC programs
  2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124 http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
  3. Can you put more than just one time point on this? How about making the last sub-bullet its own bullet (as shown)? Yes – that is what we were thinking and wrestling with.