SlideShare une entreprise Scribd logo
1  sur  22
Big Data and its Role in
Biomedical Research
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
10/10/18 ACoP 2018 1
@pebourne
Bias
• Cant help but be influenced by my time as Associate
Director for Data Science (ADDS) at NIH
• Now very much engaged in data science across disciplines
– broader but shallower perspective
• Knowing my long-time colleague Prof. Lei Xie and others
will follow me with a deeper perspective
10/10/18 ACoP 2018 2
Lets start with a definition ….
10/10/18 ACoP 2018 3
Big data and data
science are like
the Internet…
If I asked you to
define them you
would all say
something
different, yet you
use them every
day…
10/10/18 ACoP 2018 4
http://vadlo.com/cartoons.php?id=357
So what do I mean by big data/data
science?
• Use of the ever increasing amount of open, complex, diverse
digital data
• Finding ways to ask and then answer relevant questions by
combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise
obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human
condition
10/10/18 ACoP 2018 5
Model
Transportability
Horizontal
Integration
Multi-scale
Integration
human
mouse
zebrafish
DNA
Gene/Protein
Network
Cell
Tissue
Organ
Body
Population
CNV SNP methylation
3D structure Gene
expression Proteomics
Metabolomics
MetabolicSignaling
transduction
Gene
regulation
Hepatic Myoepithelial Erythrocyte
Epithelial Muscle Nervous
Liver Kidney Pancreas Heart
Physiologically based
pharmacokinetics
GWASPopulation
dynamics
Microbiota
QSP - Open, complex, diverse digital data
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
10/10/18 6
Machine learning has been around for over 20
years – why the fuss now?
• Amount of data available for training
• Open source - R and python
• Advances in computing (e.g., GPU’s) allow for deeper neural nets (deep
learning)
• Algorithmic efficiency gains (e.g., in back propagation)
• Success promotes further research
• Commercialization
10/10/18 ACoP 2018 7
Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
The NIH view
• Big Data
– Total data from NIH-funded research in 2016 estimated at 650 PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10
PB in 2016
• Dark Data
– Only 12% of data described in published papers is in recognized
archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data
archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
10/10/18 ACoP 2018 8
NIH strategic plan for data
• Support a Highly Efficient and Effective
Biomedical Research Data
Infrastructure
• Promote Modernization of the Data-
Resources Ecosystem
• Support the Development and
Dissemination of Advanced Data
Management, Analytics, and
Visualization Tools
• Enhance Workforce Development for
Biomedical Data Science
• Enact Appropriate Policies to Promote
Stewardship and Sustainability
10/10/18 ACoP 2018 9
https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
A research data infrastructure requires
we move from pipes to platform…
which begs the question ...
10/10/18 ACoP 2018 10
Vivien Bonazzi Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
Will biomedical research become more like Airbnb?
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer
(renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and
consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night.
– Evaluation of US $25bn
10/10/18 ACoP 2018 11
Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
Cloud computing
environment
data
metadata
software
model
container
Metadata
Model Commons
Model Commons
Recommendation
System
Model
registry
User interface
(A) (B)
(C)
ontology
model
data
algorithm
software
These plans require moving from pipes to platforms
10/10/18 ACoP 2018 12
The pillars of data science operate
within this platform environment
13
QSP
10/10/18 ACoP 2018
Lets briefly focus on those five pillars
in the Context of QSP …
10/10/18 ACoP 2018 14
Data acquisition
The data production issue (the V’s of Big Data)— Experimentally
• Estimated (2017) that ≈2.5 quintillion (2.5×1018) bytes of data generated daily, with 90%
of all the world’s data having been created in the past two years.
• Plaintext PDB files typically ≈ few 100s KB (…but, that’s just the start!)
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
10/10/18 ACoP 2018 15
Data integration and engineering
• Generic
– Ontologies
– Object identifiers
– Indexing schemes
– Common data models
1610/10/18 ACoP 2018
Data analytics
17
• Generic
–SVM’s
–Neural nets
–Deep learning
–Random forest
10/10/18 ACoP 2018
Visualization
• Generic
– VR
– Networks
– Sonics
1810/10/18 ACoP 2018
Ethics, law & policy
10/10/18 ACoP 2018 19
• Landmark studies identify
histone mutations as
recurrent driver mutations in
DIPG ~2012
• Almost 3 years later, in
largely the same datasets,
but partially expanded, the
same two groups and 2
others identify ACVR1
mutations as a secondary,
co-occurring mutation
From Adam Resnick
Diffuse Intrinsic Pontine Glioma (DIDG)
Conclusion:
Driven by large amounts of open
digital data of different types and new
algorithms and approaches biomedical
researchers are destined to follow the
private sector towards the fourth
paradigm
10/10/18 ACoP 2018 20
Acknowledgements
10/10/18 ACoP 2018 21
The BD2K Team at NIH
My Colleagues at UVA
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Zheng Zhao Lei Xie
Thank You
peb6a@virginia.edu
2210/10/18 ACoP 2018

Contenu connexe

Tendances

Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
The human genome project was started in 1990 with the goal of sequencing and ...
The human genome project was started in 1990 with the goal of sequencing and ...The human genome project was started in 1990 with the goal of sequencing and ...
The human genome project was started in 1990 with the goal of sequencing and ...Rania Malik
 
Genomics(functional genomics)
Genomics(functional genomics)Genomics(functional genomics)
Genomics(functional genomics)IndrajaDoradla
 
Human genome
Human genomeHuman genome
Human genomeshoaa311
 
second generation of DNA Sequencing
second generation of DNA Sequencingsecond generation of DNA Sequencing
second generation of DNA SequencingSidra Shaffique
 
Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014Mohammad Hessam Rafiee
 
Introduction to systems biology
Introduction to systems biologyIntroduction to systems biology
Introduction to systems biologylemberger
 
Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu KAUSHAL SAHU
 
The human genome project
The human genome projectThe human genome project
The human genome projectKundol Laa
 
Proteomics 2 d gel, mass spectrometry, maldi tof
Proteomics 2 d gel, mass spectrometry, maldi tofProteomics 2 d gel, mass spectrometry, maldi tof
Proteomics 2 d gel, mass spectrometry, maldi tofnirvarna gr
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencinganita devi
 
An Introduction to Genomics
An Introduction to GenomicsAn Introduction to Genomics
An Introduction to GenomicsDr NEETHU ASOKAN
 

Tendances (20)

Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
Types of genomics ppt
Types of genomics pptTypes of genomics ppt
Types of genomics ppt
 
The human genome project was started in 1990 with the goal of sequencing and ...
The human genome project was started in 1990 with the goal of sequencing and ...The human genome project was started in 1990 with the goal of sequencing and ...
The human genome project was started in 1990 with the goal of sequencing and ...
 
Stem cells
Stem cellsStem cells
Stem cells
 
Biological networks
Biological networksBiological networks
Biological networks
 
Genomics(functional genomics)
Genomics(functional genomics)Genomics(functional genomics)
Genomics(functional genomics)
 
Genomic Data Analysis
Genomic Data AnalysisGenomic Data Analysis
Genomic Data Analysis
 
Human genome
Human genomeHuman genome
Human genome
 
The Human Genome Project
The Human Genome Project The Human Genome Project
The Human Genome Project
 
second generation of DNA Sequencing
second generation of DNA Sequencingsecond generation of DNA Sequencing
second generation of DNA Sequencing
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
 
Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014Clinical proteomics in diseases lecture, 2014
Clinical proteomics in diseases lecture, 2014
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Introduction to systems biology
Introduction to systems biologyIntroduction to systems biology
Introduction to systems biology
 
Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu
 
The human genome project
The human genome projectThe human genome project
The human genome project
 
Genomics types
Genomics typesGenomics types
Genomics types
 
Proteomics 2 d gel, mass spectrometry, maldi tof
Proteomics 2 d gel, mass spectrometry, maldi tofProteomics 2 d gel, mass spectrometry, maldi tof
Proteomics 2 d gel, mass spectrometry, maldi tof
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencing
 
An Introduction to Genomics
An Introduction to GenomicsAn Introduction to Genomics
An Introduction to Genomics
 

Similaire à Big Data and its Role in Biomedical Research

UK data management environment and support
UK data management environment and supportUK data management environment and support
UK data management environment and supportJisc
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Philip Bourne
 
Big Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical EngineeringBig Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical EngineeringPhilip Bourne
 
Are Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science AlignedAre Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science AlignedPhilip Bourne
 
Research Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the PolicyResearch Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the PolicyTorsten Reimer
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Robin Rice
 
Implications of the Fourth Paradigm
Implications of the Fourth ParadigmImplications of the Fourth Paradigm
Implications of the Fourth ParadigmPhilip Bourne
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...datacite
 
BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020Philip Bourne
 
Towards a Data Commons
Towards a Data CommonsTowards a Data Commons
Towards a Data CommonsMichael Becich
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?Philip Bourne
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Juan Antonio Vizcaino
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsMartin Donnelly
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in librariesC. Tobin Magle
 
The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?Philip Bourne
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?Philip Bourne
 
Research Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research PracticeResearch Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research PracticeYasar Tonta
 
Institutional Data Management Blueprint
Institutional Data Management BlueprintInstitutional Data Management Blueprint
Institutional Data Management BlueprintEduserv
 

Similaire à Big Data and its Role in Biomedical Research (20)

UK data management environment and support
UK data management environment and supportUK data management environment and support
UK data management environment and support
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
 
Big Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical EngineeringBig Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical Engineering
 
Are Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science AlignedAre Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science Aligned
 
Research Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the PolicyResearch Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the Policy
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
Implications of the Fourth Paradigm
Implications of the Fourth ParadigmImplications of the Fourth Paradigm
Implications of the Fourth Paradigm
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
 
BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020
 
Towards a Data Commons
Towards a Data CommonsTowards a Data Commons
Towards a Data Commons
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
 
The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
 
Research Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research PracticeResearch Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research Practice
 
Institutional Data Management Blueprint
Institutional Data Management BlueprintInstitutional Data Management Blueprint
Institutional Data Management Blueprint
 

Plus de Philip Bourne

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedPhilip Bourne
 
AI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a ConversationAI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a ConversationPhilip Bourne
 
AI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We GoingAI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We GoingPhilip Bourne
 
Thoughts on Biological Data Sustainability
Thoughts on Biological Data SustainabilityThoughts on Biological Data Sustainability
Thoughts on Biological Data SustainabilityPhilip Bourne
 
What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?Philip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangePhilip Bourne
 
Data Science Meets Drug Discovery
Data Science Meets Drug DiscoveryData Science Meets Drug Discovery
Data Science Meets Drug DiscoveryPhilip Bourne
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AlonePhilip Bourne
 
BIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchBIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchPhilip Bourne
 
AI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data ScienceAI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data SciencePhilip Bourne
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewPhilip Bourne
 
Novo Nordisk 080522.pptx
Novo Nordisk 080522.pptxNovo Nordisk 080522.pptx
Novo Nordisk 080522.pptxPhilip Bourne
 
Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Philip Bourne
 
COVID and Precision Education
COVID and Precision EducationCOVID and Precision Education
COVID and Precision EducationPhilip Bourne
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data SciencePhilip Bourne
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Philip Bourne
 
Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Philip Bourne
 
Data to Advance Sustainability
Data to Advance SustainabilityData to Advance Sustainability
Data to Advance SustainabilityPhilip Bourne
 
Frontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesFrontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesPhilip Bourne
 
Social Responsibility in Research
Social Responsibility in ResearchSocial Responsibility in Research
Social Responsibility in ResearchPhilip Bourne
 

Plus de Philip Bourne (20)

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
AI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a ConversationAI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a Conversation
 
AI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We GoingAI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We Going
 
Thoughts on Biological Data Sustainability
Thoughts on Biological Data SustainabilityThoughts on Biological Data Sustainability
Thoughts on Biological Data Sustainability
 
What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Data Science Meets Drug Discovery
Data Science Meets Drug DiscoveryData Science Meets Drug Discovery
Data Science Meets Drug Discovery
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
BIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchBIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in Research
 
AI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data ScienceAI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data Science
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's View
 
Novo Nordisk 080522.pptx
Novo Nordisk 080522.pptxNovo Nordisk 080522.pptx
Novo Nordisk 080522.pptx
 
Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)
 
COVID and Precision Education
COVID and Precision EducationCOVID and Precision Education
COVID and Precision Education
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?
 
Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?
 
Data to Advance Sustainability
Data to Advance SustainabilityData to Advance Sustainability
Data to Advance Sustainability
 
Frontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesFrontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular Scales
 
Social Responsibility in Research
Social Responsibility in ResearchSocial Responsibility in Research
Social Responsibility in Research
 

Dernier

Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 

Dernier (20)

Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 

Big Data and its Role in Biomedical Research

  • 1. Big Data and its Role in Biomedical Research Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 10/10/18 ACoP 2018 1 @pebourne
  • 2. Bias • Cant help but be influenced by my time as Associate Director for Data Science (ADDS) at NIH • Now very much engaged in data science across disciplines – broader but shallower perspective • Knowing my long-time colleague Prof. Lei Xie and others will follow me with a deeper perspective 10/10/18 ACoP 2018 2
  • 3. Lets start with a definition …. 10/10/18 ACoP 2018 3
  • 4. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… 10/10/18 ACoP 2018 4 http://vadlo.com/cartoons.php?id=357
  • 5. So what do I mean by big data/data science? • Use of the ever increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition 10/10/18 ACoP 2018 5
  • 6. Model Transportability Horizontal Integration Multi-scale Integration human mouse zebrafish DNA Gene/Protein Network Cell Tissue Organ Body Population CNV SNP methylation 3D structure Gene expression Proteomics Metabolomics MetabolicSignaling transduction Gene regulation Hepatic Myoepithelial Erythrocyte Epithelial Muscle Nervous Liver Kidney Pancreas Heart Physiologically based pharmacokinetics GWASPopulation dynamics Microbiota QSP - Open, complex, diverse digital data Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262 10/10/18 6
  • 7. Machine learning has been around for over 20 years – why the fuss now? • Amount of data available for training • Open source - R and python • Advances in computing (e.g., GPU’s) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in back propagation) • Success promotes further research • Commercialization 10/10/18 ACoP 2018 7 Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  • 8. The NIH view • Big Data – Total data from NIH-funded research in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives * In 2012 Library of Congress was 3 PB ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759 10/10/18 ACoP 2018 8
  • 9. NIH strategic plan for data • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data- Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability 10/10/18 ACoP 2018 9 https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  • 10. A research data infrastructure requires we move from pipes to platform… which begs the question ... 10/10/18 ACoP 2018 10 Vivien Bonazzi Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818. Will biomedical research become more like Airbnb?
  • 11. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night. – Evaluation of US $25bn 10/10/18 ACoP 2018 11 Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
  • 12. Cloud computing environment data metadata software model container Metadata Model Commons Model Commons Recommendation System Model registry User interface (A) (B) (C) ontology model data algorithm software These plans require moving from pipes to platforms 10/10/18 ACoP 2018 12
  • 13. The pillars of data science operate within this platform environment 13 QSP 10/10/18 ACoP 2018
  • 14. Lets briefly focus on those five pillars in the Context of QSP … 10/10/18 ACoP 2018 14
  • 15. Data acquisition The data production issue (the V’s of Big Data)— Experimentally • Estimated (2017) that ≈2.5 quintillion (2.5×1018) bytes of data generated daily, with 90% of all the world’s data having been created in the past two years. • Plaintext PDB files typically ≈ few 100s KB (…but, that’s just the start!) Mura et al. 2018 Curr Opin Struct Biol. 52:95-102 10/10/18 ACoP 2018 15
  • 16. Data integration and engineering • Generic – Ontologies – Object identifiers – Indexing schemes – Common data models 1610/10/18 ACoP 2018
  • 17. Data analytics 17 • Generic –SVM’s –Neural nets –Deep learning –Random forest 10/10/18 ACoP 2018
  • 18. Visualization • Generic – VR – Networks – Sonics 1810/10/18 ACoP 2018
  • 19. Ethics, law & policy 10/10/18 ACoP 2018 19 • Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012 • Almost 3 years later, in largely the same datasets, but partially expanded, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation From Adam Resnick Diffuse Intrinsic Pontine Glioma (DIDG)
  • 20. Conclusion: Driven by large amounts of open digital data of different types and new algorithms and approaches biomedical researchers are destined to follow the private sector towards the fourth paradigm 10/10/18 ACoP 2018 20
  • 21. Acknowledgements 10/10/18 ACoP 2018 21 The BD2K Team at NIH My Colleagues at UVA The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0 Zheng Zhao Lei Xie

Notes de l'éditeur

  1. Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).
  2. $1.25bn per year to capture all data. After a significant effort at reduction, intramurally data is spread across > 60 data centers; imagine the extramural situation.
  3. Distribution of kinases and the number of covalent small-molecule kinase inhibitors (CSKIs) for every targeted kinase across the human kinome
  4. 22