Why should researchers care about data curation?
Varsha Khodiyar
WHY SHARE DATA
Expenditure on data generation
• 16.8% of NIH grant applications funded*
◦ Hours spent writing grants?
◦ Hours spent reviewing grants?
• Resources are finite and expensive
◦ Modified animals
◦ Specialized reagents
• Time and effort to generate good, valid data
* For fiscal year 2013 (http://report.nih.gov/success_rates/Success_ByIC.cfm)
Reproducibility is a cornerstone of science
“[W]e evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006... We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability.”
Ioannidis JPA et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009)
HOW TO SHARE DATA
Data needs to be…
• Discoverable
◦ Need to know it’s there
• Accessible
◦ Must be able to get to the data
• Usable
◦ Requires sufficient information about how the data was generated
• Persistent
◦ Historical data access as part of the scientific record, as well as for new research
• Reliable
◦ Data provenance informs data reuse decisions
Traditional publishing
• Data in a PDF is discoverable and accessible by readers of the paper
• But it is not usable: data in a PDF table cannot be manipulated
I’ll send my data when someone asks for it
• “We examined the availability of data from 516 studies between 2 and 22 years old
• The odds of a data set being reported as extant fell by 17% per year
• Broken e-mails and obsolete storage devices were the main obstacles to data sharing”
Vines TH et al. The availability of research data declines rapidly with article age. Curr Biol 24, 94–7 (2014)
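As a rough worked example of the 17% figure quoted above, the sketch below treats the decline as a constant multiplicative factor of 0.83 applied to the odds each year. That reading, the starting odds of 4 to 1, and the function names are illustrative assumptions, not values or methods taken from Vines et al.

```python
def odds_after(initial_odds: float, years: int, yearly_decline: float = 0.17) -> float:
    """Odds after `years`, assuming the odds shrink by a fixed fraction each year."""
    return initial_odds * (1.0 - yearly_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (4.0 means 4 to 1) into a probability between 0 and 1."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    start = 4.0  # hypothetical starting odds, chosen only for illustration
    for age in (0, 5, 10, 20):
        odds = odds_after(start, age)
        prob = odds_to_probability(odds)
        print(f"article age {age:2d} years: odds {odds:.2f}, probability {prob:.2f}")
```

Under these assumptions the odds compound downwards every year, which is why data shared only on request becomes steadily harder to obtain as a paper ages.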
I’ll make my data available in a repository
• Data is discoverable, accessible and persistent
• But data may not be usable, as an unstructured repository offers limited space for data-specific description
I’ll write a data paper 
Materials and Methods 
Animal surgery 
Behavioural testing 
Data collection and cell-type classification
Data description 
Data file organization 
Metadata organization 
• Data is discoverable, accessible and persistent 
• Sufficient space for methodological detail
BUT ARE WE MISSING SOMETHING?
Human vs. machine
• Is your data truly discoverable by researchers outside your own domain?
• There are too many papers to read, even in each person’s own field.
• Could increasing the machine readability of your data result in increased use of your data?
• Is making an entire dataset machine readable feasible?
Metadata
• Fully describe the experiments that generated the data
◦ Takes time to ensure full metadata capture
• Structure the metadata to ensure machine readability
◦ Structure needs to be decided prospectively
• Metadata can be discovered in an automated way
◦ Requires relevant infrastructure
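To make the idea of structured, machine readable metadata concrete, here is a minimal sketch. The field names (`dataset_id`, `organism`, `assay_type` and so on), the example values, and the helper functions are hypothetical and do not follow any particular metadata standard; the point is only that a structure agreed up front lets software find datasets without a person reading each description.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentMetadata:
    """A hypothetical, fixed structure decided before data collection starts."""
    dataset_id: str
    title: str
    organism: str
    assay_type: str
    data_files: list = field(default_factory=list)


def to_machine_readable(record: ExperimentMetadata) -> str:
    """Serialise one record as JSON so that other software can parse it."""
    return json.dumps(asdict(record), sort_keys=True)


def find_by_field(serialised_records, key, value):
    """Automated discovery: return the IDs of records whose `key` equals `value`."""
    matches = []
    for text in serialised_records:
        record = json.loads(text)
        if record.get(key) == value:
            matches.append(record["dataset_id"])
    return matches


if __name__ == "__main__":
    records = [
        ExperimentMetadata("ds-001", "Example recording set", "Mus musculus",
                           "electrophysiology", ["session1.csv"]),
        ExperimentMetadata("ds-002", "Example expression set", "Homo sapiens",
                           "microarray", ["arrays.tsv"]),
    ]
    catalogue = [to_machine_readable(r) for r in records]
    print(find_by_field(catalogue, "assay_type", "microarray"))
```

The search works only because every record uses the same keys, which is why the structure has to be decided prospectively.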
Curation is a specialised task
• Researchers are not data management professionals
• Learning how to curate data takes time
• Article publication is carried out by specialists (journals)
• It follows that data publication should also be carried out by specialists
Benefits of curated metadata
• Users of data
◦ Data is findable
◦ Data provenance is clear
◦ Increased data usability
◦ Reduced unnecessary duplication of data
• Data generators
◦ Data is more likely to be used, so data citation rates will increase
◦ Contribute to novel research that data generators would not have carried out
Metadata as an integral part of a data paper
FUTURE POSSIBILITIES
Machine readable research metadata could lead to...
Linked Data: a way to publish data so that data from different sources can be connected and queried
Infrastructure for linked research data is being developed
"Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
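The idea of connecting and querying data from different sources can be sketched in a few lines. The example below stores statements as (subject, predicate, object) triples in plain Python sets. The identifiers, predicate names and helper functions are invented for illustration, and real Linked Data infrastructure would use shared URIs and a dedicated query language rather than this toy lookup.

```python
# Two hypothetical sources that describe different things but share an identifier.
PROTEIN_SOURCE = {
    ("protein:P1", "has_name", "Example protein 1"),
    ("protein:P2", "has_name", "Example protein 2"),
}
ANTIBODY_SOURCE = {
    ("antibody:A1", "targets", "protein:P1"),
    ("antibody:A2", "targets", "protein:P1"),
}


def merge_sources(*sources):
    """Combine triples from several sources into one queryable set."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def subjects(graph, predicate, obj):
    """Return, sorted, every subject linked to `obj` by `predicate`."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == obj)


def antibodies_for_protein_named(graph, name):
    """Answer a question that neither source can answer on its own."""
    found = []
    for protein in subjects(graph, "has_name", name):
        found.extend(subjects(graph, "targets", protein))
    return sorted(found)


if __name__ == "__main__":
    graph = merge_sources(PROTEIN_SOURCE, ANTIBODY_SOURCE)
    print(antibodies_for_protein_named(graph, "Example protein 1"))
    print(antibodies_for_protein_named(graph, "Example protein 2"))
```

The query succeeds only because both sources use the same identifier for the protein, which is the kind of agreement that curated, structured metadata is meant to provide.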
The beginnings of linked research data
Antibodypedia: an open-access database of publicly available antibodies against human protein targets, with user and provider data on antibody efficacy in a range of assays.
“We show that Antibodypedia may be used to track the development of available and validated antibodies to the individual chromosomes, and thus the database is an attractive tool to identify proteins with no or few antibodies yet generated.”
Summary
• Reusing previously generated data is economical
• Data reuse depends on discoverable, accessible and usable shared datasets
• Descriptive metadata enhances (re)usability of data
• Capture of structured metadata is a specialist skill
• The future: machine readable metadata will be important
Thanks for listening...
