Public Data Archiving in Ecology and Evolution
How well are we doing?
Dr. Sandra A. Binning
@binsan5
Are publications the only useful research output?
What about DATA?
Do scientists have an obligation to make their data freely available?
Big push in the biological sciences for
Public Data Archiving
“The data and its analysis are the scientific product.
The paper is just an advertisement.”
Richard McElreath
McElreath R (2016) Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press: 469 pp
What is Public Data Archiving?
(Figure from Reichman et al 2011 Science)
The process of storing data and associated metadata in a repository that is open to the public and where data can be accessed and downloaded freely by a third party.
Why do it?
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
(Huang & Qiao 2011 TREE, Molloy 2011 PLOS Biol, Piwowar et al 2011 Nature, Reichman et al 2011 Science, Tenopir et al 2011 PLOS One, Whitlock 2011 TREE, Whitlock et al 2010 Am Nat)
Data as a public good?
Most research is paid for by… TAXPAYERS, in the form of government grants and salaries.
So, who really “owns” the data?
Joint Data Archiving Policy (JDAP)
http://datadryad.org/pages/jdap
Journals that require data archiving
Examples:
• The American Naturalist
• Biological Journal of the Linnean Society
• Biology Letters
• BMC Ecology
• BMC Evolutionary Biology
• BMJ
• BMJ Open
• Ecological Applications
• Ecological Monographs
• Ecology
• Ecosphere
• Evolution
• Evolutionary Applications
• Frontiers in Ecology and the Environment
• Functional Ecology
• Genetics
• Heredity
… http://datadryad.org/pages/jdap
Data archiving trends in Ecology & Evolution?
Data deposition has increased considerably in Dryad and other repositories.
(Vision 2013 figshare)
The number of members of the JDAP consortium has tripled since its inception in 2011.
(Magee et al 2014 PLOS One)
Enforcing Public Data Archiving policies has had a positive effect on data deposition rates.
(Vines et al 2013 FASEB Journal, Magee et al 2014 PLOS One)
The problem…
Many researchers harbour concerns about making their data publicly available.
This is particularly true in fields such as ecology and evolutionary biology, where datasets are often complex, have a long shelf life, and can be used to test multiple hypotheses.
Why are researchers reluctant to archive/share their data?
• Proper data archiving takes time (away from publishing).
• Competition for publications - fear of being “scooped”.
• Concerns about data misinterpretation / misuse.
• Lack of recognition for Public Data Archiving.
Benefits vs. Costs
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
• funded by taxpayers
Good for the scientific community
But the costs fall on individual researchers
“63% of PIs were against PDA as currently required”
“41% of respondents said that they have avoided publishing in journals that require [PDA]”
“53% intend to avoid publishing in [journals requiring PDA] in the future”
“A key concern is that [PDA] will be a disincentive both for the initiation of long-term studies, and for maintenance of ongoing studies.”
Are we filling up ‘empty archives’?
(Nelson 2009 Nature)
Most journals and databases don’t verify the quality of archived data beyond basic checks like ensuring that a data availability statement and a valid DOI number are provided in the paper.
(Noor et al 2006 PLOS Biol, Costello et al 2013 TREE)
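To illustrate how shallow such basic checks can be, here is a minimal sketch in Python. It is not any journal's or repository's actual procedure: the cue phrases, the function name `basic_archive_check`, and the example DOI are all hypothetical, and the pattern only tests DOI syntax, not whether the DOI resolves or what the archive contains.

```python
import re

# Syntax-only pattern for modern DOIs; it cannot tell whether the DOI resolves
# or whether the archived files are complete (assumption: "10.<4-9 digits>/<suffix>").
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Hypothetical cue phrases that might signal a data availability statement.
STATEMENT_CUES = (
    "data availability",
    "data accessibility",
    "data are available",
    "data archived",
)


def basic_archive_check(paper_text: str) -> dict:
    """Run the two superficial checks on the text of one paper."""
    lowered = paper_text.lower()
    has_statement = any(cue in lowered for cue in STATEMENT_CUES)
    dois = DOI_PATTERN.findall(paper_text)
    return {"has_statement": has_statement, "dois": dois}


# Hypothetical example text; the DOI below is made up for illustration.
example = "Data accessibility: data are archived in Dryad, doi:10.5061/dryad.abc12"
result = basic_archive_check(example)
print(result)
```

A paper can pass both checks while the archive itself is incomplete or unusable, which is the gap the completeness and reusability criteria later in the talk are meant to address.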
What’s happening in molecular biology?
It’s not looking good…
1) Ioannidis et al 2008 Nat Gen: review of microarray studies
   - only 2 of 18 were reproducible
2) Gilbert et al 2014 Mol Ecol: review of population genetics studies
   - 30% of analyses irreproducible
   - 35% of datasets insufficiently described
PDA in E&E – how well are we doing?
We assessed 100 non-molecular studies in journals that either have adopted the Joint Data Archiving Policy (JDAP) or have a strong data archiving policy.
Completeness criterion
Reusability criterion
Joint Data Archiving Policy (JDAP)
“data supporting the results in the paper should be archived in an appropriate public archive”
http://datadryad.org/pages/jdap
Data completeness score
(Figure legend: meets JDAP requirements / does not meet JDAP requirements)
(Roche et al 2015; PLOS Biol)
Data reusability score
(Roche et al 2015; PLOS Biol)
Bad archiving examples
• SPSS files archived
• Files archived in language other than English with no metadata
• Too much data!
• Only data (no description)
• Principal components without raw data
Data completeness - results
More than half (56%) of studies did not meet the minimum requirement of JDAP or strong archiving policies.
(Figure: pass / fail)
(Roche et al 2015; PLOS Biol)
Data reusability - results
(Figure: pass / fail)
Even more (64%) of studies were archived in a way that partially or entirely prevented reuse.
(Roche et al 2015; PLOS Biol)
How do we increase high quality participation?
1. Encourage communication between data generators and re-users
2. Disclose data re-use ethics
3. Encourage increased recognition of publicly archived data
4. Facilitate more flexible embargoes on archived data
(Roche et al 2014 PLOS Biol)
How do we increase high quality participation?
Key recommendations to improve PDA practices
• Be mindful of PDA
• Provide detailed metadata
• Use descriptive file names
• Archive unprocessed data
• Use standard file formats (e.g. .txt, .csv)
• Facilitate data aggregation
• Perform quality control
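Several of these recommendations (metadata, standard file formats, quality control) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, values, and helper functions `quality_control` and `to_csv_text` are hypothetical and do not come from the talk or from any repository's tooling.

```python
import csv
import io

# Hypothetical raw observations; column names and values are illustrative only.
rows = [
    {"individual_id": "A01", "body_length_mm": "82.5", "site": "site_1"},
    {"individual_id": "A02", "body_length_mm": "", "site": "site_2"},
]

# Data dictionary (metadata): one plain-language description per column, with units.
data_dictionary = {
    "individual_id": "Unique identifier of the individual",
    "body_length_mm": "Body length in millimetres",
    "site": "Collection site code",
}


def quality_control(rows, data_dictionary):
    """Return a list of human-readable problems found in the raw data."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for column in data_dictionary:
            if column not in row:
                problems.append(f"row {i}: column '{column}' is missing")
            elif row[column] == "":
                problems.append(f"row {i}: '{column}' is empty")
        for column in row:
            if column not in data_dictionary:
                problems.append(f"row {i}: column '{column}' is undocumented")
    return problems


def to_csv_text(rows, fieldnames):
    """Serialise rows as plain-text CSV, a non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


problems = quality_control(rows, data_dictionary)
csv_text = to_csv_text(rows, list(data_dictionary))
print(problems)
print(csv_text)
```

In practice the CSV text and the data dictionary would be saved under descriptive file names and deposited together, so that a third party can interpret every column without contacting the authors.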
Public Data Archiving: The way forward?
• Not everyone is on board
• “Empty archives” are a problem in E&E
   - Willful omission
   - Lack of knowledge
• Solutions
   - Acknowledge fears and try to alleviate them
   - Enforcement, reward, flexibility
   - Educate researchers as to best practices
   - Recognize individual efforts to increase transparency
Many thanks to Ainsley Seago, Luke Holman, Scott Keogh, Pat Backwell, Andrew Cockburn, Todd Vision, Mark Hahnel, the Evolutionary Ecology Reading group at the Australian National University and the Eco-Ethology and Cognitive Sciences lab groups at the University of Neuchatel.
Image / illustration credits: A. Seago, Google
@binsan5
