4. What about DATA?
Do scientists have an
obligation to make their
data freely available?
Big push in the biological sciences for
Public Data Archiving
“The data and its analysis are the scientific product.
The paper is just an advertisement.”
Richard McElreath
McElreath R (2016) Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press: 469 pp
5. What is Public Data Archiving?
(Figure from Reichman et al 2011 Science)
The process of storing data
and associated metadata in a
repository that is open to the
public and where data can be
accessed and downloaded
freely by a third party.
6. Why do it?
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
(Huang & Qiao 2011 TREE, Molloy 2011 PLOS Biol, Piwowar et al 2011 Nature, Reichman et al
2011 Science, Tenopir et al 2011 PLOS One, Whitlock 2011 TREE, Whitlock et al 2010 Am Nat)
7. Most research is paid for by…
TAXPAYERS
in the form of government grants and salaries
Data as a public good?
So, who really “owns” the data?
9. Journals that require data archiving
Examples:
•The American Naturalist
•Biological Journal of the Linnean Society
•Biology Letters
•BMC Ecology
•BMC Evolutionary Biology
•BMJ
•BMJ Open
•Ecological Applications
•Ecological Monographs
•Ecology
•Ecosphere
•Evolution
•Evolutionary Applications
•Frontiers in Ecology and the Environment
•Functional Ecology
•Genetics
•Heredity
… http://datadryad.org/pages/jdap
10. Data archiving trends in Ecology & Evolution?
Data deposition has increased considerably
in Dryad and other repositories.
(Vision 2013 figshare)
Membership of the JDAP consortium has
tripled since its inception in 2011.
(Magee et al 2014 PLOS One)
Enforcing Public Data Archiving policies has had a positive effect on data deposition rates.
(Vines et al 2013 FASEB Journal, Magee et al 2014 PLOS One)
11. The problem…
Many researchers harbour concerns about making their data publicly available.
This is particularly true in fields such as ecology and evolutionary biology, where datasets are
often complex, have a long shelf life, and can be used to test multiple hypotheses.
12. Why are researchers reluctant to archive/share their data?
• Proper data archiving takes time (away from publishing).
• Competition for publications - fear of being “scooped”.
• Concerns about data misinterpretation / misuse.
• Lack of recognition for Public Data Archiving.
13. Benefits vs. Costs
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
• funded by taxpayers
Good for the scientific community
But the costs fall on individual researchers
14. “63% of PIs were against PDA as currently required”
“41% of respondents said that they have avoided
publishing in journals that require [PDA]”
“53% intend to avoid publishing in [journals requiring
PDA] in the future”
“A key concern is that [PDA] will be a disincentive
both for the initiation of long-term studies, and for
maintenance of ongoing studies.”
15. Are we filling up ‘empty archives’?
(Nelson 2009 Nature)
Most journals and databases don’t verify the quality of archived data beyond
basic checks like ensuring that a data availability statement and a valid DOI
number are provided in the paper.
(Noor et al 2006 PLOS Biol, Costello et al 2013 TREE)
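To make concrete how shallow such a basic check can be, here is a minimal Python sketch. It assumes the check amounts to "a non-empty data availability statement plus a string shaped like a DOI"; the pattern, the function names and the example identifier are illustrative assumptions, not any journal's or repository's actual procedure.

```python
import re

# Assumption: a purely syntactic pattern for modern DOIs
# ("10.", a 4-9 digit prefix, "/", then a non-empty suffix).
# It never contacts a resolver and never looks at the archived files.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text: str) -> bool:
    """Return True if the text has the shape of a DOI."""
    return bool(DOI_PATTERN.match(text.strip()))


def basic_check(statement: str, doi: str) -> bool:
    """Hypothetical minimal check: non-empty statement and DOI-shaped string."""
    return bool(statement.strip()) and looks_like_doi(doi)


if __name__ == "__main__":
    # "10.1234/example" is a made-up identifier used only for illustration.
    print(basic_check("Data are archived in a public repository.", "10.1234/example"))
```

A paper can pass a check of this kind while the archive behind the DOI is incomplete or unusable, which is the "empty archive" concern raised above.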
16. What’s happening in molecular biology?
It’s not looking good…
1) Ioannidis et al 2008 Nat Gen:
Review of microarray studies:
- only 2 of 18 were reproducible
2) Gilbert et al 2014 Mol Ecol:
Review of population genetics studies:
- 30% of analyses irreproducible
- 35% of datasets insufficiently described
18. PDA in E&E – how well are we doing?
We assessed 100 non-molecular studies in journals that have either adopted the Joint
Data Archiving Policy (JDAP) or have a strong data archiving policy.
Completeness criterion
Reusability criterion
19. Joint Data Archiving Policy (JDAP)
“data supporting the results in the paper should be archived in an appropriate public archive”
http://datadryad.org/pages/jdap
22. Bad archiving examples
• SPSS files archived
• Files archived in language other than English with no metadata
• Too much data!
• Only data (no description)
• Principal components without raw data
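Some of the problems listed above (proprietary formats, no accompanying description) can be screened for automatically. The sketch below is one possible way to do that in Python; the format lists, the README naming convention and the function name are assumptions made for illustration, not part of JDAP or of any repository's rules.

```python
from pathlib import PurePath

# Assumption: illustrative format lists, not an official standard.
OPEN_FORMATS = {".csv", ".txt", ".tsv"}
NON_OPEN_FORMATS = {".sav", ".xls"}  # .sav is the SPSS data format


def audit_archive(filenames):
    """Return a list of warnings for the file names found in an archive."""
    paths = [PurePath(name) for name in filenames]
    warnings = []

    # Assumption: metadata lives in a file whose name starts with "readme".
    if not any(p.stem.lower().startswith("readme") for p in paths):
        warnings.append("no README / metadata file found")

    for p in paths:
        if p.stem.lower().startswith("readme"):
            continue  # the metadata file itself is not checked for format
        ext = p.suffix.lower()
        if ext in NON_OPEN_FORMATS:
            warnings.append(f"{p.name}: non-open format {ext}")
        elif ext not in OPEN_FORMATS:
            warnings.append(f"{p.name}: unrecognised format {ext or '(none)'}")
    return warnings


if __name__ == "__main__":
    # Hypothetical file names, used only to show the output shape.
    for warning in audit_archive(["data.sav"]):
        print(warning)
```

A screen like this only catches surface problems; it cannot tell whether the raw data behind derived values such as principal components were archived.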
23. Data completeness - results
More than half (56%) of studies did not meet the minimum
requirement of JDAP or strong archiving policies
(Figure: proportion of studies that pass vs. fail)
(Roche et al 2015; PLOS Biol)
24. Data reusability - results
(Figure: proportion of studies that pass vs. fail)
Even more studies (64%) were archived in a way that partially
or entirely prevented reuse (Roche et al 2015; PLOS Biol)
25. How do we increase high-quality participation?
30. How do we increase high-quality participation?
1. Encourage communication between data generators and re-users
2. Disclose data re-use ethics
3. Encourage increased recognition of publicly archived data
4. Facilitate more flexible embargoes on archived data
(Roche et al 2014 PLOS Biol)
31. How do we increase high-quality participation?
Key recommendations to improve PDA practices
• Be mindful of PDA
• Provide detailed metadata
• Use descriptive file names
• Archive unprocessed data
• Use standard file formats (e.g. .txt, .csv)
• Facilitate data aggregation
• Perform quality control
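Several of these recommendations (detailed metadata, descriptive file names, standard formats, quality control) can be built into the step that writes the archive. The following Python sketch is one possible approach under stated assumptions: the function name, the "_README.txt" naming convention and the example columns are hypothetical, and the quality control shown is deliberately minimal.

```python
import csv
from pathlib import Path


def archive_dataset(rows, dictionary, out_dir, stem):
    """Write rows to <stem>.csv and a data dictionary to <stem>_README.txt.

    rows: list of dicts with identical keys (the unprocessed observations).
    dictionary: maps every column name to a description, including units.
    """
    columns = list(rows[0].keys())

    # Quality control: every column is documented and every row is complete.
    undocumented = [c for c in columns if c not in dictionary]
    if undocumented:
        raise ValueError(f"undocumented columns: {undocumented}")
    for i, row in enumerate(rows):
        if list(row.keys()) != columns:
            raise ValueError(f"row {i} has different columns")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Standard, non-proprietary format: plain CSV with a header row.
    data_path = out / f"{stem}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # Metadata: one "column: description" line per column.
    readme_path = out / f"{stem}_README.txt"
    lines = [f"{c}: {dictionary[c]}" for c in columns]
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return data_path, readme_path
```

A call such as `archive_dataset(rows, dictionary, "archive", "nest_counts_2012")` (a hypothetical dataset name) would refuse to write anything until every column has a description, which makes missing metadata visible before deposition rather than after.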
32. Public Data Archiving: The way forward?
• Not everyone is on board
• “Empty archives” are a problem in E&E
  • Willful omission
  • Lack of knowledge
• Solutions
  • Acknowledge fears and try to alleviate them
  • Enforcement, reward, flexibility
  • Educate researchers as to best practices
  • Recognize individual efforts to increase transparency
33. Many thanks to Ainsley Seago, Luke Holman, Scott Keogh, Pat
Backwell, Andrew Cockburn, Todd Vision, Mark Hahnel, the
Evolutionary Ecology Reading group at the Australian National
University and the Eco-Ethology and Cognitive Sciences lab
groups at the University of Neuchatel.
Image / illustration credits: A. Seago, Google@binsan5