3. Why share and preserve data?
● To meet funding agency and/or publisher requirements
● To validate research results
● To enable the re-use or re-purposing of data
● To enhance research impact (visibility, citations, etc.)
4. Data sharing and management snafu in three short acts
http://youtu.be/N2zK3sAtr-4
5. Barriers to data sharing
● Takes too much time
● Fear of getting ‘scooped’
● Fear of misinterpretation or misuse of data
● Fear of exposing errors
● No scholarly credit
● No established culture of data sharing in many fields
6. Journal data sharing policies
In press at Journal of the Association for Information Science and Technology
Out of 371 science and social science journals surveyed, ~50% had data sharing policies.
Example:
http://www.plos.org/policies/
7. Funder data sharing policies
● National Institutes of Health
● National Science Foundation
● National Endowment for the Humanities
● Department of Education
● Department of Energy
● American Heart Association
● Department of Health and Human Services
● National Aeronautics and Space Administration
● US Geological Survey
● Centers for Disease Control and Prevention
● Bill and Melinda Gates Foundation
● Department of Agriculture
● Institute of Museum and Library Services
● Alfred P. Sloan Foundation
● Gordon and Betty Moore Foundation
8. National Science Foundation
Data Management Plans
All grant applications must include a 1-2 page data management plan describing how data will be managed during the project and shared after the project.
9. National Science Foundation
Data Management Plans
1. Types of data produced
What data will be produced?
How much data will be produced?
2. Data formats and metadata
What file formats will be used?
How will data be documented and described?
3. Policies for access and sharing
When and how will data be distributed?
How will privacy or intellectual property concerns be addressed?
4. Policies for re-use
What conditions will be placed on data re-use, re-distribution, or production of derivatives?
5. Plans for archiving and long-term preservation
For how long will data be kept?
What preservation strategies will be used?
11. Ways of sharing research data
Data Sharing Continuum:
● Data e-mailed upon request
● Data posted on personal website
● Data as supplemental files for journal articles
● Public data repository or archive
12. Data repositories
Things to consider when selecting
and using a repository:
● Open vs. restricted access
● Sustainability and preservation policy
● Proprietary vs. non-proprietary file formats
● Amount of data description/metadata (data package-level, file-level, item-level)
● Associated code and software
Hundreds to thousands of general, institutional, and subject-specific data repositories.
Directories of data repositories: databib.org, re3data.org
13. Data repository safari
● What is the data deposit process?
● Are there data deposit fees?
● Are data easy to browse/search?
● How extensive is the associated metadata or documentation?
● How long will data be preserved?
14. Data journals and data papers
Article outline:
● Abstract
● Background
● Methods
● Data records
● Technical validation
● Usage notes
● References
● Data citations
15. Data journals and data papers
>180 data journals in many subject areas:
● General Science
● Agriculture
● Archeology
● Astronomy
● Biomedicine
● Chemistry and physics
● Digital humanities
● Earth sciences
● Ecology and evolutionary biology
● Psychology
● Public health & policy
● Robotics
● Statistics
16. Example of paper linked to dataset
Data paper and data repository, linked by a digital object identifier (DOI)
When trying to understand what is involved in research data management, it helps to think about the entire research data lifecycle. Plan: When starting a research project, it’s good practice to map out a plan for how data will be managed both during and after the project. Create and Analyze: During the course of a research project, data are created and analyzed. Research data can take the form of spreadsheets, documents and text files, images, audio and visual files, or computer code. During these stages, aspects of the data should be documented (e.g., data collection instrument settings, description of environmental conditions, description of data processing steps) so that the data can be understood at a later date or by other users. Share and Preserve: After the completion of the project, data can (or should) be preserved and shared with others. This presentation will focus on issues and avenues of data sharing and preservation.
There are many reasons to share and preserve research data. (1) A growing number of both public and private funding agencies and journal publishers encourage or require researchers to make their data accessible to others. (2) Sharing data can help make the research process more transparent and allow the findings reported in publications to be validated (i.e., to back up findings). (3) Data can sometimes be re-used or re-purposed; data from different studies can be analyzed together (e.g., in meta-analyses) or used to answer new research questions. (4) As research data are increasingly treated as "first-class" research objects that can be cited just like traditional publications, sharing or publishing research data can enhance the visibility and impact of a researcher's work.
However, good data management and data sharing are not commonplace among researchers. This is a humorous video depicting what can happen if research data are not carefully managed or prepared for dissemination. Unfortunately, it is a fairly accurate depiction of research data management in many fields.
Several surveys show that most researchers don't share their data. There are many reasons for this. (1) Organizing, cleaning, documenting, and preparing data to share with others takes a lot of time. (2) Researchers fear that other researchers might "steal" their ideas or beat them to the punch in publishing research findings. (3) Researchers fear that others might misinterpret their data (e.g., by using the data out of context) or use their data in inappropriate or unintended ways. (4) Opening up datasets allows for the possibility that mistakes in data processing or analysis might be detected. (5) Researchers are not rewarded for sharing their data. Promotion and tenure committees judge researchers by their grants and publications, not by their data sharing practices. (6) For these and other reasons, there is simply no established culture of data sharing in many fields.
A growing number of journals have data sharing policies. One recent study of 371 science and social science journals found that about 50% either encourage or require sharing of the data underlying the findings reported in their articles. Some journals have relatively strict data sharing requirements (e.g., Nature, Science, PLoS journals).
A growing number of funding agencies (both federal funding agencies and private foundations) expect data resulting from their funding to be shared with others (either other researchers or the public) and/or require a data management or sharing plan as part of the grant application. This is a sample of funding agencies that have data sharing policies and/or require data management or sharing plans.
NSF's data sharing policy has received the most attention. Since 2011, NSF has required data management plans for all grant applications.
Different NSF directorates provide different guidance, but researchers may want to address these 5 aspects of data management in their plans. Here are some example questions that should be answered in different sections of a data management plan.
This example plan addresses all five elements clearly and specifically in a single page.
There are several different ways of sharing research data. These can be thought of as occurring on a continuum from not-so-good practices to best practices. (1) Researchers can indicate that they will e-mail their data to others upon request. This approach suggests that data sharing may not be a priority for the researcher. Studies show that such requests for data are often ignored. Also, such data may not be accompanied by sufficient documentation to permit re-use (recall the data panda video). (2) Data can be posted on personal or university websites. However, such data are difficult to discover unless you already know that they exist, and websites disappear all the time. (3) Data can be submitted to journals as supplemental files. Again, these data may be difficult to discover unless you already know of their existence, and the journal may be allowed to "control" access to the data. (4) Depositing data in a publicly accessible repository or archive is the best practice. Repositories often make data visible to relevant communities of interest, allow users to search for datasets, require supporting documentation, and commit to long-term preservation.
There are hundreds to thousands of research data repositories. Some are general-purpose repositories (such as Figshare), some are institutional repositories (hosted by university libraries or other research organizations), and some are specific to certain research areas (such as ecology, autism, or genetics). Two directories of data repositories can help you find repositories for particular subjects. Data repositories differ considerably from each other, and there are several things you should think about when selecting or using a data repository.
Split into small groups and explore either Dryad (a science data repository) or OpenICPSR (a social science repository). See if you can answer these questions. 10 min to explore, 5-10 min to discuss.
Often a complement to data repositories, data journals and data papers are another interesting way for researchers to disseminate their data. Instead of drawing conclusions from data, the purpose of data papers is to highlight and describe datasets that might be useful to other researchers. This is an example data paper from Scientific Data, a new journal from the Nature Publishing Group. The structure of a data paper is different from that of a traditional journal article. Data papers are peer-reviewed (scientific and technical review). Data papers can be listed on a CV just like traditional journal articles; therefore, they provide scholarly credit for data sharing. Researchers can get two publications out of the same work.
The number of data journals is rapidly increasing. There are currently over 180 data journals (either pure or mixed) covering a wide range of subject areas (mention BMC and Frontiers); most have emerged within the last 10 years.
The location of the underlying data files varies depending on the journal. Usually, data papers describe datasets that are housed in data repositories. For example, this data paper links to the underlying dataset in Dryad using the DOI (digital object identifier) assigned to the dataset.