Ruth Duerr, data scientist and steward at the National Snow & Ice Data Center, CIRES and CU-Boulder, describes the new data citation policy for American Geophysical Union (AGU) journals. She shows examples of each part of a good citation, and answers questions about where to house data.
1. Data Citation and You: The new AGU guidelines for data citation
Ruth Duerr
This presentation is licensed by Ruth Duerr under a Creative Commons Attribution-Share Alike 3.0 License
2. Data Citation and You: The new AGU guidelines for data citation, March. 2014, Ruth Duerr
American Geophysical Union Publications Policy
AGU affirmed in its 2012 position statement that:
“Earth and space science data should be widely
accessible in multiple formats and long-term
preservation of data is an integral responsibility of
scientists and sponsoring institutions.”
3. American Geophysical Union Publications Policy
“all data necessary to understand, evaluate, replicate, and build
upon the reported research must be made available and
accessible whenever possible.”
The policy requires:
• That data availability be listed in the Acknowledgments section
• Data policy compliance be acknowledged during manuscript submission
• If data are not available (such as for proprietary or security reasons), a statement
to this effect explaining the details for requesting a data policy waiver is included
in the Acknowledgment section of the manuscript AND in the Cover Letter.
• Published data sets should be cited according to ESIP
Commons guidelines.
4. What counts as data?
“For the purposes of this policy, data include, but are not limited
to, the following:
• Data used to generate, or be displayed in, figures, graphs,
plots, videos, animations, or tables in a paper.
• New protocols or methods used to generate the data in a
paper.
• New code/computer software used to generate results or
analyses reported in the paper.
• Derived data products reported or described in a paper.”
5. Where should my data be?
• AGU encourages authors to identify and archive their data in
approved data centers (http://publications.agu.org/files/2014/01/Data-Repositories.pdf)
• Many AGU journals allow supplements
• Otherwise, authors are expected to “curate the data for at least 5
years after publication and provide a transparent process to make
the data available to anyone upon request.”
6. Example Acknowledgements
• The data for this paper are available at NOAA’s Comprehensive
Large Array Data Stewardship System. Data set: ATMS. Dataset name:
TATMS_npp_d20140108_t2356553_e0004549_b11401_c20140109060453863818_noaa_ops.h5
• Data supporting Figure 3 are available in Supporting
Information Table S1.
• Data to support this article are from the U.S. Department of
Energy. Because of national security issues, the data cannot be
released.
7. The Federation of Earth Science Information Partners (ESIP)
• “an open networked community that brings together
Earth science, data and information technology
practitioners”
• Sponsored by NASA and NOAA
• More than 150 members:
Type I - data centers,
Type II - service providers, and
Type III - commercial and non-commercial tool
developers
8. ESIP Data Citation Guidelines
• Data Citation Guidelines approved by the ESIP
Assembly on 5 January 2012
• Mandatory content:
Author
Release Date
Title
Archive and/or Distributor
Version
Locator, Identifier, or Distribution Medium
Access Date and Time
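As a minimal sketch, the seven mandatory elements can be held in one record and joined in the order used by the example citations in these slides. The class name DataCitation, its field names, and the format method are hypothetical, chosen only for illustration; the ESIP guidelines do not define any code.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


@dataclass
class DataCitation:
    """Hypothetical container for the seven mandatory ESIP elements."""
    author: str        # person(s) or organization credited with the data set
    release_date: str  # e.g. "2001" or "2001, updated 2005"
    title: str         # formal title of the data set
    archive: str       # archive and/or distributor
    version: str       # e.g. "2.3"
    locator: str       # persistent identifier URL or a fixed medium
    accessed: date     # date the on-line data were accessed

    def format(self) -> str:
        # Spell the access date as "1 May 2011", as in the slide examples.
        accessed = (f"{self.accessed.day} "
                    f"{MONTHS[self.accessed.month - 1]} {self.accessed.year}")
        parts = [
            self.author.rstrip("."),  # avoid a doubled period after initials
            self.release_date,
            self.title,
            f"Version {self.version}",
            self.archive,
            self.locator,
            f"Accessed {accessed}",
        ]
        return ". ".join(parts) + "."


citation = DataCitation(
    author="Doe, J. and R. Roe",
    release_date="2001",
    title="The FOO Data Set",
    archive="The FOO Data Center",
    version="2.3",
    locator="http://dx.doi.org/10.xxxx/notfoo.547983",
    accessed=date(2011, 5, 1),
)
print(citation.format())
```

With the placeholder values from the slides, this prints the same string as the "Version 2.3" example citation.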
9. ESIP Data Citation Guidelines
• Suggested Content as Needed:
Subset Used
Editor, Compiler, or other important role
Archive or Distributor Place
Distributor, Associate Archive, or other Institutional Role
Data Within a Larger Work
• More than 20 citation examples included:
Bockheim, J. 2003. "University of Wisconsin Antarctic Soils Database".
In International Permafrost Association Standing Committee on Data
Information and Communication (comp.). 2003. Circumpolar Active-
Layer Permafrost System, Version 2.0. Edited by M. Parsons and T.
Zhang. Boulder, CO: National Snow and Ice Data Center/World Data
Center for Glaciology. CD-ROM.
10. Mandatory Content: Authors
• Name of the individual(s) or organization(s) whose
intellectual work led to the creation of the data set (i.e.,
deserves to receive credit and accept responsibility for the
data set).
• Doe, J. and R. Roe. 2001. The FOO Data Set. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
• The FOO Working Group. 2001. The FOO Data Set. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
11. Mandatory Content: Release Date
• The year of release for a completed data set.
• Doe, J. and R. Roe. 2001. The FOO Data Set. The FOO Data
Center. http://dx.doi.org/10.xxx
• Capture when updates occurred if detailed versioning information is
missing.
• Doe, J. and R. Roe. 2001, updated 2005. The FOO Occasionally
Updated Data Set. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
• For an ongoing data set that is updated on a regular or continual basis,
list the first year of release followed by the last update. Updates could
occur annually or more frequently.
• Doe, J. and R. Roe. 2001, updated daily. The FOO Time Series
Data Set. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
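The three release-date patterns above (a completed data set, an occasionally updated one, and a continually updated one) can be sketched as a single helper. The function name release_date and its parameters are hypothetical; the check that an update year cannot precede the first release is an added assumption, not something the guidelines state.

```python
from typing import Optional, Union


def release_date(first_year: int,
                 updated: Optional[Union[int, str]] = None) -> str:
    """Build the release-date element of a data citation.

    updated may be a year (e.g. 2005) for an occasional update, or a
    frequency word (e.g. "daily") for an ongoing data set.
    """
    if updated is None:
        # Completed data set: year of release only.
        return str(first_year)
    if isinstance(updated, int) and updated < first_year:
        # Assumption: an update cannot predate the first release.
        raise ValueError("update year precedes first release year")
    return f"{first_year}, updated {updated}"


print(release_date(2001))           # completed data set
print(release_date(2001, 2005))     # occasionally updated data set
print(release_date(2001, "daily"))  # ongoing time series
```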
12. Mandatory Content: Title
• This is the formal title of the data set - not of the project
or a related publication.
• Doe, J. and R. Roe. 2001. The FOO Data Set. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
13. Mandatory Content: Archive and/or Distributor
• This is the organization that maintains and manages the release
or distribution of the data set. There is often an implied
responsibility for stewardship of the data set.
• "The entity that holds, archives, publishes, prints, distributes,
releases, issues, or produces the resource. This property will be
used to formulate the citation, so consider the prominence of
the role." (DataCite)
• This may be an appropriate place to recognize a major sponsor
of the data.
• Doe, J. and R. Roe. 2001. The FOO Data Set. The FOO
Funding Agency Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May
2011.
14. Mandatory Content: Versions
• Careful versioning and documentation of version changes
are central to enabling accurate citation.
• Data stewards need to track and clearly indicate precise
versions as part of the citation for any version greater than
1.
• It may be appropriate to track major and minor versions.
• Doe, J. and R. Roe. 2001. The FOO Data Set. Version 2.3.
The FOO Data Center. http://dx.doi.org/10.xxxx/notfoo.547983.
Accessed 1 May 2011.
15. Mandatory Content: Locator, Identifier, or Distribution Medium
• Background
• If there is one fixed medium, indicate it. For example, CD-ROM,
DVD.
• More typically, data are available over the internet or through
multiple digital media options. Then it is necessary to include a
persistent reference to the location of the data. Any reasonably
persistent location service such as DOIs, ARKs, Handles, PURLs
etc. is acceptable.
• Scientific publishers, however, are most familiar with the DOI.
• Furthermore, Thomson Reuters, which manages the Web of
Science, now indexes data sets.
• Data sets that are cited by articles in the Web of Science also
show up in Web of Science, so there is an incentive for authors
to cite data sets.
16. Mandatory Content: Locator, Identifier, or Distribution Medium
• Persistent identifiers should point to a data landing page that
describes and provides access to the data.
• Other locators and identifiers may be more appropriate for locating
individual records or files.
• Best practice is that the suffix of the identifier does not include a
reference to the archive in case the data are moved from the original
location where the persistent identifier was assigned initially.
• Doe, J. and R. Roe. 2001. The FOO Data Set. The FOO Data
Center. CD-ROM.
• Doe, J. and R. Roe. 2001. The FOO Data Set. Version 2.3. The
FOO Data Center. http://dx.doi.org/10.xxxx/notfoo.547983.
Accessed 1 May 2011.
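The choice between a fixed medium and a persistent identifier can be sketched as follows. The function name locator_clause and its parameters are hypothetical; the resolver prefix http://dx.doi.org/ is taken from the example citations in these slides, and handling only DOIs (rather than ARKs, Handles, or PURLs) is a simplifying assumption.

```python
from typing import Optional


def locator_clause(medium: Optional[str] = None,
                   doi: Optional[str] = None) -> str:
    """Return the locator element of a data citation.

    If the data exist on one fixed medium (e.g. "CD-ROM"), cite the medium.
    Otherwise a persistent identifier is required; only DOIs are handled
    in this sketch.
    """
    if medium is not None:
        return medium
    if doi is not None:
        # Resolver prefix as used in the slide examples.
        return f"http://dx.doi.org/{doi}"
    raise ValueError("a citation needs a fixed medium or a persistent identifier")


print(locator_clause(medium="CD-ROM"))
print(locator_clause(doi="10.xxxx/notfoo.547983"))
```

Raising an error when neither value is supplied reflects that this element is mandatory.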
17. Mandatory Content: Access Date and Time
• Because data can be dynamic and changeable in ways that
are not always reflected in release dates and versions, it is
important to indicate when on-line data were accessed.
• Depending on how frequently the data change, it may be
necessary to include time as well as date of access.
• Doe, J. and R. Roe. 2001. The FOO Data Set. Version 2.3.
The FOO Data Center. http://dx.doi.org/10.xxxx/notfoo.547983.
Accessed 1 May 2011.
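A minimal sketch of the access element, with the time of day included only when the data change often enough to need it. The function name accessed_clause is hypothetical, and the "HH:MM UTC" time format is an assumption: the slides show only the date form and do not prescribe how a time should be written.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


def accessed_clause(when: datetime, include_time: bool = False) -> str:
    """Format the access date, and optionally the time, for a data citation.

    when is assumed to already be expressed in UTC.
    """
    text = f"Accessed {when.day} {MONTHS[when.month - 1]} {when.year}"
    if include_time:
        # Assumed time format; not specified in the guidelines as presented.
        text += f" {when:%H:%M} UTC"
    return text + "."


print(accessed_clause(datetime(2011, 5, 1)))
print(accessed_clause(datetime(2011, 5, 1, 14, 30), include_time=True))
```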
18. As Needed Content: Subset Used
• It is necessary to enable "micro-citation" or the ability to
refer to the specific data used--the exact files, granules,
records, etc.
• Equivalent to quoting a certain passage in a book,
where one then references a specific page number in
the citation.
• Alternatively, one might make reference to the
"structural index" of a canonical text (e.g. book,
chapter, and verse in the King James Bible).
19. As Needed Content: Subset Used
• Use the structural form of a data set to cite a specific subset.
• With Earth science data, subsets can often be identified by
referring to a temporal and spatial range.
• Doe, J. and R. Roe. 2001, updated daily. The FOO Gridded
Time Series Data Set. Version 3.2. Oct. 2007-Sep. 2008,
84°N, 75°W; 44°N, 10°W. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
• Sometimes, the data may be packaged in different sub-
collections or representations which can be referenced.
• Doe, J. and R. Roe. 2001. The FOO Data Set. Version 2.0
shapefiles. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
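A temporal and spatial subset like the one in the gridded time series example can be generated from dates and signed coordinates. This is a sketch under stated assumptions: the function names are hypothetical, the two coordinate pairs are assumed to be opposite corners of a bounding box, and negative latitude or longitude is assumed to mean south or west.

```python
from datetime import date

# Month abbreviations in the style of the slide example ("Oct.", "Sep.").
ABBREV = ("Jan.", "Feb.", "Mar.", "Apr.", "May", "Jun.",
          "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec.")


def month_year(d: date) -> str:
    return f"{ABBREV[d.month - 1]} {d.year}"


def corner(lat: float, lon: float) -> str:
    """Format one corner as e.g. '84°N, 75°W' from signed degrees."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):g}°{ns}, {abs(lon):g}°{ew}"


def subset_clause(start: date, end: date,
                  corner_a: tuple, corner_b: tuple) -> str:
    """Describe the subset used: a time range followed by two box corners."""
    return (f"{month_year(start)}-{month_year(end)}, "
            f"{corner(*corner_a)}; {corner(*corner_b)}")


print(subset_clause(date(2007, 10, 1), date(2008, 9, 30),
                    (84, -75), (44, -10)))
```

The resulting string is placed between the version and the archive name, as in the example citation.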