2. Objectives
Understand the current climate around data management and data sharing
Learn about the basic elements of a data management plan
Explore some of the best practices for data documentation, long-term preservation, and data sharing
Work with the DMPTool to create a data management plan
9. Funding Agency Requirements
Funding Agency: Requirements
NSF*
• Must include a data management plan (DMP) of no more than 2 pages in the proposal
• Materials collected during research should be shared
NIH
• Papers must be submitted to PubMed Central
• Projects requesting $500,000 or more in direct costs in any single year must share data and include a Data Sharing Plan in the proposal
USDA
• The National Institute of Food and Agriculture requires all data to be submitted to the public domain without restriction
NOAA
• Soon requiring that all grants include a data sharing plan, which must also be shared
• All data should be made visible, accessible, and independently understandable to users within 2 years of the end of the grant
NASA
• Data should be made freely and widely available
• A data sharing plan and evidence of any past sharing activities should be included as part of the technical proposal
CDC
• All data should be released and/or shared as soon as feasible
10. Exciting News!
Beginning January 14, 2013, the Biographical Sketch(es) for an NSF grant proposal will include a section on “Products” instead of “Publications.” This way, applicants can list not just publications but also datasets, software, patents, and copyrights.
11. Basic DMP Components
The NSF requires a data management plan of no more than 2 pages with every grant proposal.
Data description
Data and metadata standards
Data access and sharing policies
Data re-use and re-distribution
Data preservation and archiving
Data management plan requirements may differ depending on the funding source and the directorate, division, or program.
12. Data Description
What kinds of data will you produce?
Numerical data, simulations, text sequences, etc.
Experimental, observational, simulation
Raw, derived
How will you acquire the data?
How will you process the data?
How much data will you collect?
Are you using any existing data?
What QA/QC procedures will you use?
13. Recommendations
A short description of your project gives context for why you are collecting the data.
Survey existing data sources.
The data description can be a narrative paragraph, table, or list.
Keep all raw data separate from analyzed data, and maintain versions of the data during analysis.
Implement QA/QC procedures, for example:
Two people independently record the same data
Use tools to audit spreadsheets
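The double-entry check above (two people independently record the same data) is easy to partially automate. The sketch below is a minimal Python illustration, not a Georgia Tech or DMPTool tool; the function name `compare_double_entry`, the CSV layout, and the idea of a unique key column (such as a hypothetical `sample_id`) are assumptions.

```python
import csv
import io
from typing import List, Tuple


def compare_double_entry(entry_a: str, entry_b: str, key: str) -> List[Tuple[str, str, str, str]]:
    """Compare two independently keyed CSV tables and list disagreements.

    Returns (record key, column, value from A, value from B) for every column
    of table A whose value differs in table B, plus a "<row>" entry for
    records that appear in only one of the tables.
    """
    # Index each table by its key column (assumed to be unique per record).
    rows_a = {row[key]: row for row in csv.DictReader(io.StringIO(entry_a))}
    rows_b = {row[key]: row for row in csv.DictReader(io.StringIO(entry_b))}
    mismatches = []
    for record_id in sorted(rows_a.keys() | rows_b.keys()):
        row_a = rows_a.get(record_id)
        row_b = rows_b.get(record_id)
        if row_a is None or row_b is None:
            # One person recorded this record and the other did not.
            mismatches.append((
                record_id,
                "<row>",
                "present" if row_a else "missing",
                "present" if row_b else "missing",
            ))
            continue
        for column in row_a:
            # Ignore stray whitespace; any other difference is flagged.
            if row_a[column].strip() != row_b.get(column, "").strip():
                mismatches.append((record_id, column, row_a[column], row_b.get(column, "")))
    return mismatches
```

Each flagged cell still needs a person to check it against the original notebook or instrument record; the script only narrows down where to look.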
14. Example (taken from Oceanography DMP)
The project will collect and analyze the following data:
Conductivity and temperature from moorings and shipboard CTD surveys
Horizontal currents from Lowered ADCP and moorings
Horizontal currents from shipboard sonar
Fine and micro-scale velocity from the WHOI High Resolution Profiler
Fine and micro-scale temperature from fast-response thermistors (pods)
15. Data and Metadata Formats
What metadata will you create and include with the data? In other words, what does someone else need to know about your data in order to reuse them?
Where will this be recorded? How? In what format?
Will you use a community metadata standard?
Will you conform to community terminology?
16. Recommendations
Use metadata standards common in your discipline.
At a bare minimum, include a “readme.txt” file that describes the who, what, where, when, and why of the data.
Make sure you have recorded the information that you would need if you were trying to use someone else’s data.
Check with the data repository where you hope to store your data; some require a particular metadata standard.
Use file names that are understandable to humans.
Record units, and give the rows and columns of your tables headers.
Data collectors should record notes about the data alongside the data.
Thesauri
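The “readme.txt” recommendation can be turned into a simple completeness check. The Python sketch below assumes a hypothetical plain-text layout (one line per who/what/where/when/why question, then a list of columns with their units); `build_readme` is an illustrative name, and your repository or discipline may require a different structure or a formal metadata standard instead.

```python
from typing import Dict

# The five questions a minimal readme should answer.
REQUIRED_FIELDS = ["who", "what", "where", "when", "why"]


def build_readme(fields: Dict[str, str], columns: Dict[str, str]) -> str:
    """Render a plain-text readme; refuse to build one with unanswered questions.

    fields maps each of who/what/where/when/why to a short answer.
    columns maps each table column name to its units.
    """
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError(f"readme is missing: {', '.join(missing)}")
    lines = [f"{name.upper()}: {fields[name]}" for name in REQUIRED_FIELDS]
    lines.append("")
    lines.append("COLUMNS (name: units)")
    lines.extend(f"  {name}: {unit}" for name, unit in columns.items())
    return "\n".join(lines) + "\n"
```

Raising an error on a blank answer is a deliberate choice: it is easier to fill in the “why” while collecting data than to reconstruct it years later.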
18. Example
From NCAR (National Center for Atmospheric Research)
19. Example (from NASA SEAC4RS DMP)
Appendix A SEAC4RS data file naming convention:
dataID_locationID_YYYYMMDD_R#.extension
The only allowed characters are: a-z A-Z 0-9 _ . - (that is, upper case and lower case alphanumeric, underscore, period, and hyphen). Fields are described as follows:
dataID: an identifier of measured parameter/species, instrument, or model (e.g., O3, NxOy, and PTRMS). For DC3 and SEAC4RS data files, the PIs are required to use “DC3-” or “SEAC4RS-” as prefixes for their dataIDs, e.g., DC3-O3 and SEAC4RS-NxOy.
locationID: an identifier of airborne platform or ground station, e.g., GV, DC8. Specific locationIDs for each deployment will be provided on the data website.
R#: data revision number. For field data, the revision number will start from the letter “A”, e.g., RA, RB, etc. Numerical values will be used for the preliminary and final data, e.g., R1, R2, R3, etc.
Extension: “ict” for ICARTT files, “h4” for HDF 4 files, and “h5” for HDF 5 files.
For example, the filename for the DC-8 Diode Laser Spectrometer H2O measurement made on the June 1, 2012 flight may be:
DC3-DLH-H2O_DC8_20120601_RA.ict (for field data) or
DC3-DLH-H2O_DC8_20120601_R1.ict (for final data)
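A naming convention like this one is easiest to follow when it can be checked automatically. The Python sketch below is one possible validator, not a tool from the SEAC4RS project; the function name `parse_seac4rs_filename` is hypothetical, and it assumes that underscores appear only as field separators (so dataID and locationID contain only letters, digits, and hyphens) and that field revisions use a single letter.

```python
import re
from datetime import datetime
from typing import Optional

# dataID_locationID_YYYYMMDD_R#.extension
# Assumption: "_" only separates fields, and "." only precedes the extension.
FILENAME_RE = re.compile(
    r"^(?P<data_id>[A-Za-z0-9-]+)"
    r"_(?P<location_id>[A-Za-z0-9-]+)"
    r"_(?P<date>\d{8})"
    r"_R(?P<revision>[A-Z]|\d+)"
    r"\.(?P<extension>ict|h4|h5)$"
)

# dataIDs must carry a campaign prefix.
REQUIRED_PREFIXES = ("DC3-", "SEAC4RS-")


def parse_seac4rs_filename(name: str) -> Optional[dict]:
    """Return the fields of a conforming file name, or None if it breaks the convention."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    if not fields["data_id"].startswith(REQUIRED_PREFIXES):
        return None
    # Reject impossible calendar dates such as 20120631.
    try:
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        return None
    # Letter revisions mark field data; numeric revisions mark preliminary or final data.
    fields["stage"] = "field" if fields["revision"].isalpha() else "preliminary/final"
    return fields
```

For the field-data example above, the parser returns the dataID “DC3-DLH-H2O”, the locationID “DC8”, the date June 1, 2012, and the stage “field”; a name without the “DC3-” or “SEAC4RS-” prefix returns None.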
Appendix B Summary of ICARTT format metadata requirements (also required for HDF 5 files):
Platform and associated location data: Geographic location and altitude will be embedded as part of the data file or provided via a link to the archival location of the aircraft navigational data.
Data Source Contact Information: phone number, mailing information, and e-mail address shall be given for the measurement Co-I and one alternate contact.
Data Information: Clear definition of measured quantities will be given in plain English, avoiding the use of undefined acronyms, along with reporting units and limitation of data use if applicable.
Measurement Description: A simple description of the measurement technique with reference to the readme file and relevant journal publication.
Measurement Uncertainty: Overall uncertainty will need to be given as a minimum. Ideally, precision and accuracy will be provided explicitly. If applicable, the confidence level associated with the reported uncertainties will also need to be specified. The measurement uncertainty can be reported as constants for entire flights or as separate variables. Measurement uncertainty is required by the ICARTT data file format.
Data Quality Flags: definition of flag codes for missing data (not reported due to instrument malfunction or calibration) and detection limits.
Data Revision Comments: Provide sufficient discussion about the rationale for data revision. The discussions should focus on highlighting issues, solutions, assumptions, and impact.
20. Policies for Access and Sharing
Are your data sensitive, so that access by others needs to be restricted?
What license or publishing model will you use for your data?
How will you make your data accessible to others?
What data will you make available, and at what stage of your research?
Do you have protocols, such as IRB, that you need to comply with? If so, how will you do so?
21. Recommendations
Apply an open license to data that you will share.
If you cannot share your data, explain why; for example, the data used in your research are proprietary.
Anonymize any sensitive data.
If data cannot be sufficiently anonymized, use a repository that can mediate data sharing.
Comply with IRB restrictions.
That should be obvious, but we’ll say it anyway.
Be aware of Georgia Tech Policy…
22. Example (from ICPSR)
“ICPSR will make the research data from this project available to the broader social science research community.
Public-use data files: These files, in which direct and indirect identifiers have been removed to minimize disclosure risk, may be accessed directly through the ICPSR Web site. After agreeing to Terms of Use, users with an ICPSR MyData account and an authorized IP address from a member institution may download the data, and non-members may purchase the files.
Restricted-use data files: These files are distributed in those cases when removing potentially identifying information would significantly impair the analytic potential of the data. Users (and their institutions) must apply for these files, create data security plans, and agree to other access controls.
Timeliness: The research data from this project will be supplied to ICPSR before the end of the project so that any issues surrounding the usability of the data can be resolved. Delayed dissemination may be possible. The Delayed Dissemination Policy allows for data to be deposited but not disseminated for an agreed-upon period of time (typically one year).”
23. Policies and Provisions for Re-use
Who do you expect will want to or can reuse your data?
Should there be restrictions on who can reuse your data, or how?
How should others indicate that they have used your data?
How long will your data be available to others for reuse?
Does your institution have rules about data?
24. Recommendations
Imagine the broadest possible audience for your data.
Place as few restrictions on your data as you can.
Link your published articles to the data underlying them.
Use a repository that can make your data available far into the future.

Funding Agency | Suggested Length of Time for Private Data Retention
NIH | No later than the acceptance for publication of the main findings from the final data set
NOAA | 2 years after data collection
NSF-Engineering Directorate | 3 years after the end of the project or public release, whichever comes first
NSF-Earth Sciences Division | 2 years after data collection
NSF-Ocean Sciences Division | 2 years after data collection
25. Example (from USC)
“USC’s policy is to encourage, wherever appropriate, research data to be shared with the general public through internet access. This public access will be regulated by the university in order to protect privacy and confidentiality concerns, as well as to respect any proprietary or intellectual property rights. Administrators will consult with the university’s legal office to address any concerns on a case-by-case basis, if necessary. Terms of use will include requirements of attribution along with disclaimers of liability in connection with any use or distribution of the research data, which may be conditioned under some circumstances.”
26. Archiving and Preservation
What formats will you use for your data? Are they preservation friendly?
What repository or data archive can take your data when you are finished?
How do they preserve and share your data?
What are their access policies?
Is any extra work needed to prepare data for the repository?
Who will be responsible for final preservation?
27. Recommendations
Appraise your data, select the data with long-term value, and document your choices.
Use preservation-friendly digital formats: non-proprietary and commonly used.
You may need to transform data into a new format.
Find a repository that will take your data, and plan early to comply with its policies.
Look into using SMARTech!
PIs should be ultimately responsible for the final disposition of the data.
28. Example (from DataOne)
Short Term:
The data product will be updated monthly reflecting updates to the record, revisions due to recalibration of standard gases, and identification and flagging of any errors. The date of the update will be included in the data file and will be part of the data file name. Versions of the data product that have been revised due to errors/updates (other than new data) will be retained in an archive system. A revision history document will describe the revisions made. Daily and monthly backups of the data files will be retained at the Keeling Group Lab (http://scrippsco2.ucsd.edu, accessed 05/2011), at the Scripps Institution of Oceanography Computer Center, and at the Woods Hole Oceanographic Institution’s Computer Center.
Long Term:
Our intent is that the long term high quality final data product generated by this project will be available for use by the research and policy communities in perpetuity. The raw supporting data will be available in perpetuity as well, for use by researchers to confirm the quality of the Mauna Loa Record. The investigators have made arrangements for long term stewardship and curation at the Carbon Dioxide Information and Analysis Center (CDIAC), Oak Ridge National Laboratory (see letter of support). The standardized metadata record for the Mauna Loa CO2 data will be added to the metadata record database at CDIAC, so that interested users can discover the Mauna Loa CO2 record along with other related Earth science data. CDIAC has a standardized data product citation including DOI, that indicates the version of the Mauna Loa Data Product and how to obtain a copy of that product.
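The short-term plan in this example names each monthly update with its date and keeps superseded versions in an archive rather than overwriting them. A minimal Python sketch of that pattern follows; the function `publish_version`, the `archive/` subfolder, the YYYYMMDD date suffix, and the product name passed in are all assumptions for illustration, not the Keeling Group’s actual system.

```python
import shutil
from datetime import date
from pathlib import Path


def publish_version(new_data: Path, product_dir: Path, stem: str, update: date) -> Path:
    """Copy new_data into product_dir under a dated name and archive older versions.

    Assumes new_data lives outside product_dir, so it is never moved itself.
    """
    archive_dir = product_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Move the current version(s) into the archive instead of deleting them.
    for old in product_dir.glob(f"{stem}_*{new_data.suffix}"):
        shutil.move(str(old), str(archive_dir / old.name))
    # The update date becomes part of the file name, e.g. <stem>_20110501.csv.
    target = product_dir / f"{stem}_{update:%Y%m%d}{new_data.suffix}"
    shutil.copy2(new_data, target)
    return target
```

Moving rather than overwriting keeps every earlier version recoverable; as the example notes, a separate revision history document is still needed to explain why each version changed.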
33. Step 2: Create a Plan
Select a funding agency.
An email is sent to the Georgia Tech Library.
34. Creating and Naming your Plan
We strongly recommend naming your plan “[Insert Proposal Title Here] Data Management Plan”.
35. Step 3: One Section at a Time
Sections are different depending on the funding source.
Enter your answers here.
Georgia Tech and DataONE have resources available for every section.
38. Step 4: Export
Now that you have the content, you can export your plan.
39. Step 5: Share plan
Send your plan to the Research Data Librarian (me!) for review.
Have your colleagues look at your plan.
Do you know your grant officer?
40. Step 6: Finish and Start Research!
Add the plan to your proposal or distribute it among your research team.
Begin your newly funded research!
41. Other Data Management Plan Resources
Digital Curation Centre - http://www.dcc.ac.uk/resources/data-management-plans
ICPSR (made for social science data, but with great resources for anyone) - http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/plan.html
UK Data Archive - http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
42. Questions?
Lizzy Rolando
Research Data Librarian
lizzy.rolando@library.gatech.edu
404.385.3706
http://libguides.gatech.edu/research-data