Meeting the NSF DMP Requirement June 13, 2012

DATA MANAGEMENT PLANS & PLANNING:
MEETING THE NSF REQUIREMENT
June 13, 2012

WHO ARE WE?

Heather Coates
Digital Scholarship & Data Management Librarian
Liaison to the School of Public Health
University Library

Kristi Palmer
Digital Scholarship Team Leader
Liaison to the Department of History
and Programs of Women's and American Studies
University Library

LEARNING OBJECTIVES

After attending this workshop:

 You will understand the NSF data policies.
 You will be aware of the relevant data-related services at IUPUI.
 You will have resources to develop a data management plan
(DMP) for your NSF proposal(s).
 You will be able to write a comprehensive DMP for your NSF
proposal(s).
 You will send your DMP draft to the Data Services Program for
review and assistance as needed.

OVERVIEW

 Context for the NSF data policies

 Meeting the NSF DMP requirement
 The requirement: 5 elements
 Developing a Data Management Plan
 Implementing your plan

 Workshop Evaluation (5 minutes)

CONTEXT: SCHOLARLY COMMUNICATIONS

 Funding, funding, funding

 Scholarly Impact
 Exposure → increased citation
 More equal access (especially for students)
 Facilitates reproducibility
 Facilitates new discoveries via secondary analysis and data re-use
 Fosters productive collaborations
 Leads to new computational techniques

 Planning for the future
 If we can’t find it, it doesn’t exist
 Persistent access
 Long-term preservation of scholarly records

CONTEXT: WHY THE LIBRARY?

preservation, curation, access
 Trusted member of the institution
 Organizational structure lends itself to collaboration with
researchers
 Interdisciplinary by nature
 Existing infrastructure for digital information
 Existing expertise in preserving and providing access to
information
 Program of Digital Scholarship
 Archives

CONTEXT: DATA SERVICES PROGRAM

 Part of the Program of Digital Scholarship
 Mission
 Identifying data issues and connecting you to the solutions
 Services
 Workshops
 Individual consultations
 Data repository
 Resources
 Guide to NSF Data Management Plan Requirement
 Website

CONTEXT: TERMINOLOGY

 Cyberinfrastructure: computing resources & networks, services,
& people (see Empowering People, 2009 for more)
 Data management: technical processing and preparation of data
for analysis
 Data curation: selection of data for preservation and adding
value for current and future use
 Data citation: mechanisms to enable easy reuse and verification,
track impact of data, and create structures to recognize and
reward researchers (DataCite)
 Data sharing: must take into account ethical and legal issues; a
spectrum with many options
 Data stewardship:

CONTEXT: FEDERAL POLICIES

 Issues in scholarly communication
 Open access
 Open data & data citation
 Data management & curation

 Federal policies (incremental steps towards openness)
 National Research Council, 1985
 Office of Management & Budget, 1999: Circular A-110
 NIH Data Sharing Policy, 2003
 NIH Public Access Policy, 2008
 NSF DMP Requirement, 2011
 Other policies: NEH, NOAA, NASA, Howard Hughes Medical Institute,
Wellcome Trust

CONTEXT: IU STRATEGIC PLAN

IU Empowering People Strategic Plan for IT (2009), Action
33:

“IU should provision a data utility service for research data
that affords abundant near- and long-term storage, ease of
use, and preservation capabilities. This data utility will need
to offer a range of services for securing data, providing
authorized access within and beyond IU; ensuring metadata
description, annotation, and provenance; and providing
backup/recovery services.”

CONTEXT: OPEN ACCESS

 What is Open Access?
 Freely available, online, and free of most copyright restrictions
 Why should you care?
 Right thing to do?
 Increase your citations
 “We analysed 119,924 conference articles in computer science and related
disciplines. The mean number of citations to offline articles is 2.74, and the
mean number of citations to online articles is 7.03, an increase of 157%.”
(Lawrence, 2001)
 Publisher functions need not reside in for-profit hands
 “Between 1975 and 2005 the average cost of journals in chemistry and
physics rose from $76.84 to $1,879.56. In the same period, the cost of a
gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon
of gas had increased in price at the same rate as chemistry and physics
journals over this period it would have reached $12.43 in 2005, and would
be over $14.50 today.” (Lewis, 2008)

CONTEXT: OPEN ACCESS @ IUPUI AND IU

 IUPUI University Library Program of Digital Scholarship
 http://www.ulib.iupui.edu/digitalscholarship
 Open Journals
 IUPUIScholarWorks-Faculty Scholarship
 Electronic Theses and Dissertations
 Cultural Heritage Collections
 Data
 eArchives

CONTEXT: RESEARCH LIFE CYCLE

Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008.
<http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf>.

CONTEXT: BENEFITS OF PLANNING

 Saves time
 Less reorganization down the road
 Increases efficiency
 Gathers necessary information for analysis and writing
 Prevents problems in understanding data and metadata
 Prevents data loss
 If you have a plan, you are more likely to back up your data
 Makes it easier to preserve your data
 Documentation is more easily created throughout a project
 Metadata generation can be automated or incorporated into procedures
 Requirements of some funding agencies and institutions

DMP: INTERPRETING THE POLICY

 Why?
 Increased impact of research money
 Reduce redundant data collection
 Enhance use and value of existing data
 Further scientific research
 Data gathering tool
 What kinds of data are we collecting?
 How are researchers collecting, managing, and preserving data?
 What are community norms?
 Language is broad to allow input from research communities
 Implementation costs of the DMP CAN be included in direct costs

DMP: KEEP IN MIND

 The gist of it…
 Describe what you will do with your data during and after the proposed
project
 Ensures data is safe now and in the future
 DMP should reflect…
 Awareness of data management and curation in your discipline
 Feasible plan to utilize available cyberinfrastructure
 Try to…
 Explain the rationale for your choices
 Identify roles for data management and curation activities

DMP: ELEMENTS

 Types of data
 Standards and metadata
 Access and sharing
 Re-use, re-distribution, and the production of derivatives
 Long-term preservation
 [Budget]

DMP: TYPES OF DATA [1]

Use standards common in your research community

 Characterize the data
 Types of data
 experimental, observational, raw or derived, models, simulations, curriculum
materials, software, images, audio, video, etc.
 File formats (e.g., text, spreadsheet, database)
 How much data? (# of files, total size)
 Will the data be reproducible?

 Relationship to existing data? (i.e., interoperability)
 Syntactic
 Semantic


 How will data be collected?
 How? (tools, instruments, measurements, etc.)
 When? (timeframe, series)
 Where? (sites, settings)

 How will data be processed?
 Workflows (brief overview using flow chart)
 Software packages

 How will the data be stored and managed?
 File naming conventions
 Version control


 What QA & QC measures will be used?
 Identify steps during processing and analysis to eliminate missing data
points, identify outliers, and provide statistical summaries (e.g., double
data entry, histograms, scatterplots)
 Before data are collected, define and enforce standards and assign
responsibility
 During project, document processes and any changes or deviations

 What is the backup and security plan?
 Identify particular security or confidentiality issues
 Describe location & frequency

 Roles & responsibilities
 Who will carry out data collection, processing, and backup activities?
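
The QA & QC steps listed above (counting missing data points, flagging outliers, producing statistical summaries) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings are made-up values, the function name qc_report is hypothetical, and the outlier rule (distance from the median measured in median absolute deviations) is one common choice, not a method prescribed by NSF or by the workshop.

```python
import statistics


def qc_report(values, mad_multiplier=5.0):
    """Summarize one numeric column: missing entries, basic stats, outliers.

    Missing observations are represented as None (an assumption of this sketch).
    A value is flagged as an outlier when it lies more than `mad_multiplier`
    median absolute deviations (MAD) away from the median.
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    mad = statistics.median(abs(v - median) for v in present)
    outliers = [
        v for v in present
        if mad > 0 and abs(v - median) > mad_multiplier * mad
    ]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "median": median,
        "outliers": outliers,
    }


# Hypothetical readings: one missing value and one likely data-entry error.
readings = [391.2, 391.5, None, 391.9, 392.1, 3921.0, 392.4]
report = qc_report(readings)
print(report)
```

A report like this could be saved alongside each processed file, which also covers the advice to document processes and deviations during the project.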

EXAMPLE: TYPES OF DATA

Atmospheric Concentrations of CO2, Mauna Loa
Observatory, Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf

Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf

DMP: STANDARDS & METADATA [1]

 Metadata describes the who, what, when, where, how, why of
the data
 Include workflow: how you get from raw data to final products

 Purpose: enable finding, organization, interoperability,
identification, archiving & preservation

 Standards are commonly agreed upon terms and definitions in a
structured format
 Dublin Core (commonly used by libraries)
 Darwin Core (geographic occurrence of species)
 EML (ecology)
 Data Documentation Initiative (DDI; social sciences)
 IEEE LOM (learning objects metadata)

DMP: STANDARDS & METADATA [2]

 Ask yourself: will your datasets be self-explanatory or
understandable in isolation?

 Decisions to make about metadata
 Relevant standard(s)
 Format
 Content
 What information is needed to use and interpret the data in 5 years? In 25 years?

 How are metadata created?
 Automatically generated
 Manually created
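
As an illustration of automatically generated metadata, the sketch below builds a small descriptive record for a data file by combining properties read from the file system with fields a researcher supplies by hand. The field names loosely follow Dublin Core element names (title, creator, identifier, format, date). The function name describe_file, the extent_bytes field, the sample file, and the creator value are all hypothetical, and a real project would follow whichever standard its research community uses.

```python
import json
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, creator, title):
    """Return a minimal metadata record for one data file.

    `creator` and `title` are entered manually; the remaining fields are
    generated from the file itself.
    """
    p = Path(path)
    info = p.stat()
    mime_type, _ = mimetypes.guess_type(p.name)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "title": title,
        "creator": creator,
        "identifier": p.name,
        "format": mime_type or "application/octet-stream",
        "extent_bytes": info.st_size,
        "date": modified.date().isoformat(),
    }


# Demonstration with a throwaway file; the name and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "site01_temperature.txt"
    sample.write_bytes(b"date,temp_c\n2012-06-13,24.5\n")
    record = describe_file(
        sample,
        creator="A. Researcher",
        title="Hypothetical temperature log",
    )

print(json.dumps(record, indent=2))
```

Writing such a record as a sidecar file next to each dataset is one way to build metadata creation into routine procedures.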

EXAMPLE: STANDARDS & METADATA [1]

Atmospheric Concentrations of CO2, Mauna Loa Observatory,
Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf

Metadata will be comprised of two formats: contextual
information about the data in a text based document and ISO
19115 standard metadata in an xml file. These two formats for
metadata were chosen to provide a full explanation of the data
(text format) and to ensure compatibility with international
standards (xml format). The standard XML file will be more
complete; the document file will be a human-readable summary of
the XML file.

EXAMPLE: STANDARDS & METADATA [2]

Rio Grande Hydrologic Geodatabase Compendium
https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf

Microsoft Access Database format will be used since it is readily
accessible and it is compatible with ESRI ArcGIS
(http://www.esri.com/software/arcgis/index.html), a Geographic
Information System software package used by the stakeholders. Naming
conventions will be consistent: no spaces will be used in table names
or field names. The file naming convention will consist of the
datasource_datatype format for raw data files. Data reporting
functionality will be built into the VBA processing programs to
provide output in .txt file format for number of records per source
when updatable data sources are refreshed.

Every effort will be made to go back to the authoritative source for
an identified dataset. Quality control of the database will be
performed using SQL statements that capitalize on the database
structure to ensure relational database integrity. Appropriate primary
keys will be assigned to manage possible data duplicates. Potential
duplicate site IDs will be handled through automated procedures and
the creation of alternate ID tables.

A data dictionary will be created that defines the table definition,
table fields, and table field data types. An entity-relationship
diagram will be created that defines the relational structure of the
database.

A metadata record will be produced using the FGDC standard that
describes the entire geodatabase. The FGDC standard was chosen due to
required Federal government standards.
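
The quality-control approach in this example plan (SQL statements plus primary keys to manage duplicate site IDs) can be sketched as follows. The plan itself uses Microsoft Access and VBA; this sketch swaps in SQLite through Python's built-in sqlite3 module purely for illustration, and the table names, column names, and site records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table that merges site records from two sources.
conn.execute("CREATE TABLE raw_sites (site_id TEXT, source TEXT, flow_cfs REAL)")
conn.executemany(
    "INSERT INTO raw_sites VALUES (?, ?, ?)",
    [
        ("RG-001", "usgs", 410.0),
        ("RG-002", "usgs", 388.5),
        ("RG-001", "state", 409.0),
    ],
)

# QC query: list every site ID that appears more than once.
duplicates = conn.execute(
    "SELECT site_id, COUNT(*) FROM raw_sites "
    "GROUP BY site_id HAVING COUNT(*) > 1 ORDER BY site_id"
).fetchall()
print("Duplicate site IDs:", duplicates)

# A primary key makes the database itself reject a repeated site ID.
conn.execute("CREATE TABLE sites (site_id TEXT PRIMARY KEY, source TEXT)")
conn.execute("INSERT INTO sites VALUES ('RG-001', 'usgs')")
try:
    conn.execute("INSERT INTO sites VALUES ('RG-001', 'state')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Second insert rejected:", rejected)
```

The first query finds duplicates that already exist in loaded data, while the primary key prevents new ones from being introduced; a plan can reasonably describe both.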

DMP: ACCESS & SHARING

 What are your obligations for sharing?
 Funding agency, institution, other organization, legal, etc.
 What are the ethical or legal issues? (e.g., privacy,
confidentiality, security, intellectual property, or other rights)

 How will the data be made available?
 What is the process for gaining access?
 Who will have access to the data?
 When will the data be made available?
 For how long will the data be available?

DMP: RE-USE, RE-DISTRIBUTION, ETC.

 What rights will you retain before data is made available?
 Will permission restrictions be necessary?
 Limits or conditions for political, commercial, or patent reasons?
 Is there an embargo period? Why?

 Future users and uses
 Who might be interested in the data?
 How might you anticipate this data being used?
 What value might the data have for these people?

EXAMPLE: ACCESS, SHARING, RE-USE

Development of a NanoKlein Calorimeter
http://libguides.unm.edu/content.php?pid=137795&sid=1422879

We expect to apply for a patent for this instrument. All of the
materials submitted as part of the patent process will be a matter
of public record. We will also make technical drawings, test data
and calibration data available through our institutional repository.

Cave Microbiology
http://libguides.unm.edu/content.php?pid=137795&sid=1422879

DMP: LONG-TERM PRESERVATION

Project-based funding does not lend itself to long-term
preservation.

 What data will be preserved?
 What transformations are necessary to prepare the data?
 How long do you think the data will be useful? How long will the
data be preserved?
 Contextual information needed to make the data reusable
 metadata, references, reports, manuscripts, grant proposal, etc.
 Where will it be preserved?
 Links to published materials and other outcomes? Use of persistent
citation?
 Procedures for preservation and back-up?
 Who will be the contact for the dataset?
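
One concrete preservation and back-up procedure is a fixity check: record a checksum for every file when the data are deposited, then recompute the checksums later to confirm that nothing has changed or been corrupted. The sketch below is a minimal illustration using SHA-256 from Python's standard library; the function names and the sample file are hypothetical, and a repository may well provide its own fixity tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """Return manifest entries whose file is now missing or different."""
    current = build_manifest(directory)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration with a throwaway directory; the file name and data are made up.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "abundance.csv"
    data_file.write_bytes(b"plot,count\n1,12\n")
    manifest = build_manifest(tmp)
    before = changed_files(tmp, manifest)

    data_file.write_bytes(b"plot,count\n1,13\n")  # simulate an unintended change
    after = changed_files(tmp, manifest)

print("Changed before edit:", before)
print("Changed after edit:", after)
```

Storing the manifest with the dataset lets whoever is the contact for the data verify a backup copy before relying on it.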

EXAMPLE: LONG-TERM PRESERVATION [1]

Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf

We will preserve both arthropod datasets generated during this
project (abundance and stoichiometry) for the long term in the
Digital Conservancy at the U of M. We will include the .csv
files, along with the associated metadata files. We will also submit
an abstract with the datasets that describe their original context
and any potentially relevant project information. Borer will be
responsible for preparing data for long-term preservation and for
updating contact information for investigators.

EXAMPLE: LONG-TERM PRESERVATION [2]

Improving the long-term preservability of HDF-formatted data by
creating maps to file contents
https://www.dataone.org/sites/all/documents/DMP_HDFMap_Formatted.pdf

The writer software will be preserved by the HDF Group for the life
of the HDF libraries. The HDF Group uses industry-standard best
practices to ensure the integrity of their software and systems.
Once the map writer has been used to generate maps for every
HDF file in existence, the continued existence of the writer
software is not required. The reader software will be preserved at
SourceForge.org for as long as there is community interest. The
collection of HDF files will be preserved at NSIDC as long as utility
is deemed high.

IUPUI DATAWORKS

 Institutional repository that can facilitate subject repositories
 Policies are being developed, informed by faculty needs
 Pilot projects
 More support at little/no cost
 Flexibility in what we are willing to do
 New tools to demonstrate impact of research
 The future
 Standardized levels of service
 Standardized policies, responsive to faculty needs
 Cost recovery for significant intellectual/time investment

IMPLEMENTING YOUR PLAN [1]

 The DMP is a working document

 NSF expects progress to be reported (progress reports, final
reports, new grant proposals)

 Incorporate implementation into the project startup process
 C&G, IRB, IACUC all have to be in place before data collection can begin
 Review, revise, and set up your system during startup

 Good documentation ensures…
 A shared understanding of the data throughout a project
 That future researchers will be able to understand data within the
relevant context
 That re-users of data are able to interpret the data appropriately

IMPLEMENTING YOUR PLAN [2]

Research File System: http://pti.iu.edu/storage/rfs
Scholarly Data Archive: http://pti.iu.edu/storage/sda
Research Technologies, UITS: http://uits.iu.edu/page/avel
Core Services, UITS: http://pti.iu.edu/cs
Scholarly Cyberinfrastructure, UITS: http://uits.iu.edu/page/amee
CTSI Tools: http://www.indianactsi.org/rct (Alfresco Share, REDCap)

Program of Digital Scholarship: http://ulib.iupui.edu/digitalscholarship
Center for Research & Learning: http://crl.iupui.edu/
OVCR: http://research.iupui.edu/development/
Office of Academic Affairs: http://www.academicaffairs.iupui.edu
Intellectual Property Policy: https://www.indiana.edu/~vpfaa/academicguide/index.php/Policy_I-11

IUWare: https://iuware.iu.edu
IUanyWare: https://iuanyware.iu.edu/vpn/index.html
StatMath: http://www.indiana.edu/~statmath/
Statistics Consulting Center: http://www.math.iupui.edu/asci/

PRACTICAL TOOLS

Lynda.com tutorials: http://ittraining.iu.edu/lynda/default.aspx
Cleaning Up Your Excel Data (2010)
Managing & Analyzing Data in Excel (2010)
Data Validation in Depth (2010)
DMPTool: https://dmp.cdlib.org/
DMPOnline: https://dmponline.dcc.ac.uk/
UK Data Archive Costing Tool:
http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
Creative Commons Licenses & Data:
http://wiki.creativecommons.org/Data
Licensing Research Data, Digital Curation Centre
http://www.dcc.ac.uk/resources/how-guides/license-research-data
CIC Author Addendum
http://www.cic.net/authors

RECOMMENDED READING

UK Data Archive: Managing & Sharing Data Brochure:
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

MORE RESOURCES

 National Science Board, Digital Research Data Sharing &
Management, 2012 (pre-publication):
http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
 Committee on Science, Engineering, and Public Policy (U.S.).
(2009). Ensuring the integrity, accessibility, and stewardship of
research data in the digital age. Washington, D.C.: National
Academies Press.
 National Science Board Committee on Strategy and Budget Task
Force on Data Policies. (2011). Digital Research Data Sharing &
Management. Washington, D.C.: National Science Board.
 America Creating Opportunities to Meaningfully Promote
Excellence in Technology, Education, and Science Reauthorization
Act of 2010, Pub. L. No. 111-358, 124 Stat. 3982 (2010).
Retrieved from the Library of Congress Thomas database.

REFERENCES

1. Higgins, S. (n.d.). What are metadata standards.
http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
2. Digital Curation Centre. (n.d.). DCC Charter and Statement of Principles.
Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter.
3. Indiana University. (2011). Indiana University’s Advanced
Cyberinfrastructure. Retrieved from
http://pti.iu.edu/cyberinfrastructure.pdf.
4. Indiana University. (2009). Empowering People: Indiana University’s
Strategic Plan for Information Technology. Retrieved from
http://ovpit.iu.edu/strategic2/.
5. National Science Foundation. (2011). Award and Administration Guide:
Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved
from
http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4.
6. Lawrence, S. Free online availability substantially increases a paper’s
impact. Nature, 31 May 2001.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html
(accessed November 5, 2008)
7. Lewis, David W. "Library budgets, open access, and the future of scholarly
communication: Transformations in academic publishing." C&RL News, May
2008, Vol. 69, No. 5. [Available at:
http://www.ala.org/ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]

COMPELLING CASES FOR OPEN DATA

SPARC, Research is more valuable when it’s shared:
http://www.arl.org/sparc/greaterreach/index.shtml

Tim Berners-Lee: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Open-source cancer research: http://www.ted.com/talks/jay_bradner_open_source_cancer_research.html

Polymath problem blogs:
http://polymathprojects.org/about/
http://stevekochscience.blogspot.com/2011/02/open-data-success-story.html
http://eaves.ca/2011/09/07/the-economics-of-open-data-mini-case-transit-data-translink/

THANK YOU

Tell us what you think: take a brief survey.

Find us @
http://ulib.iupui.edu/digitalscholarship/dataservices
Heather Coates, hcoates@iupui.edu, 317-278-7125
Kristi Palmer, klpalmer@iupui.edu, 317-274-8230

IUB
Stacy Konkiel, skonkiel@indiana.edu, 812-856-5295

EXTRA: NIH DATA SHARING POLICY

 Applies to applications requesting $500,000 or more in direct costs in
any year of the proposed research
 Covers final research data: not summary statistics or tables, and not
underlying pathology reports and other clinical source documents; it
might include both raw data and derived variables
 If an application describes a data-sharing plan, NIH expects that plan
to be enacted.
 NIH expects the timely release and sharing of data to be no later than
the acceptance for publication of the main findings from the final
dataset.
 It is the responsibility of the investigators, their Institutional Review
Board (IRB), and their institution to protect the rights of subjects and
the confidentiality of the data. Prior to sharing, data should be
redacted to strip all identifiers, and effective strategies should be
adopted to minimize risks of unauthorized disclosure of personal
identifiers.

EXTRA: NIH DATA SHARING PLAN

 describe briefly the expected schedule for data sharing
 the format of the final dataset
 the documentation to be provided
 whether or not any analytic tools also will be provided
 whether or not a data-sharing agreement will be required
 if so, a brief description of such an agreement (including the criteria for
deciding who can receive the data and whether or not any conditions
will be placed on their use)
 mode of data sharing (e.g., under their own auspices by mailing
a disk or posting data on their institutional or personal
website, through a data archive or enclave)
 Applicants may request funds in their application for data
sharing.

RESOURCES

National Institutes of Health, Data Sharing Policy
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
NIH Public Access Policy Implications
http://publicaccess.nih.gov/public_access_policy_implications_2012.pdf
