Many federal funding agencies, including NIH and most recently NSF, are requiring that grant applications contain data management plans for projects involving data collection. To support researchers in meeting this requirement, ICPSR is providing a set of tools and resources for creating data management plans. This presentation will covers:
• ICPSR’s Data Management Plan Website
• Suggested Elements of a Data Management Plan
• Example Data Management Plan Language
• Designating ICPSR as an Archive in a Data Management Plan
• Additional Resources for a Preparing Your Data Management Plan
Presented by Amy Pienta, Research Scientist, University of Michigan
1. Guidance on Preparing a Data Management Plan Amy Pienta, Acquisitions Director & Associate Research Scientist, ICPSR
2.
3. National Science Foundation The National Science Foundation has released a new requirement for proposal submissions regarding the management of data generated using NSF support. Starting in January, 2011, all proposals must include a data management plan (DMP). Up to 2 pages Plan will be reviewed What data are generated by your research? What is your plan for managing the data?
4. ICPSR’s Data Management Plan (DMP) Website http://www.icpsr.umich.edu/icpsrweb/ICPSR/dmp/index.jsp Suggested Elements of a DMP Example DMP Language Designating ICPSR as an Archive Additional Resources Webinar Outline
5.
6.
7. Collection DeliveryPreservation Mary Vardigan Nancy McGovern Wendi Fornoff Elizabeth Bedford Matthew Richardson Collection Development Amy Pienta Who @ ICPSR
55. Involving ICPSR Further In addition to reviewing ICPSR’s website materials, you may want to: Contact ICPSR to discuss whether a future data collection fits within the ICPSR collection. Note: The earlier the better! Request a Letter of Support for a grant application from ICPSR indicating support for archiving the data with ICPSR. Note: The earlier the better! Determine if there are any costs to archiving the data with ICPSR. Note: The earlier the better!
This table is available from the website. The bar across the top shows the websites that were incorporated into the env scan. The rows list the various elements the website we looked at covered as being important to a data management plan. Australia National University's Information Literacy Program DPM Template is a formatted template for any discipline. The Australian National Data Service created Data Management Planning, a document that lists the questions that should be answered in a data management plan.The Digital Curation Centre created its Data Management Plan Content Checklist as "a comprehensive list of the details that researchers may be asked to include in such plans.”The Finnish Social Science Data Archive's Data Management Planning Website lists questions that should be answered in a data management plan. It is aimed at social science researchers in particular. Geoscience Australia's Guide to Preparation of Data Management Plans. MIT Libraries' Data Management Webpage provides a list of questions that should be answered in data management plan.The National Science Board's Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century is one of the foundational documents in the US' current push for data sharing. It gives broad guidelines for what should be included.The National Science Foundation Directorate for Engineering's Data Management for NSF Engineering Directorate Proposals and Awards is the first document to directly address the coming NSF requirementThe Queensland University of Technology QUT Data Management Checklist is a highly structured, populable template. The UK Rural Economy and Land Use Programme's Data Management Plan is a form that must be filled out by RELU award holders at the outset of their projects. The University of Melbourne's Research Data Management Plan Template is a te,plate aimed at university researchers.
In all, the existing websites generally concurred that there are 8 elements that a data management plan should cover. I am going to talk about these 8 highly recommended elements first. For each of the 8 elements - I will give a couple of examples of what kinds of text might be written. And, I will show you what to write if the data will be archived with ICPSR.Data description is important to include because it will help reviewers understand the characteristics of the data, their relationship to existing data, and any disclosure risks that may apply.
The second example describes a data collection that is also nationally representative, but also mentions the sensitive nature of the data and the need for using a restricted-use data agreement to disseminate the data.
The next element – access and sharing – is important to include in a DMP to specify how the data will be accessed and shared. Sharing data helps to advance science and to maximize the research investment. A recent paper [link: http://deepblue.lib.umich.edu/handle/2027.42/78307] reported that when data are shared through an archive, research productivity increases and many times the number of publications result as opposed to when data are not shared.With respect to timeliness of data deposit, archival experience has demonstrated that the durability of the data increases and the cost of processing and preservation decreases when data deposits are timely. It is important that data be deposited while the producers are still familiar with the dataset and able to transfer their knowledge fully to the archive.
Self-dissemination example:Describe andjustify this Still indicate a plan to preserve the data.
Describe a delayed dissemination plan.
Designate a institutional repository on one’s campus.
If you intend to use ICPSR you can first include this text… It designates ICPSR as the intended archival home for the data.
It indicates our dissemination commitment to both public and restricted-use data files. It also covers ICPSR’s terms of use for these kinds of data.
It covers our expectation that the data be deposited before the end of the project to ensure continuity of data management. It also mentions delayed dissemination policy.
Why this is importantGood descriptive metadata are essential to effective data use. Metadata are often the only form of communication between the secondary analyst and the data producer, so they must be comprehensive and provide all of the needed information for accurate analysis.Structured or tagged metadata, like the XML format of the Data Documentation Initiative (DDI) standard, are optimal because the XML offers flexibility in display and is also preservation-ready and machine-actionable. The first example references this ideal.
The second example talks about clinical data and references the metadata standard common to clinical data which is called CDISC.
This is the language you can use if depositing with ICPSR. It references our reliance on the DDI standard. That study-level descriptions of the data will be created.
Our use of a standard data citation and a persistent identifier to locate the data.And, our documentation of data down to the variable-level.
Why this is importantGood descriptive metadata are essential to effective data use. Metadata are often the only form of communication between the secondary analyst and the data producer, so they must be comprehensive and provide all of the needed information for accurate analysis.Structured or tagged metadata, like the XML format of the Data Documentation Initiative (DDI) standard, are optimal because the XML offers flexibility in display and is also preservation-ready and machine-actionable.
In order to disseminate data, archives need a clear statement from the data producer of who owns the data. The principal investigator's university is usually considered to be the copyright holder for data the PI generates. The first example makes this clear.
Many archives do not ask for a transfer of copyright but instead just request permission to preserve and distribute the data. The 2nd example.
Copyright may also come into play if copyrighted instruments are used to collect data. In these cases, data producers should initiate discussions with archives in advance of data deposit.
This is the ICPSR text that can be included.
Protection of human subjects is a fundamental tenet of research and an important ethical obligation for everyone involved in research projects. Disclosure of identities when privacy has been promised could result in lower participation rates and a negative impact on science. One of the most important considerations is that whenever possible – researchers collecting data should not include langue in the informed consent that prohibits data sharing.The first example makes clear how to write this into a DMP.
This is specific language for an informed consent that can be mentioned in the DMP. ICPSR has been recommending this language for over a year.
This example pertains to HIPPA information being collected in studies.
The ICPSR text includes how the informed consent should be written. It also talks about ICPSR procedures that minimize disclosures and protect confidentiality that are part of our routine data processing.
Why this is importantDepositing data and documentation in formats preferred for archiving can make the processing and release of data faster and more efficient.Preservation formats should be platform-independent and non-proprietary to ensure that they will be usable in the future.
Video format data example. You can look at our website for more examples for other kinds of data formats.
ICPSR covers our preference of what format to receive data in.It also covers access formats that we create.It covers preservation formats for the data that we maintain.
Why this is importantDigital data need to be actively managed over time to ensure that they will always be available and usable. This is important in order to preserve and protect our investment in science. Preservation of digital information is widely considered to require more constant and ongoing attention than preservation of other media. Depositing data resources with a trusted digital archive can ensure that they are curated and handled according to good practices in digital preservation.The first example here highlights the preservations being handled by a repository.
Even with self-dissemination of data - this should be covered in a DMP. This example highlights a copy being preserved by a archive or repository.
The ICPSR text highlights our 50year track record, our commitment to migrating data to viable future formats/technologies and describes our succession plan for our content in the event of the archive failing.
Why this is importantDigital data are fragile and best practice for protecting them is to store multiple copies in multiple locations.
The ICPSR text for a DMP describes our multiple copies held by partner organizations. It also mentions that the multiple copies are kept in synch.
More optional elements…When the data are unique.
When the data relate to past data that have been collected.
Why this is importantIt is important to describe situations in which research data are in some way atypical with respect to how they will be organized. For example, some data collections are dynamically changing and version control is central to how the data will be used and understood by the scientific community.
Why this is importantProducing data of high quality is essential to the advancement of science, and every effort should be taken to be transparent with respect to data quality measures undertaken across the data life cycle.
Why this is importantSecurity for digital information is important over the data life cycle. Raw research data may include direct identifiers or links to direct identifiers and should be well-protected during collection, cleaning, and editing. Processed data may or may not contain disclosure risk and should be secured in keeping with the level of disclosure risk inherent in the data. Secure work and storage environments may include access restrictions (e.g., passwords), encryption, power supply backup, and virus and intruder protection.
Why this is importantTypically data are owned by the institution awarded a Federal grant and the principal investigator oversees the research data (collection and management of data) throughout the project period. It is important to describe any atypical circumstances. For example, if there is more than one principal investigator the division of responsibilities for the data should be described.
Why this is importantHow will the costs for creating data and documentation suitable for archiving be paid?
Why this is importantSome data have legal restrictions that impact data sharing—for example, data covered by HIPAA, proprietary data, and data collected through the use of copyrighted data collection instruments. How these issues might impact data sharing should be described fully in the data management plan.
Why this is importantThe audience for the data may influence how the data are managed and shared—for example, when audiences beyond the academic community may use the research data.
Why this is importantNot all data need to be preserved in perpetuity, so thinking through the proper retention period for the data is important, in particular when there are reasons the data will not be preserved permanently.