This slide deck provides an overview and resources for responding to the OSTP memo "Increasing Access to the Results of Federally Funded Scientific Research," issued by John P. Holdren in February 2013. It provides resources and information that agencies, foundations, and research projects can use to achieve public access to scientific data in digital formats.
2. The OSTP Memo
Guidelines for Response
• Released February 2013, this memo directs funding
agencies with an annual R&D budget over $100 million to
develop a public access plan for disseminating the results
of their research
• ICPSR stresses that standards and guidelines for many of
the requirements currently exist
• The slides to follow provide an overview of the access
plan elements including guidelines and resources on how
to respond to meet digital data requirements in the
memo
3. The OSTP Memo – A Review
• Released February 22, 2013
• A concern for investment: “Policies that mobilize these
publications and data for re-use through preservation and
broader public access also maximize the impact and
accountability of the Federal research investment.”
• Federal agencies with over $100 M annually in R&D
expenditures to develop plans to support increased public
access to the results of research funded by the Federal
Government
• Plans to contain eight points
4. The Eight Points of the Plan
1. Strategy for leveraging existing archives
2. Strategy to improve the public’s ability to locate and access digital
data
3. Approach to optimize search, archival, and dissemination features
that encourage innovation in accessibility & interoperability and
ensure long-term stewardship
4. A plan to notify awardees & researchers of their obligations
5. Strategy for measuring and enforcing compliance with the plan
6. Identification of resources within the existing agency budget to
implement plan
7. Timeline for implementation
8. Identification of special circumstances that prevent the agency from
meeting memo objectives
5. Data Portion of Memo - 13 Elements
• The portion of the memo describing objectives
for public access to data stresses 13 elements
for a public access plan
• The elements are also summarized online
within ICPSR’s Web site:
http://icpsr.umich.edu/content/datamanagement/ostp.html
6. Maximize Access
"Maximize access, by the general public and without charge, to digitally
formatted scientific data created with Federal funds"
• Increasing access to research data prevents the duplication of
effort, provides accountability and verification of research results, and
increases opportunities for innovation and collaboration.
• Finding and accessing data in repositories requires descriptive metadata
("data about data") in standard, machine-actionable form. Metadata
help search engines find data, and help researchers understand the
context of data collections.
• Standards already exist: see Data Documentation Initiative
– http://www.ddialliance.org/
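To make "machine-actionable" concrete, here is a minimal sketch of a descriptive metadata record and a completeness check. The field names and the sample study are assumptions made up for illustration; they are not the DDI schema, which is far richer.

```python
import json

# Hypothetical minimal field set for illustration only; not the DDI schema.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "identifier")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up study used only to exercise the check.
record = {
    "title": "Example Household Survey, 2012",
    "creator": "Example Research Team",
    "date": "2013",
    "description": "Illustrative survey of household energy use.",
    "identifier": "",  # no persistent identifier assigned yet
}

# A structured record can be serialized for indexing by a search service.
print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

A repository could run a check like this at deposit time so that incomplete records are caught before they are published.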
7. Maximize Access cont.
• Access also involves knowing how to interpret the data. Incomplete data
limit reuse. Obsolete data formats can be unreadable.
– Repositories 'curate' or enhance data to make it complete, self-
explanatory, and usable for future researchers. This includes adding
descriptive labels, correcting coding errors, gathering documentation, and
standardizing the final versions of files. This is called “data curation.”
– Like museums that curate art or artifacts for study and understanding now
and in the future, data archives curate data with the same goals.
• Data curation is crucial to maximizing access. Resources for curating
data:
– ICPSR's Guide to Social Science Data Preparation and Archiving
– UK Data Archive's Managing and Sharing Data guide.
8. Protect Confidentiality and Privacy
• It is critically important to protect the
identities of research subjects.
• Disclosure risk is a term that is often used
for the possibility that a data record from a
study could be linked to a specific person.
• Concerns about disclosure risk have grown
as more datasets have become available
online, and it has become easier to link
research datasets with publicly available
external databases.
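One common way to gauge disclosure risk is to count how many records share the same combination of indirectly identifying values (a k-anonymity-style check). The sketch below is an illustration under that assumption, not a description of ICPSR's review process; the records and the choice of quasi-identifiers are made up.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination is shared by fewer than k records."""

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Made-up records; "zip", "birth_year", and "sex" are assumed quasi-identifiers.
records = [
    {"id": 1, "zip": "48104", "birth_year": 1970, "sex": "F"},
    {"id": 2, "zip": "48104", "birth_year": 1970, "sex": "F"},
    {"id": 3, "zip": "48109", "birth_year": 1985, "sex": "M"},
]

flagged = risky_records(records, ["zip", "birth_year", "sex"])
print([r["id"] for r in flagged])
```

Records that are unique on these values are the ones most easily matched against an external database, so they are candidates for recoding, suppression, or restricted access.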
9. Protect Confidentiality and Privacy cont.
Protecting confidentiality of research subjects is not a viable
argument for not sharing data. Infrastructure, including virtual
and physical data enclaves, already exists:
• Restricted-Use Data are made available for research
purposes to investigators who agree to stringent
conditions for the use of the data and their physical
safekeeping.
• Enclave Data are those datasets which present especially
acute disclosure risks. They can be accessed only on-site in
ICPSR's physical data enclave in Ann Arbor. Investigators
must be approved. Their notes and analytic output are
reviewed by ICPSR staff.
10. Preserve Intellectual Property Rights
and Commercial Interests
Original research may be both
commercially valuable and proprietary.
There are several approaches to
managing these interests, including:
– Tailor copyright and patent licenses, such
as through Creative Commons licenses
– Establish an embargo period or otherwise delay
dissemination.
11. Balance Demands of Long-term
Preservation and Access
• Preserving digital data requires much more than
storing files on a server, desktop, or in the cloud!
• Digital preservation is the active and ongoing
management of digital content to lengthen its
lifespan and mitigate loss, including physical
deterioration, format obsolescence, and hardware and
software failure.
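One routine task that fits under "active and ongoing management" is fixity checking: recomputing file checksums and comparing them with stored values to detect silent corruption or loss. The slides do not name this technique, so treat the sketch below as an assumed illustration; the function names and the manifest layout are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest):
    """Given {path: expected checksum}, return paths that are missing or altered."""
    return [
        path
        for path, expected in manifest.items()
        if not Path(path).is_file() or sha256_of(path) != expected
    ]


# Demonstration with a temporary file standing in for an archived dataset.
with tempfile.TemporaryDirectory() as folder:
    data_file = os.path.join(folder, "data.csv")
    with open(data_file, "wb") as f:
        f.write(b"a,b\n1,2\n")
    manifest = {data_file: sha256_of(data_file)}  # recorded at deposit time
    print(changed_files(manifest))  # nothing has changed yet
    with open(data_file, "ab") as f:
        f.write(b"3,4\n")  # simulate an unintended modification
    print(changed_files(manifest) == [data_file])
```

Checks like this only detect problems; recovery still depends on keeping independent copies and migrating formats before they become obsolete.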
12. Balance Demands of Long-term
Preservation and Access cont.
• Not all data are worth preserving
indefinitely; less valuable or easily
producible data may be preserved for
shorter periods.
• Establish selection and appraisal guidelines
that make it clear what to save or discard.
– Selection criteria consider factors like
availability, confidentiality, copyright, quality,
file format, and financial commitment.
13. Use of Data Management Plans
• Data management plans describe how researchers
will provide for long-term preservation of, and
access to, scientific data in digital formats.
• Data management plans provide opportunities for
researchers to manage and curate their data more
actively from project inception to completion.
• See ICPSR's resource: Guidelines for Effective Data
Management Plans
14. Include Cost of Data Management in Funding
Proposals
• Data management services carry real costs, ranging from
personnel to storage to software.
• Just as maintenance costs are routinely built into physical
infrastructure development, data management costs should
be built into data development.
• Long-term access to data requires durable institutions that plan
on a scale of decades and even generations.
• Cost resources:
– DataONE's Provide budget information for your data
management plan
– UK Data Archive's Costing Tool: Data Management Planning.
15. Evaluate Data Management Plans &
Ensure Compliance
• Plans help researchers prepare for working with
and preserving data, help repositories get ready to
accession and provide access, and help agencies
understand the community's needs for archiving
and access. Evaluation helps refine plans so they
are realistic and attainable.
• If data management plans are to be a standard
component of funding applications, funding
recipients should be held accountable for
deviations from the originally stated plans.
16. Promote Public Deposit of Data
• Public deposit of data helps to ensure the long-term
accessibility and preservation of the data.
• It removes the burden of ongoing maintenance and care (and
user support) from the researcher and provides a stable system
to which data can be entrusted.
• Many sustainable online repositories are already available to
host and archive research data. These may include discipline-
specific repositories, archives administered by funding
agencies, or institutional repositories.
• Databib, a searchable directory of over 500 research data
repositories, can help locate relevant repositories by subject
area.
17. Private-sector Cooperation to Improve
Access
Encourage cooperation with the private sector
to improve data access and compatibility.
Issues to consider:
• What funding structures will be in place to ensure that both
organizations involved are benefiting from the partnership?
• Will the partnership require any rights to be transferred to the
private organization?
• How does private-sector cooperation affect
access restrictions and intellectual property
concerns?
18. Mechanisms for Identification &
Attribution of Data
• Properly citing data encourages the replication of
scientific results, improves research standards, guarantees
persistent reference, and gives proper credit to data
producers.
• Citing data is straightforward. Each citation must include
the basic elements that allow a unique dataset to be
identified over time: title, author, date, version, and
persistent identifier.
• Resources: ICPSR's Data Citations page, IASSIST's Quick
Guide to Data Citation, DataCite.
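The five basic elements above can be assembled programmatically. The sketch below shows one possible layout, not an official citation style; the function name, the sample study, and the DOI are placeholders made up for illustration.

```python
def format_data_citation(author, date, title, version, identifier):
    """Build a data citation from the five basic elements, rejecting incomplete input."""
    elements = {
        "author": author,
        "date": date,
        "title": title,
        "version": version,
        "persistent identifier": identifier,
    }
    missing = [name for name, value in elements.items() if not value]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    return f"{author} ({date}). {title}. Version {version}. {identifier}"


# Placeholder values; the DOI below is not a real identifier.
citation = format_data_citation(
    author="Example Research Team",
    date="2013",
    title="Example Household Survey, 2012",
    version="1",
    identifier="https://doi.org/10.0000/example",
)
print(citation)
```

Requiring every element up front reflects the point that a citation must let a unique dataset, and a specific version of it, be identified over time.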
19. Data Stewardship Workforce Development
In coordination with other agencies and the private
sector, support training, education, and workforce
development related to scientific data
management, analysis, storage, preservation, and
stewardship. Recent data stewardship workforce
development in the United States has included:
• Digital Preservation Outreach and Education, from the Library of
Congress
• Digital Preservation Management tutorial, from Cornell
University, ICPSR, and MIT
• DigCCurr, from the University of North Carolina
20. Data Stewardship Workforce
Development cont.
ICPSR hosts data stewardship courses as part of
its Summer Program in Quantitative Methods of
Social Research. These include:
• Curating and Managing Research Data for Re-Use
• Assessing and Mitigating Disclosure Risk: Essentials for
Social Science
• Providing Social Science Data Services: Strategies for
Design and Operation
21. Long-term Support for Repository
Development
• ICPSR advocates long-term funding for specialized, long-
lived, trustworthy, and sustainable repositories that can
mediate between the needs of scientific disciplines and data
preservation requirements.
• As digital data management becomes an increasingly important
part of scientific research, funding agencies must contribute to
the developing ecosystem of services and technologies that
support access to and preservation of data.
• For more information, including various long-term funding
models, see ICPSR’s 2013 position paper – “The Price of
Keeping Knowledge”
25. ICPSR – a 50-Year History of Providing Access to
Research Data
Established in 1962, ICPSR maintains and shares
over 8,600 research datasets and hosts 16 public-
access specialized collections of data funded by
various government agencies and foundations. Our
mission:
ICPSR advances and expands social and behavioral
research, acting as a global leader in data
stewardship and providing rich data resources and
responsive educational opportunities for present
and future generations.
26. The Concept of Data Curation
• Curation, from the Latin "to care," is the process that ICPSR uses to add
value to data, maximize access, and ensure long-term preservation.
• Data curation is akin to work performed by an art or museum curator.
– Data are organized, described, cleaned, enhanced, and preserved for
public use, much like the work done on paintings or rare books to make
the works accessible to the public now and in the future.
• Through curation, ICPSR provides meaningful and enduring access to
data.
27. ICPSR’s Data Management & Curation Goals
• Quality - Data at ICPSR are
enhanced with meaningful
information to make them
complete, self-explanatory, and
usable for future researchers
• Access – Sought by over 730
member institutions and indexed by
all the major search engines, ICPSR
data are easily discoverable and
widely accessible to the public.
• Citation - By providing
standardized and well-recognized
data citations, ICPSR ensures that
data producers receive credit for
their archived data
• Preservation – For over 50
years, ICPSR has preserved its data
resources for the long
term, guarding against
deterioration, accidental loss, and
digital obsolescence
• Confidentiality - Stringent
protections are in place for securing
and distributing sensitive data
• Educational Support –
ICPSR has a long tradition of
supporting training in quantitative
methods, scientific data
management, and resources for
instruction
28. Copies of these Slides & Use
• Feel free to share, present,
and cite these slides!
• Find copies of these slides
on Slideshare.net
29. Get More information
• Visit ICPSR’s Data Management &
Curation site:
http://www.icpsr.umich.edu/datamanagement/index.jsp
• Contact us:
– netmail@icpsr.umich.edu
– (734) 647-2200
Editor's notes
Current archives/collections/repositories already meeting public access requirements regarding data:
• NACDA – NACJD – SAMHDA: examples of long-term sustainability
• NAHDAP – SAMHDA – DSDR: examples of sharing of confidential data
• NACJD: example of depository/researcher compliance (holding 10% of funding to PI)
• LGBT – MET: unique infrastructure and dissemination
• Research Connections: reports and data dissemination; audiences including policymakers