The document discusses the importance of data curation for scientific progress and integrity. It outlines the library's role in connecting researchers to content and supporting the research lifecycle. Barriers to data sharing like poor discovery, unfamiliar processes, and loss of control can be addressed through tools and services that provide identifiers, metadata, data use agreements, and data management planning guidance. Embedding best practices into existing researcher tools and workflows can help promote data curation.
Future of Scientific Publishing: Open Access to Manuscripts and Big Data
Stanford University, June 27, 2013
Libraries and Research Data Curation
Barriers and Incentives for Preservation, Sharing, and Reuse
Stephen Abrams
University of California Curation Center
California Digital Library
www.cdlib.org/uc3
Why is data curation important?
Accelerating scientific progress
Enabling appropriate scrutiny and verification of results
Promoting integrity and debate
Facilitating new collaborations
Avoiding needless duplication of effort
Increasingly, complying with institutional policies, publication requirements, and funder mandates
Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
The library’s role
A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time
Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf
Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles
[Figure: the scholarly lifecycle (Research) aligned with the information lifecycle (Curation); stages shown include Create, Collect, Discover, Access, Share, Publish, Preserve, and Reuse]
Addressing barriers to adoption
Critical issues on both the demand and supply side:
Poor discovery
Unfamiliar processes
Loss of control
Inadequate guidance
Cf. Schäfer et al. (2011), Baseline Report on Drivers and Barriers in Data Sharing, hdl:10013/epic.39262
Better access to tools and resources
Embedded best practices
Data use agreements
Data management planning
Data publication and citation
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
Data publication and citation
Provide the same infrastructural support for data that exists for traditional publications
Unique, actionable identifiers
Stable citation
Bi-directional references between publications and the data that underlie their analysis, synthesis, and summarization
Discovery via disciplinary portals, catalogs, and web searches
Use and impact metrics
www.flickr.com/photos/fotobib/5555065521 www.flickr.com/photos/minhmeoinfo/4597866532
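Stable citation amounts to assembling a fixed set of metadata fields into a predictable string that ends in an actionable identifier. The sketch below illustrates that idea only; the field names, the "Creators (Year): Title. [Version.] Publisher. Identifier" pattern, and the example identifier `ark:/99999/fk4example` are assumptions for illustration, not the output format of any particular service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: List[str]       # personal names, e.g. "Doe, Jane"
    title: str
    publisher: str            # the repository or archive holding the data
    publication_year: int
    identifier: str           # a persistent identifier such as an ARK or DOI
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Render an assumed 'Creators (Year): Title. [Version.] Publisher. Identifier' citation."""
    if not rec.creators:
        raise ValueError("at least one creator is required")
    authors = "; ".join(rec.creators)
    title = rec.title.rstrip(".")  # avoid a doubled period after the title
    parts = [f"{authors} ({rec.publication_year}): {title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(rec.identifier)  # the identifier is what makes the citation actionable
    return " ".join(parts)


if __name__ == "__main__":
    rec = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        title="Example survey data",
        publisher="Example Repository",
        publication_year=2013,
        identifier="ark:/99999/fk4example",
    )
    print(format_citation(rec))
```

Keeping the citation a pure function of the metadata means the same record always yields the same citation, which is the property that makes it "stable."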
http://n2t.net/ezid
ARK and DOI identifiers
Descriptive metadata
Resolution targets
Aggregation by DataCite
(and soon) Primo and Web of Knowledge
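The combination of identifiers, descriptive metadata, and resolution targets can be illustrated with a toy model. In the sketch below, the `http://n2t.net/` prefix for ARKs follows the example link used elsewhere in this deck, and `https://doi.org/` is the public DOI resolver; the regular expressions are deliberately simplified, and `ToyRegistry` is a hypothetical in-memory stand-in, not EZID's actual API.

```python
import re
from typing import Dict, Optional

# Simplified, assumed syntax checks; real ARK and DOI rules are more permissive.
ARK_PATTERN = re.compile(r"^ark:/\d+/\S+$", re.IGNORECASE)
DOI_PATTERN = re.compile(r"^doi:(10\.\d+/\S+)$", re.IGNORECASE)


def resolver_url(identifier: str) -> str:
    """Turn an ARK or DOI into an actionable URL on a public resolver."""
    if ARK_PATTERN.match(identifier):
        return "http://n2t.net/" + identifier
    match = DOI_PATTERN.match(identifier)
    if match:
        return "https://doi.org/" + match.group(1)
    raise ValueError(f"unrecognized identifier: {identifier!r}")


class ToyRegistry:
    """Hypothetical in-memory stand-in for an identifier service (not EZID's API).

    Each identifier carries descriptive metadata and a resolution target; the
    target can be updated when data move, while the identifier stays the same.
    """

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, object]] = {}

    def register(self, identifier: str, target: str, metadata: Dict[str, str]) -> None:
        resolver_url(identifier)  # reject malformed identifiers early
        if identifier in self._records:
            raise ValueError(f"{identifier} is already registered")
        self._records[identifier] = {"target": target, "metadata": dict(metadata)}

    def update_target(self, identifier: str, new_target: str) -> None:
        # Raises KeyError for identifiers that were never registered.
        self._records[identifier]["target"] = new_target

    def resolve(self, identifier: str) -> Optional[str]:
        record = self._records.get(identifier)
        return None if record is None else str(record["target"])
```

The design point is the separation of a permanent name from a changeable location: citations carry the identifier, and only the registry's target needs updating when data are moved.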
Embedded best practices
Data curation is an unfamiliar set of concepts, practices, and jargon to most researchers
www.flickr.com/photos/vixon/116447718
It’s easier to augment systems than change behaviors
Embed curation best practices into tools and workflows already used by researchers
www.flickr.com/photos/34067077@N00/4576265327 www.flickr.com/photos/wealthofhealth4/6919840647
Embedded best practices
For many researchers, Excel is the database of choice
Excel add-in and Azure web service
Automates …
Best practices check
Data description
Persistent identifier and citation generation
Repository submission
http://dataup.cdlib.org/
2013 Innovation Award winner
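The slides do not list the specific rules behind DataUp's best-practices check, so the sketch below uses three hypothetical rules commonly recommended for tabular data: every column has a unique, non-empty header; the body contains no blank rows; and no column mixes numeric and text values. It operates on rows as lists of strings, such as Python's `csv` module would produce from an exported sheet.

```python
from typing import List


def _is_number(cell: str) -> bool:
    """True if the cell parses as a float (so 'nan' and 'inf' count as numbers)."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_table(rows: List[List[str]]) -> List[str]:
    """Return human-readable warnings; an empty list means no problems were found."""
    if not rows:
        return ["table is empty"]
    warnings: List[str] = []
    header = [h.strip() for h in rows[0]]
    body = rows[1:]

    # Rule 1 (assumed): every column needs a non-empty, unique name.
    for i, name in enumerate(header):
        if not name:
            warnings.append(f"column {i + 1} has no header")
    seen = set()
    for name in header:
        if name and name in seen:
            warnings.append(f"duplicate header '{name}'")
        seen.add(name)

    # Rule 2 (assumed): no blank rows in the body; rows are numbered as in the sheet.
    for r, row in enumerate(body, start=2):
        if all(not c.strip() for c in row):
            warnings.append(f"row {r} is blank")

    # Rule 3 (assumed): a column should not mix numeric and text values.
    for i, name in enumerate(header):
        values = [row[i].strip() for row in body if i < len(row) and row[i].strip()]
        if len({_is_number(v) for v in values}) > 1:
            warnings.append(f"column '{name or i + 1}' mixes numbers and text")
    return warnings
```

Returning warnings rather than raising errors mirrors the advisory spirit of an embedded check: the researcher sees what to fix without being blocked from working.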
ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
http://n2t.net/ark:/90135/q13j39xf
DataONE federation
http://dataone.org/
http://cn.dataone.org/onemercury
So you don’t need to know …
Metadata schema
XML syntax
Identifier registration
Packaging standards
Submission protocol
Aggregation/harvesting mechanism
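As an illustration of the kind of detail such a tool can hide, the sketch below turns a few plain fields into a metadata XML record so that the researcher never writes XML by hand. The element names loosely follow the DataCite metadata kernel, but this simplified record omits the namespace and is not validated against any schema; the function name and parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_metadata_xml(identifier: str, creators: List[str], title: str,
                       publisher: str, year: int,
                       identifier_type: str = "DOI") -> str:
    """Serialize a simplified, DataCite-like record (not namespaced or schema-validated)."""
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType=identifier_type).text = identifier
    creators_el = ET.SubElement(resource, "creators")
    for name in creators:
        creator = ET.SubElement(creators_el, "creator")
        ET.SubElement(creator, "creatorName").text = name
    titles_el = ET.SubElement(resource, "titles")
    ET.SubElement(titles_el, "title").text = title
    ET.SubElement(resource, "publisher").text = publisher
    ET.SubElement(resource, "publicationYear").text = str(year)
    # ElementTree escapes special characters such as "&" and "<" automatically.
    return ET.tostring(resource, encoding="unicode")
```

Using a real XML library rather than string concatenation is the key choice here: escaping and well-formedness are handled for the user, which is exactly the XML-syntax knowledge the slide says researchers should not need.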
Data use agreements
Maintain control over the dissemination of research results through click-through DUAs
Assert explicit license requirements and terms of use
Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-269, doi:10.1016/j.jbi.2006.09.001
http://datashare.ucsf.edu/
From: no-reply-merritt@ucop.edu
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following statements:
(1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement
...
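The click-through flow and acceptance notice above can be sketched as a small gate: access to an object is allowed only after the consumer accepts the terms, and each acceptance produces a notification message with fields like those in the example email. `DuaGate`, its fields, and the sender address are hypothetical, not Merritt's implementation; the message is built with Python's standard `email.message.EmailMessage`, and actually sending it is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from email.message import EmailMessage
from typing import Dict, Tuple


@dataclass
class DuaGate:
    """Hypothetical click-through DUA gate for one collection (not Merritt's code)."""
    collection: str
    terms: str
    notify_address: str                      # who is told about each acceptance
    sender: str = "no-reply@example.org"     # placeholder address
    accepted: Dict[Tuple[str, str], datetime] = field(default_factory=dict)

    def accept(self, user_id: str, name: str, affiliation: str,
               obj: str, when: datetime) -> EmailMessage:
        """Record that user_id accepted the terms for obj and build a notification."""
        self.accepted[(user_id, obj)] = when
        msg = EmailMessage()
        msg["From"] = self.sender
        msg["To"] = self.notify_address
        msg["Subject"] = f"DUA acceptance: {self.collection}"
        msg.set_content("\n".join([
            f"Name: {name}",
            f"Affiliation: {affiliation}",
            f"Collection: {self.collection}",
            f"Object: {obj}",
            f"Date: {when.isoformat(sep=' ', timespec='seconds')}",
            f"Terms of use: {self.terms}",
        ]))
        return msg  # sending it (e.g. via smtplib) is left to the caller

    def may_download(self, user_id: str, obj: str) -> bool:
        """Access is granted only after the terms for that object have been accepted."""
        return (user_id, obj) in self.accepted
```

Keying acceptances by user and object, rather than by user alone, keeps the agreement tied to the specific data it covers.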
Next steps …
Disciplinary survey of current DUA practice
Collaborate with Creative Commons to establish “model” DUAs
Data management planning
Researchers are being asked to plan for data curation by institutional policy and as a pre-condition for publication and grant funding
Cf. Office of Science and Technology Policy (2013), Increasing Access to the Results of Federally Funded Scientific Research, www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
The DMPTool provides guidance and resources for creating and managing data management plans (DMPs)
Edit, publish, and share DMPs
Customizable for funding agency requirements
Customizable for general, disciplinary, and institutional resources
19 requirement templates
43 resource sets
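The slides do not describe how the DMPTool represents its requirement templates and resource sets internally. The following hypothetical sketch only illustrates the layering idea: a funder template supplies required sections and prompts, and general, disciplinary, and institutional resource sets add guidance to matching sections. All names and structures are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes: a funder template lists required sections with the funder's prompt,
# and a resource set maps a section name to extra guidance.
Template = List[Tuple[str, str]]
ResourceSet = Dict[str, List[str]]


def build_plan_outline(template: Template, resource_sets: List[ResourceSet]) -> str:
    """Merge one funder template with layered resource sets into a plain-text outline.

    Resource sets are applied in the order given, so callers can pass general
    guidance first and more specific (e.g. institutional) guidance last.
    """
    lines: List[str] = []
    for section, prompt in template:
        lines.append(section)
        lines.append(f"  Requirement: {prompt}")
        for resources in resource_sets:
            for tip in resources.get(section, []):
                lines.append(f"  Guidance: {tip}")
    return "\n".join(lines)
```

Separating funder requirements from guidance lets one template be reused across many institutions, each contributing only its own resource set.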
Next steps …
DMPTool2: follow-on development (Sloan Foundation)
Outreach and training (IMLS)
http://dmptool.org/
http://blog.dmptool.org/
Removing barriers, providing incentives
“Access to and sharing of data are essential for the conduct and advancement of science”
Arzberger et al. (2004), “Promoting access to public research data for scientific, economic, and social development,” Data Science Journal 3: 135-152, doi:10.2481/dsj.3.135
Libraries are a natural partner for the research community
Deep and broad experience in the curation, preservation, and dissemination of digital assets
Subject area specialization in science, technology, engineering, and mathematics
Collaborations with campus IT groups and data centers
Effective discovery through … Data publication and citation
Maintain control through … Data use agreements
Familiar processes through … Embedded best practices
Guidance and resources through … Data management planning
www.slideshare.net/UC3/uc3-librariesandcurationbarriersandincentives
www.cdlib.org/uc3
uc3@ucop.edu
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
Barry Egan, File rio 2006, http://www.flickr.com/photos/vixon/116447718
Wealth of Health, Nanomedicine scientist working at the laboratory, http://www.flickr.com/photos/wealthofhealth4/6919840647
Martin Caltrane, Work desk, http://www.flickr.com/photos/34067077@N00/4576265327