E-Research Support at
Johns Hopkins University & Purdue University
Supplemental Webinar
Wednesday, October 17, 2012
Presented by Sayeed Choudhury & James Mullins
ESI Supplemental 1 E-research Support Slides
1. DuraSpace/ARL/DLF
E-Science Institute
1:00-2:30 pm EDT
2. E-Research Support at
Johns Hopkins University
Presented by Sayeed Choudhury,
Johns Hopkins University, Sheridan Libraries
Associate Dean for Library Digital Programs &
Director, Hodson Digital Research & Curation Center
3. Data Conservancy
• Data Conservancy (DC) is a community that
develops solutions for data preservation and
sharing to promote cross-disciplinary re-use.
• DC Service Instance: data-centric hardware,
software, components, and APIs within an
organizational context – installed at Johns
Hopkins University and the National Snow and Ice
Data Center
4. Data Sharing Attributes
• Feature Extraction Framework that atomizes data into
constituent parts for indexing, metadata extraction,
etc.
• Discipline agnostic data model (inspired by PLANETS
project)
• Provenance and Lineage service
• Spatial, temporal and (soon) taxonomic query
capabilities
• Sustainability through diverse funding from Johns
Hopkins University, direct charges to NSF grants, other
grants and community development
5. Data Management Layers
Layer: characteristics | implication for PI | implication relative to NSF
• Curation: adding value throughout the life-cycle | feature extraction; new query capabilities; cross-disciplinary | competitive advantage; new opportunities
• Preservation: ensuring that data can be fully used and interpreted | ability to use own data in the future (e.g., 5 yrs); data sharing | satisfies NSF needs across directorates
• Archiving: data protection including fixity, identifiers, references, etc. | provides identifiers for sharing | could satisfy most NSF requirements
• Storage: bits on disk, tape, cloud, etc.; backup and restore | responsible for restore, sharing, and staffing | could be enough for now but not for the near-term future
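The Archiving layer lists fixity, i.e., being able to show that archived bits are unchanged since deposit. A minimal sketch of a checksum-based fixity check, assuming SHA-256 digests recorded at ingest; the function names and the sample file are hypothetical and are not taken from the Data Conservancy software:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in fixed-size chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Hypothetical ingest step: map each file name to its checksum."""
    return {p.name: sha256_of(p) for p in paths}


def verify_fixity(paths, manifest):
    """Return names of files whose current checksum no longer matches the manifest."""
    return [p.name for p in paths if sha256_of(p) != manifest.get(p.name)]


def demo():
    """Illustrative run on a throwaway file (the file name is made up)."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "sample.csv"
        data.write_text("depth,age\n0.1,1200\n")
        manifest = record_fixity([data])
        before = verify_fixity([data], manifest)  # file untouched since ingest
        data.write_text("depth,age\n0.1,9999\n")  # simulate silent corruption
        after = verify_fixity([data], manifest)
    return before, after
```

Here `demo()` returns `([], ['sample.csv'])`: the check passes on the untouched file and flags it after the contents change. A real archive would typically keep the manifest apart from the data and re-run the check on a schedule.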
6. Establishing the JHU DMS
• May 2010 NSF announces DMP expectations
• Services incubated and scoped summer/fall
2010
– Build on Data Conservancy expertise
• Proposed in January and launched in July 2011
– Consultative data management planning services
to support NSF proposals
– Post award data management services
• Assessment of service in March 2012
7. Background work to scope
services
• Review of data management plan best practices
and development of questionnaire
• Piloted data management consultations as cases
• Short data survey with over 70 JHU researchers
• Analysis of JHU NSF proposal and award activity
• Business school capstone project on storage
options and costs
• Review of past data archiving projects and work
8. Proposing data management
services
• Services scoped to support anticipated NSF
requirements and to reflect system capabilities
– Defined time limits, volume of data deposited per
project, unencumbered data only for now
• Prepared budget for services
– Five year timeframe for costs
– All costs included: staffing, hardware, overhead, etc.
– Cost assumptions included: total data archived,
complexity of data prep for ingest
9. Developing financial model
Support secured and financial model established
• Data management planning for NSF proposals
– Service directly funded by schools
– Each school pays percentage according to 3 year
average of total NSF proposals submitted
• Post award data management
– Fee based service billed through a service center
– First-year fee is a percentage of total direct costs on
the grant
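A short sketch of the cost-allocation arithmetic. It assumes each school's share is its 3-year average of NSF proposals divided by the sum of all schools' averages, and that the first-year post-award fee is a flat rate applied to total direct costs; the school names, proposal counts, budget, and 2% rate are hypothetical, not JHU figures:

```python
from statistics import mean


def planning_shares(proposals_by_school: dict[str, list[int]]) -> dict[str, float]:
    """Fraction of the planning-service cost each school pays.

    Assumption: each list holds annual NSF proposal counts, oldest first,
    so the last three entries are the most recent three years.
    """
    averages = {school: mean(counts[-3:]) for school, counts in proposals_by_school.items()}
    total = sum(averages.values())
    return {school: avg / total for school, avg in averages.items()}


def first_year_fee(total_direct_costs: float, rate: float) -> float:
    """Post-award fee for year one; rate is a fraction (0.02 means 2%)."""
    return total_direct_costs * rate


# Hypothetical inputs, for illustration only.
counts = {"School A": [30, 40, 50], "School B": [10, 10, 10]}
shares = planning_shares(counts)  # averages 40 and 10, so shares 0.8 and 0.2
budget = 250_000  # hypothetical annual cost of the planning service
payments = {school: round(share * budget, 2) for school, share in shares.items()}
fee = first_year_fee(500_000, 0.02)  # hypothetical grant and rate
```

With these inputs School A pays 200,000 and School B 50,000 of the planning budget, and the hypothetical grant's first-year fee comes to about 10,000.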
10. JHU Data Management Services
team
Dedicated group that collaborates with DC and the
Digital Research and Curation Center
• Two data management consultants
• Senior technical consultant (Part-time)
• Software developer
• System administrator (to be hired)
• Interim manager (Part-time)
11. Service marketing
• Reach out through all stakeholders
– Announcements through Deans
– Work with research projects administration
– Outreach to department administrators
– Briefings with library colleagues/departments
– Presentations to researchers, graduate students
• More to do… and then repeat!
12. Observations
• Role of Choudhury as NSF PI within JHU
• Sheridan Libraries R&D and experience with
scientific data
• Already embedded within research enterprise
• Specifics will vary by institution but JHU
approach can be generalized…
• …But each institution should consider
appropriate role(s) or approach
13. Resources
• http://dataconservancy.org
• Alpha release of software -
https://dataconservancy.org/software/downloads/
• http://dmp.data.jhu.edu
• Reviewer guidelines for data management plans -
http://dmp.data.jhu.edu/assistance/grant-reviewers-worksheet-for-data-management-plans/
14. Acknowledgements
• NSF Award OCI-0830976
• Sheridan Libraries financial support
• Johns Hopkins University financial support
• Data Conservancy colleagues for their exceptional
work and patience
15. Questions
16. An overview of Sustaining
e-Science Collaboration in
an Academic Research
Library – the Purdue
Experience
James L. Mullins, PhD
Dean of Libraries & Esther Ellis Norton Professor
October 17th, 2012
Libraries
17. What is meant when we say the library
has a role in sustaining e-science?
•Application of library and archival science principles and theory
to data management.
•Collaboration of Libraries with faculty, information technology,
research office, and sponsored programs to develop a process
and repository to manage and preserve data.
18. I. Background and Development of the
Libraries’ collaboration with e-Science at
Purdue – on local and national levels.
• Local – Conversations with researchers, research office, etc.
• Local – Principles of library and archival sciences.
• Local – Restructuring of Libraries.
• National – NSF Data Management dialogue.
• Local – Creation of Data Research Scientist.
• Local – Librarians not able “to service” funded research.
• Local – Librarians with professorial rank and tenure (start-up
package of $40,000+).
• Local – Distributed Data Curation Center (D2C2)
19. I. Background and Development of the
Libraries’ involvement in e-Science at Purdue
– on local and national levels (con’t).
• National – IMLS grant to develop Data Curation Profiles.
• Local – Partnerships of subject liaison librarians and faculty.
• Local – Re-definition of librarian roles.
• Local – Collaboration/advising on data management librarian role.
• National – IMLS grant to develop Data Information Literacy.
• National – Develop/teach ICPSR data science curriculum.
• National – IMLS grant to develop Databib-DMPTool collaboration.
• International – DataCite-Databib collaboration.
• Local – Society of American Archivists (SAA) workshop.
20. Sustainability – applied expertise of librarians.
• Must be integrated into role of librarians.
• New positions must be created (data curation specialists, etc.).
• Priority for new positions must be established with a total
view of strategic growth areas (at Purdue data management
and information literacy).
• Salaries partially funded through sponsored research, making
funds available for other positions and graduate research
assistants.
• Cluster hires with colleges and schools.
• Critical role of librarians in research garners additional
support from University Administration.
21. Research Collaborations 2012/2013
• Big Data and Complex System Analytics to Enhance Society's
Resilience w/ Agronomy
• Human Rights Texts for Digital Research: Archiving and
Analyzing Amnesty International’s Historic Urgent Action
Bulletins w/ Political Science
• A Cross-Disciplinary Design Thinking Research Symposium to
Catalyze Groundbreaking Research and Practice w/
Engineering Education
• Establishing a Materials Center for Agriculture, Food and
Health w/ Food Science
22. Purdue University Research Repository –
PURR
II. Building a Data Curation Program and Repository
• Not done independently of librarians’ knowledge &
the support structure within the Libraries
• In 2006, collaboration built around Purdue’s HUBzero
platform in answer to the NSF DataNet RFP.
• 2007 – 2010 Provost informed of impending data
management mandate.
• May 2010 – NSF announcement.
• Summer 2010 – Provost and VPR appoint taskforce of
faculty researchers – co-chaired by CIO and dean of libraries
to develop a “template.” Report written August 2010.
23. PURR
II. Building a Data Curation Program and Repository
(Con’t)
• 2010 –
• Commitment to develop repository jointly by ITaP,
OVPR, and Libraries - $90K
•Working Group created to plan and develop Purdue
University Research Repository (PURR).
• Workshops sponsored by OVPR, conducted by
Libraries and ITaP;
•Libraries create resources to support faculty in
developing DMPs.
24. PURR
II. Building a Data Curation Program and Repository
(Con’t)
•2011/2013
• Libraries Budget request indicated need for
positions to support sustainable data curation.
• 479 grant proposals to date include PURR in
data management plans
• 36 grants (so far) awarded with PURR in the DMP.
• TRAC certification underway – ISO 16363.
25. PURR
II. Building a Data Curation Program and Repository
(Con’t)
• What is provided by PURR? Any Purdue faculty member, graduate
student, or staff member can:
• Create a trial project of 500 MB for three years.
• Obtain 100 GB for ten years for a project with external funding.
• Invite collaborators from other institutions to join.
• Publish datasets of up to 50 MB without a grant, or up to 10 GB with one.
• Each project receives to-do lists for project management,
a wiki area for notes, and a micro-blogging interface
(similar to Facebook) for discussion among the team.
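The allocation rules listed above amount to a small lookup keyed on whether a project has external funding. A minimal sketch that uses only the figures on this slide; the `Allocation` type and the helper functions are hypothetical and do not reflect PURR's or HUBzero's actual code:

```python
from dataclasses import dataclass

MB_PER_GB = 1000  # assumption: decimal units; the slide does not say


@dataclass(frozen=True)
class Allocation:
    project_storage_mb: int  # working space for the project
    project_years: int       # how long the project space is kept
    publish_limit_mb: int    # largest dataset that can be published


TRIAL = Allocation(project_storage_mb=500, project_years=3, publish_limit_mb=50)
FUNDED = Allocation(project_storage_mb=100 * MB_PER_GB, project_years=10,
                    publish_limit_mb=10 * MB_PER_GB)


def allocation_for(has_external_funding: bool) -> Allocation:
    """Pick the tier; a funded project moves from TRIAL to FUNDED."""
    return FUNDED if has_external_funding else TRIAL


def can_publish(dataset_mb: float, has_external_funding: bool) -> bool:
    """True if the dataset fits under the tier's publishing limit."""
    return dataset_mb <= allocation_for(has_external_funding).publish_limit_mb
```

For example, a 200 MB dataset is over the 50 MB limit for an unfunded project but well within the 10 GB limit once a grant is awarded. Keeping the tiers as data rather than scattered conditionals makes a later policy change a one-line edit.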
26. PURR
II. Building a Data Curation Program and Repository
(Con’t)
• PURR Digital Preservation Policy approved April 2012
http://www.lib.purdue.edu/spcol/content/PURRdigitalpreservationpolicy.pdf
• Working Group report on three year funding requirements
• One time - $1.2 M – received January 2012.
• Ongoing costs - $194,000 / year.
• Ongoing costs: F&A? Charge Back?
27. PURR: Data Management, Discovery, Preservation
28. OVERVIEW OF PURR
[Figure: Researchers use PURR for research collaboration, data management, and publishing & archiving, supported by the Libraries (Data Services: reference & policy, submission, and consulting; preservation), OVPR/SPS (grant compliance), and ITaP (infrastructure, HUBzero™).]
29. OVERVIEW OF PURR
• Collaboration of ITaP, Libraries, and OVPR
• Based on HUBzero, provides a hub for Purdue researchers
and their collaborators to use, manage, and share their
data
• Comprehensive resource for supporting research data
management (Knowledge Base, tutorials, example plans,
boilerplate text, ask questions, etc.)
• Approximately 1/3 of NSF proposals submitted from
Purdue last year included PURR as a component of their
data management plans
• Purdue researchers are not required to use PURR. Other
options may be appropriate such as center facilities or
disciplinary repositories.
30. WHY USE PURR?
PURR can be used for…
Managing Data
Publishing Data
Preserving Data and
Research Collaboration
31. QUICK START
http://research.hub.purdue.edu
What can be done right now:
– Create an account
– Create a project
• a default storage allocation is free, and more can be
purchased if you need it
– Invite collaborators
– Upload data to project
– Publish and/or archive datasets with Digital Object
Identifiers (DOI)
– Search, browse, and cite published datasets
32. Overview model of PURR functions
[Figure: PURR lifecycle, Step 1 highlighted. Stages: data management planning resources; creating projects and collaborating; if a grant is submitted/awarded, more space; data publishing/archiving; discovery, then a long-term preservation decision when the commitment ends. Phases: Create, Research (data generation/collection), Uncurated data, Curation, Discovery & Dissemination, Long-term preservation.]
Researchers are guided to PURR for help with
data mgmt plans by Pre-Awards, workshops
and promotion, and by word-of-mouth
33. PURR FUNCTIONS
[Figure: PURR lifecycle (Plan, Develop Project, Expand, Publish Data, Disseminate Data, then a long-term preservation decision), Step 2 highlighted.]
Researchers can create projects at any
time, invite others to join… the goal
is to help facilitate research development
34. Overview model of PURR functions
[Figure: PURR lifecycle (Plan, Develop Project, Expand, Publish Data, Disseminate Data, then a long-term preservation decision), Step 3 highlighted.]
Once a grant is awarded, researchers get an increase in
space allocation and in the length of time for the project and its data
35. PURR FUNCTIONS
[Figure: PURR lifecycle (Plan, Develop Project, Expand, Publish Data, Disseminate Data, then a long-term preservation decision), Step 4 highlighted.]
To make data sets publicly discoverable
and available, there is a submission
and “publishing” process
36. PURR FUNCTIONS
[Figure: PURR lifecycle (Plan, Develop Project, Expand, Publish Data, Disseminate Data, then a long-term preservation decision), Step 5 highlighted.]
PURR policy allows for a specified time
for discovery, and then decisions are
made regarding long-term preservation
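The five steps can be read as a forward-only workflow. The sketch below encodes the stage names from the lifecycle diagram as an enum with allowed transitions. The transition table is an interpretation, not PURR's implementation: it assumes a project may publish without a grant (as the allocation slide allows) and that the end-of-commitment decision is either long-term preservation or de-selection.

```python
from enum import Enum, auto


class Stage(Enum):
    PLAN = auto()                    # data management planning resources
    DEVELOP_PROJECT = auto()         # create a project, invite collaborators
    EXPAND = auto()                  # grant awarded: more space and time
    PUBLISH_DATA = auto()            # submission, review, "publishing"/archiving
    DISSEMINATE_DATA = auto()        # discoverable for the committed period
    LONG_TERM_PRESERVATION = auto()  # selected for retention
    DESELECTED = auto()              # not retained after the commitment ends


# Assumed transitions, read off the lifecycle diagram.
ALLOWED = {
    Stage.PLAN: {Stage.DEVELOP_PROJECT},
    Stage.DEVELOP_PROJECT: {Stage.EXPAND, Stage.PUBLISH_DATA},
    Stage.EXPAND: {Stage.PUBLISH_DATA},
    Stage.PUBLISH_DATA: {Stage.DISSEMINATE_DATA},
    Stage.DISSEMINATE_DATA: {Stage.LONG_TERM_PRESERVATION, Stage.DESELECTED},
    Stage.LONG_TERM_PRESERVATION: set(),
    Stage.DESELECTED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target if the workflow permits it, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def follow(path: list[Stage]) -> Stage:
    """Apply a sequence of stages in order and return the final one."""
    stage = path[0]
    for nxt in path[1:]:
        stage = advance(stage, nxt)
    return stage
```

Making the two end states terminal mirrors the speaker notes: once the discovery commitment ends, a selection or de-selection decision closes the record's active life.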
37. WHERE CAN I GO FOR HELP?
Overall help: Librarians
(link to subject librarians directory or name)
Data Services: http://www.lib.purdue.edu/research/dataservices
Librarians consult on best practices for data formats,
metadata, sharing, reuse, archiving, review plans,
write letters of support, and collaborate as partners/
co-PIs on proposals.
Grant preparation: Sponsored Programs Services (SPS)
PURR Website: http://research.hub.purdue.edu
38. Retrieval and Citation
•Establish easier access to scientific research data on the
Internet.
•Increase acceptance of research data as legitimate,
citable contributions to the scientific record.
•Support data archiving that will permit results to be
verified and re-purposed for future study.
http://www.datacite.org/
39. Linking of Dataset to Article
The DOI system offers an easy way to connect the article with the underlying data:
The dataset:
Kuhlmann, H et al. (2009):
Age models, iron intensity, magnetic susceptibility records and dry bulk
density of sediment cores from around the Canary Islands.
doi:10.1594/PANGAEA.727522,
Is supplement to the article:
Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge
(2004): Reconstruction of paleoceanography off NW Africa during
the last 40,000 years: influence of local and regional factors on
sediment accumulation. Marine Geology, 207(1-4), 209-224,
doi:10.1016/j.margeo.2004.03.017
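Because every DOI resolves through the DOI proxy at https://doi.org/, the dataset-to-article link above reduces to two identifiers plus a relation. A minimal sketch using the two DOIs from the example. The helper names are hypothetical, and the record layout is modeled on (not a complete instance of) the relatedIdentifier fields in the DataCite metadata schema:

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def normalize_doi(raw: str) -> str:
    """Drop a leading 'doi:' label and a trailing comma copied from a reference list."""
    doi = raw.strip().rstrip(",").strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    return doi


def resolver_url(raw: str) -> str:
    """Actionable link via the DOI proxy; '/' is kept, unsafe characters are percent-encoded."""
    return DOI_RESOLVER + quote(normalize_doi(raw), safe="/")


def supplement_link(dataset_doi: str, article_doi: str) -> dict:
    """Dataset-side record stating that the dataset is a supplement to the article."""
    return {
        "doi": normalize_doi(dataset_doi),
        "relatedIdentifiers": [{
            "relatedIdentifier": normalize_doi(article_doi),
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }],
    }


record = supplement_link("doi:10.1594/PANGAEA.727522,",
                         "doi:10.1016/j.margeo.2004.03.017")
```

`resolver_url("doi:10.1594/PANGAEA.727522,")` yields `https://doi.org/10.1594/PANGAEA.727522`. Storing the relation on the dataset side lets a repository show "supplement to" links without changing the article record.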
41. In the United States three
DataCite Members Provide
DOIs for datasets: http://datacite.org/DataCiteUS
42. No one/right way to sustain
e-science or data
management; each
institutional environment
will be different and require
its own unique
collaborations or roles.
First, we want to make sure that PURR addresses the needs of data management planning, and we have populated the site with a variety of tools and resources to help researchers do that. We emphasize that planning is necessary for all projects, not just NSF ones.
One of the strengths of PURR, built into the HUBzero® platform, is the ability to engage in collaborative work: Purdue researchers can create projects, use them to stage a grant, invite others to participate, create or add files, etc.
A key part of PURR is the incentive to submit projects for funding (universities like getting funding), and we are automating a process whereby researchers automatically acquire more space once a grant is awarded. The chart may be hard to read, but it is on the website if you are curious.
As our goal is to make research more transparent and data more available, PURR allows for what we are currently calling “publishing” and archiving of data sets and collections. Note that a data set must be submitted and reviewed first: we are building a service in which librarians can approve submissions, much as we do for our IR, after checking that metadata has been added to ensure adequate descriptions for discovery and preservation. We believe that as data sets become collections, libraries will need to apply collection development principles to manage them.
We continue to work on the longevity of support for these collections. Currently we see them supported on the discovery platform for 10 years, at which point a selection or de-selection decision can be made; this is why we believe librarians must be involved early on, so they can make collection management decisions down the road. We have just received funding to work on this and on the preservation environment that will support long-term preservation.