An Example Workflow for Depositing to a Research Data Repository:
A Case Study for Archiving Publication Data
Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro | JHU Data Management Services | Johns Hopkins University Sheridan Libraries | datamanagement@jhu.edu
Data
•Pilot project with two graduating doctoral students
•Biomedical engineering field; largely image data
•The data were already published, which differs from our usual service model of working with researchers at the beginning of their projects
JHU Data Archive
•Used an alpha release of the Data Conservancy software [1]
•Discipline-agnostic; treats data as primary objects
•A collection of data may have an associated metadata file, structured or unstructured
•Not yet publicly accessible
Understanding Research
•Met with the students for an initial overview of their research
•Read their publications to map the data products and the activities that created them
•As shown in Fig. 1, this mapping provided a framework for organizing the data and for ensuring that all data were included (the students could not locate all of their data)
Organizing Data
•Held several in-depth meetings with the students
•Created new folders and subfolders with the students present and moved files to the appropriate locations
•Discussed the data content, the instrument(s) used, and any file naming conventions
•Experimented with directory structures based on publication figures or on research methods. The students and their advisor decided that organizing by figure was more useful for data reuse
•Did not rename files because of time constraints and the lack of consistency in the existing filenames
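A minimal Python sketch of the by-figure reorganization described above, assuming the assignment of files to figures has already been worked out with the researchers. The function name `organize_by_figure`, the `figure_map` dictionary, and folder names such as `Fig1` are hypothetical illustrations, not part of our actual workflow or of any Data Conservancy tool. Files keep their original names, since we did not rename them, and any listed file that cannot be found is reported back rather than silently skipped.

```python
import shutil
from pathlib import Path


def organize_by_figure(src: Path, dest: Path, figure_map: dict[str, list[str]]) -> list[str]:
    """Move files into dest/<figure>/ without renaming them.

    figure_map (hypothetical) maps a figure folder name, e.g. "Fig1", to file
    paths relative to src. Returns the listed paths that could not be found.
    """
    missing = []
    for figure, rel_paths in figure_map.items():
        target_dir = dest / figure
        target_dir.mkdir(parents=True, exist_ok=True)
        for rel in rel_paths:
            source = src / rel
            if not source.is_file():
                missing.append(rel)  # data the researchers still need to locate
                continue
            target = target_dir / source.name  # original filename kept as-is
            if target.exists():
                # Two source files share a name within one figure folder.
                raise FileExistsError(f"{target} already exists; resolve the name clash by hand")
            shutil.move(str(source), str(target))
    return missing
```

Returning the missing paths gives a concrete checklist for the next meeting, which matters when, as in this pilot, researchers cannot locate all of their data.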
Packaging
•Used BagIt (v. 0.97) and TAR as the packaging formats
•Used MD5 checksums for both the data (payload) files and the tag files
•Created a documentation folder for our unstructured metadata (Fig. 2), which we treated as tag files, not as part of the payload
•One “bag” per publication
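The packaging steps above can be sketched with only the Python standard library. This is an illustration under stated assumptions, not the tool we used: it writes a minimal BagIt 0.97-style bag (a `bagit.txt` declaration, the payload under `data/`, an MD5 payload manifest, and an MD5 tag manifest), places the unstructured metadata in a hypothetical `documentation/` tag directory, and then wraps the bag in a TAR file. The function names `make_bag` and `verify_bag` are hypothetical, and optional elements such as `bag-info.txt` are omitted. For production use, an established BagIt implementation such as the Library of Congress `bagit-python` library is a safer choice.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in chunks so large images are never loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag: Path, files, name: str) -> None:
    """Write '<md5>  <path relative to bag root>' lines, one per file."""
    lines = [f"{md5_of(p)}  {p.relative_to(bag).as_posix()}\n" for p in sorted(files)]
    (bag / name).write_text("".join(lines), encoding="utf-8")


def make_bag(payload_src: Path, docs_src: Path, bag: Path) -> Path:
    """Build a minimal BagIt 0.97-style bag at `bag`, then TAR it; returns the .tar path."""
    shutil.copytree(payload_src, bag / "data")        # payload: the research data
    shutil.copytree(docs_src, bag / "documentation")  # tag directory: unstructured metadata
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    payload = [p for p in (bag / "data").rglob("*") if p.is_file()]
    write_manifest(bag, payload, "manifest-md5.txt")
    # Tag manifest: every file outside data/ (collected before it is written, so it skips itself).
    tag_files = [p for p in bag.rglob("*")
                 if p.is_file() and p.relative_to(bag).parts[0] != "data"]
    write_manifest(bag, tag_files, "tagmanifest-md5.txt")
    tar_path = bag.with_suffix(".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag, arcname=bag.name)
    return tar_path


def verify_bag(bag: Path) -> list[str]:
    """Return manifest entries whose file is missing or whose MD5 no longer matches."""
    problems = []
    for name in ("manifest-md5.txt", "tagmanifest-md5.txt"):
        for line in (bag / name).read_text(encoding="utf-8").splitlines():
            checksum, rel = line.split(maxsplit=1)
            target = bag / rel
            if not target.is_file() or md5_of(target) != checksum:
                problems.append(rel)
    return problems
```

Because the manifests record a checksum for every file, re-running `verify_bag` after any reorganization or transfer is one simple way to confirm that nothing changed unexpectedly.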
Conclusions
•Unsurprisingly, it is hard for researchers to recall information about their data after a few years. This pilot project reinforced the importance of working with scientists early in their research, which is our usual service model.
•Because of time constraints and the limits of the students' recollection, our metadata creation was limited to folder and file documentation (Fig. 2).
•Closely reading and mapping the students' research was central to being able to ask them relevant questions about the data.
•The BagIt specification worked well for packaging.
Future Work
This pilot project began the process of formalizing our archiving workflow, but we have much more to do! The Data Conservancy software will gain functionality over the coming years, which affects how we evolve our archiving process. For example, we currently cannot hide deposited data in the JHU Data Archive; however, researchers may want to transfer data to us before their project is complete and ready for public access. We also need to develop rigorous processes for ensuring that we maintain the integrity of the data during the often significant alterations required to produce archived datasets that are useful to others.
Figure 1. Example of a data flow diagram.
Figure 2. Example of unstructured metadata: folder and file documentation.
[1] http://dataconservancy.org/software/
Copyright © 2013, by JHU Data Management Services