Librarians can provide valuable data management services to researchers on campus. An effective strategy includes surveying researchers to identify needs, communicating service offerings through workshops and consultations, and providing in-depth guidance on data management plans and long-term data preservation. Developing workshops involves setting learning objectives, evaluating content, and securing resources such as space and food. Consultations allow librarians to help with specific topics such as choosing file formats or finding metadata standards. Creating a data management plan requires a data inventory, a description of the metadata, and methods for long-term preservation and access. Trusted disciplinary repositories and stable identifiers help ensure long-term findability and access.
2. My background
• Not technically a librarian
• 10 years of biomedical research experience
• HSL: Bioinformationist
• CSU: Data Management Specialist
• Now: cyberinfrastructure specialist
Magle CT, et al. Infect Immun. 2014;82(2):618-25. doi: 10.1128/IAI.00444-13.
3. Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
5. What PhD students learn
• 2 years of classwork
• ~5 years of bench work
• Write a dissertation + research reports
• Congrats, Dr.!
7. Researchers are human
Data Management does not come naturally to most researchers.
+
Librarians are service oriented and good at organization
=
Data Management services in libraries
8. What is data management?
The policies, practices and procedures needed to manage the storage, access and preservation of data produced from a research project
10. Why should researchers care about data management?
Rinehart AK. “Getting emotional about data.” College & Research Libraries News. 2015 Sep;76(8):437-440.
33. Where do I start?
• Who are your patrons?
• Which ones need DM services?
• Does anyone else provide these services?
35. Focus your efforts
Bioengineering
Biomedical Sciences
Biostatistics
Cancer Biology
Cell Biology, Stem Cells, and Development
Clinical Science
Computational Bioscience
Epidemiology
Health Services Research (CSPH collaborative)
Human Medical Genetics
Immunology
Integrated Physiology Program
Pathology
Pharmacology
Physiology and Biophysics
Medical Scientist Training Program
Microbiology
Molecular and Cellular Pharmacology
Molecular Biology
Neuroscience
Nursing
Pharmaceutical Sciences
Rehabilitation Science
Reproductive Sciences
Structural Biology and Biochemistry
Toxicology
Biochemistry and Molecular Genetics
Cell & Developmental Biology
Immunology/Microbiology
37. Your Campus survey
1. What units are on your campus?
2. Which ones produce research data?
3. Who already provides data management services?
38. Communication
• How do patrons find out about services?
1. Curated lists of department contacts
2. Listservs
3. Web presence
• Leverage existing library strategies
42. Advertising
1. Curated lists of department contacts
• Departments and graduate programs
• Faculty and administrative staff
2. Data management listserv
• Populated from workshop attendees
3. Web presence
43. Start your communications strategy now!
• Make a list of 5 departments you want to contact
• Who is the department chair? Who is the administrative contact?
• Can you get your own listserv? If yes, how?
• Where can information about DM services live on the web?
45. Workshop
• Set a regular time and date: Build in breaks
• Fill speaker slots: Doesn’t have to be you
• ADVERTISE: web, lists
• Keep attendance records (they populate your communication lists)
• Evaluate content and use the results to improve
46. ORDER FOOD
• Scientists are hungry.
• Will attend seminars for food.
• (That’s why Data and Donuts exists)
http://arcticdragonwolf97.deviantart.com/
47. Evaluate content
• Set learning objectives
• Survey after class
• Ask questions about things you’re not sure of
55. Exercise: Workshops
• Does your library have space to hold workshops? Do you have access?
• What times of year are patrons available for workshops?
• How will you evaluate workshop content?
• What parts of holding a workshop are you the least sure about?
56. Consultations*
• Librarians already do this for literature searches, EndNote, etc.
• Choose topics you’re comfortable with
• Make sure people know about them through the communication strategies above
• Create a clear mechanism to ask for help
*Some researchers assume consultations are fee-based. Make sure to tell them they’re free.
57. Exercises: Consultation
• Does your library already have a way to ask to consult with a librarian?
• What topics do you feel comfortable (or will become comfortable) giving consultations on?
• How will people find out about consultation services?
60. What is a data management plan?
A description of how you plan to describe, preserve and share your research data.
Often required by funding agencies
61. What is research data?
• “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
- White House Office of Management and Budget
• Reality: anything that is a (digital) product of your research
62. Successful DMPs include
• A data inventory, including type(s) and size
• A strategy for describing the data
• A plan for preserving the data long term
• A method for access to the data
Always make sure to follow funder requirements
63. DMPTool
• Review requirements from different agencies
• https://dmptool.org/guidance
• Create new DMPs based on funding agency templates
• Search public DMPs
64. Exercise: DMPs
• What funders do researchers use on your campus?
• What DMP requirements do they have?
• Can you find an example DMP in DMPTool?
65. Data inventory
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
66. Data inventory
• What type of data are you going to collect? → miRNA sequences
• What file type will be produced? → FASTQ files
• What size will these files be? How many files? → 1 GB per file x 64 strains x 3 replicates = 192 GB (~200 GB)
• What other research outputs will be produced?
• Code/Software? → R scripts for analysis and visualization
• Templates/protocols? → Data use tutorials
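The storage estimate in an inventory is just file size multiplied by every count (strains, replicates, cell types, images), so it is easy to script and redo whenever the plan changes. A minimal Python sketch, assuming sizes are given in decimal gigabytes; the function name `estimate_storage_gb` is hypothetical.

```python
from math import prod


def estimate_storage_gb(gb_per_file: float, *counts: int) -> float:
    """Return total storage in GB: the size of one file times every count
    (strains, replicates, cell types, images per type, ...)."""
    if gb_per_file < 0 or any(c < 0 for c in counts):
        raise ValueError("sizes and counts must be non-negative")
    return gb_per_file * prod(counts)


# Slide 66: 1 GB per FASTQ file x 64 strains x 3 replicates
total = estimate_storage_gb(1.0, 64, 3)
print(f"Raw data: {total:.0f} GB")  # prints "Raw data: 192 GB"
```

Passing the number of backup copies as one more count, for example `estimate_storage_gb(1.0, 64, 3, 3)`, gives the footprint across all copies.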
67. Data formats
• Avoid proprietary formats
• Know what software can read your data
Proprietary format → Alternative format
Excel (.xls, .xlsx) → Comma Separated Values (.csv)
Word (.doc, .docx) → plain text (.txt)
PowerPoint (.ppt, .pptx) → PDF/A (.pdf)
Photoshop (.psd) → TIFF (.tif, .tiff)
QuickTime (.mov) → MPEG-4 (.mp4)
MPEG-4 Protected Audio (.m4p) → MP3 (.mp3)
68. Exercise: Data Inventory
What kind of data are you going to collect?
What file type will be produced?
What size will these files be? How many files?
What other research outputs will be produced?
69. Exercise: Data Inventory
• A researcher comes to you with an Excel file. What file format would you recommend for data preservation?
• A researcher comes to you to figure out how much storage they need for their data. They’re planning on producing image files for 3 different types of cells. Each cell type will have 12 images, and each file is about 50 megabytes. What is the minimum storage they will need?
70. A strategy for describing the data
• Metadata: Relevant information for re-creation and re-use
• Contact info
• How data was collected
• Details about collection
• Date, location of collection
• Units
• Can be as simple as a text file
71. Genomics example (README)
This project contains next-generation miRNA sequencing data from 64 mouse strains. Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit. The libraries were run on an Illumina MiSeq sequencer. The FASTQ files produced were analyzed in R using Bioconductor.
The data and descriptive metadata will be made available on NCBI in the BioProject (PRJXXXX). The scripts used to analyze the data are available on GitHub (URL). Tutorials for data use will be made available in the Digital Collections of Colorado (handle).
Contact Tobin Magle (tobin.magle@colostate.edu) for more information.
http://orcid.org/0000-0003-3185-7034
72. Metadata standards
• Dublin Core: http://dublincore.org/documents/dcmi-terms/
• Can be applied to anything
• Many discipline-specific metadata standards
• EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html
• MIAME: http://fged.org/projects/miame/
• Search for other standards:
• http://www.dcc.ac.uk/resources/metadata-standards
• https://biosharing.org/standards/
74. Exercise: Describe your data
What do people need to know to reuse your data?
Are there any discipline-specific metadata standards?
What format will you describe your data in (text, XML, tabular)?
What fields will you include (author, date, format, identifier)?
75. Exercise: Metadata
• A researcher comes to you with a microarray dataset. What type of metadata standard would you recommend?
• A researcher thinks there might be a metadata standard for their type of data, but isn’t sure where to find it. Where would you have them look for one?
• You help this researcher look for a standard, but there isn’t one. How would you help them document their research?
76. A plan for preserving the data long term
• What will you do to ensure data are properly stored and preserved?
• Include metadata and other products needed for reuse
• Might change over the course of the project
77. Preservation questions
• What will you store?
• Who will be in charge?
• How long will you store it?
• Where will you store it?
• Multiple copies
78. Recommendations for backing up data
• Store in geographically distinct locations
• Automation: Will you remember to do it manually?
• Security: Are you working with protected health information (PHI)?
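Automating the copy step removes the "will you remember?" problem. A minimal Python sketch, assuming each backup location (an external drive, a cloud-synced folder) is already available as a folder on the computer; it copies a file into every location and confirms each copy with a SHA-256 checksum, a verification step added here for illustration rather than taken from the slides. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into every destination folder and verify each copy.

    Raises OSError if a copy's checksum differs from the original's.
    """
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps the modification time
        if sha256(target) != expected:
            raise OSError(f"checksum mismatch: {target}")
        copies.append(target)
    return copies
```

For example, `back_up(Path("results.csv"), [Path("/Volumes/ExternalDrive/backup"), Path.home() / "CloudSync" / "backup"])` uses hypothetical paths; a script like this could then be scheduled with the operating system's task scheduler (such as cron) so nobody has to remember it.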
79. Exercise: Preservation plan
What will you store?
Who will be responsible for the data (person or position)?
How long will you store it?
Where will you store it?
How will you back it up?
80. Exercise: Preservation
• Does your campus have data storage options? Who handles it: departments, the university, or individual researchers?
• How are these data storage solutions backed up?
• Do these storage solutions meet preservation best practices? If not, how can they be improved?
81. A method to access the data
• Important to funding agencies
• Reproduce existing research
• Promote further research
• Must be easily available:
• No “by request only”
• Embargoes are “ok”
• Data security: consider privacy and IP issues before sharing
82. Data access and sharing best practices
• Non-proprietary formats
• Include metadata
• Proper storage
• Stable identifier
• Licensing: conditions for reuse
83. Trusted Repositories: store and share
• Discipline specific repositories
• Search:
http://service.re3data.org/browse/by-subject/
• Generic:
• Figshare - https://figshare.com/
• Dryad - http://datadryad.org/
• CSU Digital Repository:
• http://lib.colostate.edu/digital-collections/
84. Data archiving service
• Finished products for sharing
• CSU Digital Repository
• Over 100 datasets
• Satisfies requirements for manuscripts and grants
• No cost for <1 TB
• $150/TB for 5 years
• $300/TB for >5 years
85. Stable identifiers
• URLs break
• Stable identifiers are permanent in a database
• Some provide linking capabilities
• DOI – https://doi.org/10.1109/5.771073
• Handle – http://hdl.handle.net/10217/177356
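The "linking capabilities" come from resolver services: a bare identifier becomes a working link by prefixing the resolver address, as in the two examples above. A minimal Python sketch; the DOI check uses a regular expression Crossref has suggested for modern DOIs (it can miss some older, unusual DOIs), and the handle check is a loose assumption (prefix, slash, suffix). The function name is hypothetical.

```python
import re

# Pattern Crossref has suggested for modern DOIs (case-insensitive).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
# Loose handle check: a prefix and a suffix separated by "/".
HANDLE_PATTERN = re.compile(r"^[^/\s]+/\S+$")


def resolver_url(identifier: str) -> str:
    """Turn a bare DOI or Handle into a link through its resolver."""
    identifier = identifier.strip()
    # DOIs are themselves handles, so test for a DOI first.
    if DOI_PATTERN.match(identifier):
        return "https://doi.org/" + identifier
    if HANDLE_PATTERN.match(identifier):
        return "http://hdl.handle.net/" + identifier
    raise ValueError(f"not a recognizable DOI or Handle: {identifier!r}")
```

With the slide's examples, `resolver_url("10.1109/5.771073")` gives the doi.org link and `resolver_url("10217/177356")` gives the hdl.handle.net link.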
86. Licensing
• State your conditions for reuse
• Paper citation?
• Disclaimers
• Must justify limitations, describe how you’ll advertise them
• Creative Commons licenses are a good starting point
87. Exercise: Access methods
Where will people be able to access the data?
Does your discipline have a repository?
What kind of stable identifier will it have?
What are the conditions for reuse?
Are there any limitations to use of these data? Why?
88. Exercise: Data Access
• Does your institution have a digital repository?
• Do you currently accept datasets?
• If not, where could researchers put a dataset?
• A researcher wants to put his dataset in a discipline-specific repository. Where would you tell him to find one?
90. Data preservation and sharing
• Preserve: File formats and storage
• Describe: Metadata standards and standard languages
• Share: FAIR principles and repositories
92. Digital data preservation
Short Term
• During the project
• Frequent changes
• Your responsibility
Long term
• After the project is over
• Little to no changes
• Can be outsourced
94. Data backup
• Make 3 copies
• Protects against natural
disasters
• Example
• Computer HD
• External hard drive
• Cloud
95. Data formats
• Avoid proprietary formats
• Use common data standards
in your field
• Find standards:
https://fairsharing.org/standards/?q=&selected_facets=type_exact:model/format
96. Proprietary formats and alternatives
Proprietary format → Alternative format
Excel (.xlsx) → Comma Separated Values (.csv)
Word (.docx) → plain text (.txt) or PDF/A (.pdf)
PowerPoint (.pptx) → PDF/A (.pdf)
Photoshop (.psd) → TIFF (.tif, .tiff)
QuickTime (.mov) → MPEG-4 (.mp4)
MPEG-4 Protected Audio (.m4p) → MP3 (.mp3, compressed) or WAV (.wav, uncompressed)
https://www.loc.gov/preservation/digital/formats/content/content_categories.shtml
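Before a deposit, the table above can be turned into a quick audit of a project folder. A minimal Python sketch, assuming file extensions reliably indicate format; the mapping combines the two format tables in this deck, and the function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Proprietary extension -> suggested alternative, from the format tables.
ALTERNATIVES = {
    ".xls": "Comma Separated Values (.csv)",
    ".xlsx": "Comma Separated Values (.csv)",
    ".doc": "plain text (.txt) or PDF/A (.pdf)",
    ".docx": "plain text (.txt) or PDF/A (.pdf)",
    ".ppt": "PDF/A (.pdf)",
    ".pptx": "PDF/A (.pdf)",
    ".psd": "TIFF (.tif, .tiff)",
    ".mov": "MPEG-4 (.mp4)",
    ".m4p": "MP3 (.mp3) or WAV (.wav)",
}


def flag_proprietary(folder: str | Path) -> dict[str, str]:
    """Map each file under `folder` with a listed proprietary extension
    to its suggested alternative format."""
    root = Path(folder)
    flagged = {}
    for path in sorted(root.rglob("*")):
        alternative = ALTERNATIVES.get(path.suffix.lower())
        if alternative and path.is_file():
            flagged[path.relative_to(root).as_posix()] = alternative
    return flagged
```

Files not in the table are simply not flagged; the Library of Congress format descriptions linked above are the place to check those.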
97. Excel guidelines
• One table per sheet
• Sheet -> .csv
• No formatting
• No images
• No formulas
• Make your data tidy
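"Tidy" here is meant in the usual sense: one variable per column and one observation per row. Spreadsheets often hold measurements "wide" (one column per replicate), which a plain CSV can be reshaped from. A minimal Python sketch using only the standard library; the column names in the usage note are hypothetical. pandas users would typically reach for `DataFrame.melt` instead.

```python
import csv
import io


def wide_to_tidy(wide_csv: str, id_column: str, var_name: str, value_name: str) -> str:
    """Reshape a wide CSV (one column per measurement) into tidy form:
    one observation per row and one variable per column."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    if not reader.fieldnames or id_column not in reader.fieldnames:
        raise ValueError(f"missing ID column {id_column!r}")
    measure_columns = [c for c in reader.fieldnames if c != id_column]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, var_name, value_name])
    for row in reader:
        for column in measure_columns:
            writer.writerow([row[id_column], column, row[column]])
    return out.getvalue()
```

For example, a sheet with columns `strain,rep1,rep2` becomes rows of `strain,replicate,value`, one per strain and replicate. Because CSV stores values only, formulas and formatting are dropped, in line with the guidelines above.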
98. Excel Archival Tool
• Input: Excel File
• Output (per tab):
• One .csv
• One .txt for formulas
• One HTML visualization
• https://github.com/mcgrory/ExcelArchivalTool
99. Exercise 2: Formats
Think of one type of data that you produce:
• What format is it in?
• Is it proprietary? If so, what alternative format?
• Is it the most commonly used data format for that type of data in your field?
101. Metadata
• Relevant information for discovery, re-creation and re-use
• Descriptive – using data
• Discovery – finding data
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
102. Descriptive Metadata
• Provides context: everything you need to know to interpret and reuse the data
• Examples
• Readme files
• Code books
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
103. README files
• Describe the contents of data files
• List software necessary to interpret the data
• Unstructured format:
• (+) human readable
• (-) not machine readable
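A README stays unstructured, human-readable text, but generating it from the same template for every project keeps the sections consistent. A minimal Python sketch; the section names, file name, and function name are hypothetical, loosely following the genomics README example earlier in this deck.

```python
from __future__ import annotations


def render_readme(title: str, description: str, files: dict[str, str],
                  software: list[str], contact: str) -> str:
    """Build a plain-text README: description, file contents,
    software needed to interpret the data, and a contact."""
    lines = [title, "=" * len(title), "", description, "", "Files:"]
    lines += [f"  {name}: {summary}" for name, summary in files.items()]
    lines += ["", "Software needed to read the data:"]
    lines += [f"  {tool}" for tool in software]
    lines += ["", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


readme = render_readme(
    title="miRNA sequencing of 64 mouse strains",
    description="Brain miRNA reads, 3 replicates per strain.",
    files={"strain01_rep1.fastq": "raw reads, strain 1, replicate 1"},  # hypothetical file
    software=["R with Bioconductor"],
    contact="Name (email)",
)
print(readme)
```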
104. Codebooks
• Define the variables and their units
• Explain the formats for dates, time, geographic coordinates
• Define any coded values and missing values
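When a codebook is kept as structured data rather than prose, it can also be used to check the table it describes. A minimal Python sketch, assuming a hypothetical four-column table, `NA` as the only missing-value code, and ISO dates; every name and value here is illustrative.

```python
from __future__ import annotations

import csv
import io
from datetime import date

# Hypothetical codebook: meaning, units, allowed codes, and format per variable.
CODEBOOK = {
    "subject_id": {"description": "Participant ID"},
    "weight": {"description": "Body weight", "units": "kg"},
    "sex": {"description": "Sex", "codes": {"1": "female", "2": "male"}},
    "visit_date": {"description": "Date of visit", "format": "YYYY-MM-DD"},
}
MISSING_VALUES = {"NA"}  # how this dataset records a missing value


def check_against_codebook(table_csv: str) -> list[str]:
    """List columns and values in a CSV table that the codebook does not explain."""
    reader = csv.DictReader(io.StringIO(table_csv))
    problems = [f"column {name!r} is not defined in the codebook"
                for name in (reader.fieldnames or []) if name not in CODEBOOK]
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for name, value in row.items():
            if name is None or value is None or value in MISSING_VALUES:
                continue  # ragged rows and declared missing values are skipped
            entry = CODEBOOK.get(name, {})
            codes = entry.get("codes")
            if codes is not None and value not in codes:
                problems.append(f"row {row_no}: {name}={value!r} is not a defined code")
            if entry.get("format") == "YYYY-MM-DD":
                try:
                    date.fromisoformat(value)
                except ValueError:
                    problems.append(f"row {row_no}: {name}={value!r} is not YYYY-MM-DD")
    return problems
```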
105. Discipline specific metadata
• Specify pieces of information to include
• Specify format
• Not available in all fields
• Find standards
• http://www.dcc.ac.uk/resources/metadata-standards
• https://fairsharing.org/standards/?q=&selected_facets=type_exact:reporting%20guideline
106. Exercise 3:
• Think of the data you chose to work with in Exercise 2
• What information would you include in a README file? A codebook?
• Is there a metadata standard you could use?
108. FAIR principles
• Findable: searchable and has a unique ID
• Accessible: in a repository
• Interoperable: Described in common standards for your field
• Reusable: Properly described and licensed for reuse
https://www.nature.com/articles/sdata201618
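The four principles can double as a consultation checklist. A minimal Python sketch of a self-assessment, assuming a hypothetical record of what is known about a dataset; the list of "open" formats is drawn loosely from this deck's format tables and is an assumption, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical set of open or field-standard formats.
OPEN_FORMATS = {".csv", ".txt", ".pdf", ".tif", ".tiff", ".mp4", ".mp3", ".wav", ".fastq"}


@dataclass
class DatasetRecord:
    """What a librarian might know about one dataset (hypothetical fields)."""
    identifier: str | None = None         # DOI, Handle, accession number...
    repository: str | None = None         # where the data are deposited
    formats: list[str] = field(default_factory=list)
    metadata_standard: str | None = None  # e.g. Dublin Core, MIAME
    license: str | None = None
    has_readme: bool = False


def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return one note per FAIR principle the record does not yet meet."""
    gaps = []
    if not record.identifier:
        gaps.append("Findable: no unique identifier")
    if not record.repository:
        gaps.append("Accessible: not in a repository")
    if not record.metadata_standard or not set(record.formats) <= OPEN_FORMATS:
        gaps.append("Interoperable: no common metadata standard or non-standard formats")
    if not (record.license and record.has_readme):
        gaps.append("Reusable: missing a license or a README")
    return gaps
```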
112. Controlled Vocabularies
• “Official” names for things
• Ontologies: include relationships
between terms
• Search for relevant ontologies:
https://fairsharing.org/standards/?q=&selected_facets=type_exact:terminology%20artifact
• Example: NCBI Taxonomy Browser, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/
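Using a controlled vocabulary mostly means replacing the many free-text labels researchers use with one "official" term and its ID. A minimal Python sketch, assuming a small hard-coded synonym table; a real workflow would look terms up in the vocabulary itself (for example the NCBI Taxonomy Browser) rather than maintain its own list. The IDs use the `NCBITaxon:` prefix common in OBO ontologies.

```python
from __future__ import annotations

# Hypothetical synonym table: free-text labels -> (official name, NCBI Taxonomy ID).
TAXONOMY = {
    "mouse": ("Mus musculus", 10090),
    "house mouse": ("Mus musculus", 10090),
    "mus musculus": ("Mus musculus", 10090),
    "human": ("Homo sapiens", 9606),
    "homo sapiens": ("Homo sapiens", 9606),
    "rat": ("Rattus norvegicus", 10116),
    "norway rat": ("Rattus norvegicus", 10116),
    "rattus norvegicus": ("Rattus norvegicus", 10116),
}


def normalize_organism(label: str) -> tuple[str, str]:
    """Map a free-text organism label to its official name and an NCBITaxon ID."""
    key = " ".join(label.lower().split())  # ignore case and extra spaces
    if key not in TAXONOMY:
        raise ValueError(f"no controlled term for {label!r}; look it up in NCBI Taxonomy")
    name, taxid = TAXONOMY[key]
    return name, f"NCBITaxon:{taxid}"
```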
113. Exercise 4:
• Think about the data from the previous exercises
• Is there a standard ontology you could use to describe the data?
115. Repositories (aka databases) provide
• A place to put (meta)data
• Unique IDs for each dataset
• A search interface
• Metadata requirements
116. Finding Research Data Repositories
FAIRsharing
https://fairsharing.org/databases/
Registry of Research Data Repositories
http://www.re3data.org/
117. CSU Digital Repository
• Over 100 datasets
• Dublin Core metadata
• Supports all* (meta)data types
• No cost for <1 TB
• $150/TB for 5 years
• $300/TB for >5 years
*that we know of
118. CSU Repository Deposit Steps
• Contact Tobin! – self deposit is in the works
• Prepare:
• Data files
• Metadata (Readme.txt, codebooks, etc.)
• Any related additional files (like a license)
• Deposit agreement:
• Declare you have the right to deposit
• Give the repository permission to perform preservation and access procedures
• Upload
Module 7: Archiving & Preservation
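The "Prepare" step can be checked before anyone contacts the repository. A minimal Python sketch, assuming a hypothetical naming convention in which metadata files start with "readme", "codebook", or "license" and everything else counts as data; real deposit requirements come from the repository.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical naming convention: metadata files start with these words.
METADATA_PREFIXES = ("readme", "codebook", "license", "licence")


def check_deposit_package(folder: str | Path) -> list[str]:
    """Return notes on what a deposit folder seems to be missing."""
    names = [p.name.lower() for p in Path(folder).rglob("*") if p.is_file()]
    notes = []
    if not any(not name.startswith(METADATA_PREFIXES) for name in names):
        notes.append("no data files found")
    if not any(name.startswith("readme") for name in names):
        notes.append("no README describing the data")
    if not any(name.startswith(("license", "licence")) for name in names):
        notes.append("no license file (recommended)")
    return notes
```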
121. Discovery Metadata
• Make your data findable
• Metadata standards
• Defined by the repository
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
122. Metadata standard: Dublin Core
• Can be applied to anything
• 15 core elements:
• http://dublincore.org/documents/dces/
• CSU librarians help you write this metadata
• Example:
http://hdl.handle.net/10217/180280
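Dublin Core records are commonly exchanged as XML in the `http://purl.org/dc/elements/1.1/` namespace, with every element optional and repeatable. A minimal Python sketch using the standard library's ElementTree; the `metadata` wrapper element and the field values in the usage note are hypothetical, and a repository will define its own wrapper and required fields.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Serialize metadata as simple Dublin Core XML. Each field maps to a
    list because every Dublin Core element may repeat."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_xml({"title": ["Example dataset"], "creator": ["Doe, Jane"], "type": ["Dataset"]})` returns one `<dc:title>`, one `<dc:creator>` and one `<dc:type>` element inside the wrapper; a misspelled field such as `"author"` raises an error instead of silently producing non-standard metadata.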
123. Exercise 5:
• How FAIR is your data?
• Findable?
• Accessible?
• Interoperable?
• Reusable?
124. Preservation topics not in DMPs
• Licensing
• Controlled vocabularies/Ontologies
• Discovery vs. descriptive metadata
125. Summary
• Most of the work for sharing is part of Preservation
- Think about it BEFORE you start your project (DMP)
- Do the work as you go
• Make preservation and sharing easier with a trusted
repository
• It’s complicated! Ask for help
126. Other resources to learn
• The Medical Library Association Guide to Data Management for
Librarians
• The DART project: https://osf.io/kh2y6/
• New England Collaborative Data Management Curriculum
• Data and donuts: http://libguides.colostate.edu/data-and-donuts