Data Management
for librarians
C. Tobin Magle, PhD
Cyberinfrastructure facilitator
University Libraries
Colorado State University
My background
• Not technically a librarian
• 10 years of biomedical research experience
• HSL: Bioinformationist
• CSU: Data Management Specialist
• Now: cyberinfrastructure specialist
CT Magle et al. Infect Immun. 2014;82(2):618-25. doi: 10.1128/IAI.00444-13.
Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
What PhD students learn
2 years of classwork
~5 years of bench work
Write a dissertation + research reports
Congrats Dr.!
What professors do
Researchers are human
Data Management does not come naturally to most researchers.
+
Librarians are service oriented and good at organization
=
Data Management services in libraries
What is data
management?
The policies, practices and procedures needed to
manage the storage, access and preservation of data
produced from a research project
data management != data sharing
• but the same principles apply to both
Why should researchers care about data management?
Rinehart, AK. “Getting emotional about data” College & Research Libraries News September 2015 vol. 76 no. 8 437-440
*ok not everything, but most things
More researchers
https://www.nsf.gov/statistics/2016/nsf16300/digest/nsf16300.pdf
See arXiv:1402.4578 for details
[Figure panels: Working email; Response (if email working); Status of data (if response); Data are extant (if status known)]
doi:10.1016/j.cub.2013.11.014
We are losing vast amounts of data
Research funding is tight
http://www.bu.edu/research/articles/funding-for-scientific-research/
Federal agencies advocate OA
https://obamawhitehouse.archives.gov/blog/2017/01/09/making-federal-research-results-available-all
Private funders require sharing
http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
It’s good for science
• Improves research reproducibility
• Improves efficiency
• Spurs innovation
It’s good for researchers
• “You are the future data user”
• Data gets used (and cited)
• Exposure to collaborators
• More competitive grants
Where does data management
fit into research?
Throughout the whole research cycle
The research cycle
• Hypothesis -> Experimental design -> Raw data -> Results -> Article
• Data Management Plans
• Preservation: Archived Data
• Sharing: Open Data
• Reuse
Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
Where do I start?
•Who are your patrons?
•Which ones need DM services?
•Does anyone else provide these services?
Who are your patrons?
Affiliates
Partners
Focus your efforts
Bioengineering
Biomedical Sciences
Biostatistics
Cancer Biology
Cell Biology, Stem Cells, and Development
Clinical Science
Computational Bioscience
Epidemiology
Health Services Research (CSPH collaborative)
Human Medical Genetics
Immunology
Integrated Physiology Program
Pathology
Pharmacology
Physiology and Biophysics
Medical Scientist Training Program
Microbiology
Molecular and Cellular Pharmacology
Molecular Biology
Neuroscience
Nursing
Pharmaceutical Sciences
Rehabilitation Science
Reproductive Sciences
Structural Biology and Biochemistry
Toxicology
Biochemistry and Molecular Genetics
Cell & Developmental Biology
Immunology/Microbiology
Who already provides services?
Your Campus survey
1. What units are on your campus?
2. Which ones produce research data?
3. Who already provides data management services?
Communication
• How do patrons find out about services?
1. Curated lists of department contacts
2. Listservs
3. Web presence
• Leverage existing library strategies
Web presence
https://hslibrary.ucdenver.edu/node/4497
Web presence
https://lib.colostate.edu/services/data-management/
Research Events Calendar
Advertising
1. Curated lists of department contacts
• Departments and Graduate programs
• Faculty and administrative staff
2. Data management listserv
• Populated from workshop attendees
3. Web presence
Start your communications strategy now!
• Make a list of 5 departments you want to contact
• Who is the department chair? Who is the administrative contact?
• Can you get your own listserv? If yes, how?
• Where can information about DM services live on the web?
Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
Workshop
• Set a regular time and date: Build in breaks
• Fill speaker slots: Doesn’t have to be you
• ADVERTISE: web, lists
• Keep attendance records: use them to build your communications list
• Evaluate content: improve
ORDER FOOD
• Scientists are hungry.
• Will attend seminars for food.
• (That’s why Data and Donuts exists)
http://arcticdragonwolf97.deviantart.com/
Evaluate content
• Set learning objectives
• Survey after class
• Ask questions about things you’re not sure of
Ask about learning objectives
Learning Objectives Results
Ask about instructor
Instructor Results
Ask about content
Content results
Free text comments
Exercise: Workshops
• Does your library have space to hold workshops? Do you have
access?
• What times of year are patrons available for workshops?
• How will you evaluate workshop content?
• What parts of holding a workshop are you the least sure about?
Consultations*
• Librarians already do this for literature searches, EndNote, etc.
• Choose topics you’re comfortable with
• Make sure people know about them through above strategies
• Create a clear mechanism to ask for help
*Some researchers assume consultations are fee-based. Make sure to tell them they’re free.
Exercises: Consultation
• Does your library already have a way to ask to consult with a
librarian?
• What topics do you feel comfortable (or will become
comfortable) giving consultations on?
• How will people find out about consultation services?
Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
What is a data
management plan?
A description of how you plan to describe, preserve
and share your research data.
Often required by funding agencies
What is research data?
• “The recorded factual material commonly accepted in the
scientific community as necessary to validate research findings”
- White House Office of Management and Budget
• Reality: anything that is a (digital) product of your research
Successful DMPs include
• A data inventory, including type(s) and size
• A strategy for describing the data
• A plan for preserving the data long term
• A method for access to the data
Always make sure to follow funder requirements
DMPTool
• Review requirements from
different agencies
• https://dmptool.org/guidance
• Create new DMPs based on
funding agency templates
• Search public DMPs
Exercise: DMPs
• What funders do researchers use on your campus?
• What DMP requirements do they have?
• Can you find an example DMP in DMPTool?
Data inventory
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
Data inventory
Example: miRNA sequences
• File type: FASTQ files
• Size: 1 GB per file x 64 strains x 3 replicates = 192 GB (~200 GB)
• Other outputs: R scripts for analysis and visualization; data use tutorials
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
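The size question in a data inventory is simple multiplication, and it can help to show researchers the arithmetic. Below is a minimal sketch using the numbers from the miRNA example (1 GB per file, 64 strains, 3 replicates). The function name `estimate_storage_gb` is made up for illustration.

```python
def estimate_storage_gb(gb_per_file: float, *counts: int) -> float:
    """Multiply the per-file size by every count (strains, replicates, ...)."""
    total = gb_per_file
    for count in counts:
        total *= count
    return total


# Numbers from the miRNA example: 1 GB per FASTQ file, 64 strains, 3 replicates.
raw_gb = estimate_storage_gb(1.0, 64, 3)
print(f"Raw estimate: {raw_gb:.0f} GB")  # 192 GB, which rounds up to ~200 GB
```

Rounding up leaves some headroom for scripts, tutorials and other outputs; how much headroom to add is a judgment call.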
Data formats
• Avoid proprietary formats
• Know what software can read your data
Proprietary Format | Alternative Format
Excel (.xls, .xlsx) | Comma Separated Values (.csv)
Word (.doc, .docx) | plain text (.txt)
PowerPoint (.ppt, .pptx) | PDF/A (.pdf)
Photoshop (.psd) | TIFF (.tif, .tiff)
QuickTime (.mov) | MPEG-4 (.mp4)
MPEG-4 Protected audio (.m4p) | MP3 (.mp3)
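During a consultation, the table above can be used as a quick lookup. The sketch below encodes it as a Python dictionary keyed by file extension. The mapping comes from the table, while the names `ALTERNATIVES` and `suggest_alternative` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Mapping taken from the table above; extensions are stored in lowercase.
ALTERNATIVES = {
    ".xls": "Comma Separated Values (.csv)",
    ".xlsx": "Comma Separated Values (.csv)",
    ".doc": "plain text (.txt)",
    ".docx": "plain text (.txt)",
    ".ppt": "PDF/A (.pdf)",
    ".pptx": "PDF/A (.pdf)",
    ".psd": "TIFF (.tif, .tiff)",
    ".mov": "MPEG-4 (.mp4)",
    ".m4p": "MP3 (.mp3)",
}


def suggest_alternative(filename: str) -> Optional[str]:
    """Return the suggested format, or None if the extension is not in the table."""
    return ALTERNATIVES.get(Path(filename).suffix.lower())


print(suggest_alternative("plate_counts.XLSX"))
print(suggest_alternative("plate_counts.csv"))
```

A result of `None` only means the extension is not in this short table. It does not mean the format is safe for preservation.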
Exercise: Data Inventory
What kind of data are you going to collect?
What file type will be produced?
What size will these files be? How many files?
What other research outputs will be produced?
Exercise: Data Inventory
• A researcher comes to you with an Excel file. What file format
would you recommend for data preservation?
• A researcher comes to you to figure out how much storage they
need for their data. They’re planning on producing image files
for 3 different types of cells. Each cell type will have 12 images,
and each file is about 50 megabytes. How much storage
will they need at minimum?
A strategy for describing the data
• Metadata: Relevant information
for re-creation and re-use
• Contact info
• How data was collected
• Details about collection
• Date, location of collection
• Units
• Can be as simple as a text file
Genomics example (README)
This project contains next-generation miRNA sequencing data from 64 mouse strains.
Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was
extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit.
They were run on an Illumina MiSeq sequencer. The FASTQ files produced were analyzed
in R using Bioconductor.
The data and descriptive metadata will be made available on NCBI in the BioProject (PRJXXXX). The
scripts used to analyze the data are available on GitHub (URL). Tutorials for data use will
be made available in the Digital Collections of Colorado (handle).
Contact Tobin Magle (tobin.magle@colostate.edu) for more information.
http://orcid.org/0000-0003-3185-7034
Metadata standards
• Dublin Core: http://dublincore.org/documents/dcmi-terms/
• Can be applied to anything
• Many discipline specific metadata standards
• EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html
• MIAME: http://fged.org/projects/miame/
• Search for other standards:
• http://www.dcc.ac.uk/resources/metadata-standards
• https://biosharing.org/standards/
Genomics example (NCBI template)
Exercise: Describe your data
What do people need to know to reuse your data?
Are there any discipline-specific metadata standards?
What format will you describe your data in (text, XML, tabular)?
What fields will you include (author, date, format, identifier?)
Exercise: Metadata
• A researcher comes to you with a microarray dataset. What
type of metadata standard would you recommend?
• A researcher thinks there might be a metadata standard for their
type of data, but isn’t sure where to find it. Where would you
have them look for one?
• You help this researcher look for a standard, but there isn’t one.
How would you help them document their research?
A plan for preserving the data long term
• What will you do to ensure
data are properly stored and
preserved?
• Include metadata and other
products needed for reuse
• Might change over course of
the project
Preservation questions
• What will you store?
• Who will be in charge?
• How long will you store it?
• Where will you store it?
• Multiple copies
Recommendations for backing up data
• Store in geographically distinct
locations
• Automation: Will you remember to do it
manually?
• Security: Are you working with PHI?
Exercise: Preservation plan
What will you store?
Who will be responsible for the data (person or position)?
How long will you store it?
Where will you store it?
How will you back it up?
Exercise: Preservation
• Does your campus have data storage options? Who handles it?
Departments? The university? Individual researchers?
• How are these data storage solutions backed up?
• Do these storage solutions meet preservation best practices? If
not, how can they be improved?
A method to access the data
• Important to funding agencies
• Reproduce existing research
• Promote further research
• Must be easily available:
• No “by request only”
• Embargoes are “ok”
• Data security: consider privacy
and IP issues before sharing
Data access and sharing best practices
• Non-proprietary formats
• Include metadata
• Proper storage
• Stable identifier
• Licensing: conditions for reuse
Trusted Repositories: store and share
• Discipline specific repositories
• Search:
http://service.re3data.org/browse/by-
subject/
• Generic:
• Figshare - https://figshare.com/
• Dryad - http://datadryad.org/
• CSU Digital Repository:
• http://lib.colostate.edu/digital-collections/
http://67.media.tumblr.com/6228cbe58a9652f1a85e8ab1ed08d715/tumblr_inline_n6oukhNlZW1qf11bs.png
Data archiving service
• Finished products for
sharing
• CSU Digital Repository
• Over 100 Datasets
• Satisfy requirements for
manuscripts and grants
• At no cost <1 TB
• $150/TB for 5 years
• $300/TB for >5 years
Stable identifiers
• URLs break
• Stable identifiers are
permanent in a database
• Some provide linking
capabilities
• DOI: https://doi.org/10.1109/5.771073
• Handle: http://hdl.handle.net/10217/177356
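A stable identifier is the part after the resolver address; the resolver redirects to wherever the item currently lives. The sketch below shows that split for the two examples on this slide. The resolver prefixes are the ones used in the slide's URLs, and the function name `resolver_url` is hypothetical.

```python
# Resolver prefixes as they appear in the example URLs on the slide.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "http://hdl.handle.net/",
}


def resolver_url(identifier: str, scheme: str) -> str:
    """Build a resolvable URL from a bare DOI or Handle."""
    return RESOLVERS[scheme] + identifier.strip()


print(resolver_url("10.1109/5.771073", "doi"))
print(resolver_url("10217/177356", "handle"))
```

If the item moves, only the resolver's record changes. The identifier that researchers cite stays the same.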
Licensing
• State your conditions for reuse
• Paper citation?
• Disclaimers
• Must justify limitations, describe
how you’ll advertise them
• Creative Commons licenses are a
good starting point
Exercise: Access methods
Where will people be able to access the data?
Does your discipline have a repository?
What kind of stable identifier will it have?
What are the conditions for reuse?
Are there any limitations to use of these data? Why?
Exercise: Data Access
• Does your institution have a digital repository?
• Do you currently accept datasets?
• If not, where could researchers put a dataset?
• A researcher wants to put his dataset in a discipline specific
repository. Where would you tell him to find one?
Outline
• Why data management services?
• Making a plan: campus survey and communication strategy
• Services to provide: workshops and consultations
• Deep dive into two topics
• Data management plans
• Data preservation and sharing
Data preservation and sharing
• Preserve: File formats and storage
• Describe: Metadata standards and standard languages
• Share: FAIR principles and repositories
Preserve
Backup, archival formats, description
Digital data preservation
Short Term
• During the project
• Frequent changes
• Your responsibility
Long term
• After the project is over
• Little to no changes
• Can be outsourced
Preservation best practices
• Back up your data!
• Save in archival formats
• Include metadata
Data backup
• Make 3 copies
• Protects against natural
disasters
• Example
• Computer HD
• External hard drive
• Cloud
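The three-copy rule can be automated. The sketch below copies a file into extra folders, which stand in for an external drive and a cloud-synced folder, and checks each copy against a SHA-256 checksum. The checksum step is not on the slide; it is added here as one common way to confirm that a copy is intact. Function names and paths are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: List[Path]) -> List[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

A call might look like `back_up(Path("counts.csv"), [Path("/Volumes/external"), Path("~/CloudDrive").expanduser()])`, where both destination paths are placeholders. Scheduling the script to run automatically covers the "will you remember to do it manually?" problem.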
Data formats
• Avoid proprietary formats
• Use common data standards
in your field
• Find standards:
https://fairsharing.org/standards/?q=&selected_facets=type_exact:model/format
Proprietary formats and alternatives
Proprietary Format | Alternative Format
Excel (.xlsx) | Comma Separated Values (.csv)
Word (.docx) | plain text (.txt) or PDF/A (.pdf)
PowerPoint (.pptx) | PDF/A (.pdf)
Photoshop (.psd) | TIFF (.tif, .tiff)
QuickTime (.mov) | MPEG-4 (.mp4)
MPEG-4 (.m4p) | .mp3 (compressed) / .wav (uncompressed)
https://www.loc.gov/preservation/digital/formats/content/content_categories.shtml
Excel guidelines
• One table per sheet
• Sheet -> .csv
• No formatting
• No images
• No formulas
• Make your data tidy
Excel Archival Tool
• Input: Excel File
• Output (per tab):
• One .csv
• One .txt for formulas
• One HTML visualization
• https://github.com/mcgrory/ExcelArchivalTool
Exercise 2: Formats
Think of one type of data that you produce:
• What format is it in?
• Is it proprietary? If so, what alternative format?
• Is it the most commonly used data format for that type of data in
your field?
Describe
Metadata (README files, codebooks, Metadata standards)
Metadata
• Relevant information for
discovery, re-creation and re-use
• Descriptive – using data
• Discovery – finding data
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
Descriptive Metadata
• Provides context: everything
you need to know to interpret
and reuse the data
• Examples
• Readme files
• Code books
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
README files
• Describe the contents of
data files
• List software necessary to
interpret the data
• Unstructured format:
• (+) human readable
• (-) not machine readable
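A README can start from a fixed skeleton so that nothing important is forgotten. The sketch below assembles one as plain text. The section headings and the function name `build_readme` are illustrative choices, not a standard, and the example file pattern is hypothetical.

```python
from typing import Dict, List


def build_readme(title: str, description: str, files: Dict[str, str],
                 software: List[str], contact: str) -> str:
    """Assemble a plain-text README describing a dataset."""
    lines = [title, "=" * len(title), "", description, "", "Files:"]
    lines += [f"  {name}: {what}" for name, what in files.items()]
    lines += ["", "Software needed to read the data:"]
    lines += [f"  {item}" for item in software]
    lines += ["", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


readme_text = build_readme(
    title="miRNA sequencing data from 64 mouse strains",
    description="Brain tissue from 10-week-old male mice; libraries run on an Illumina sequencer.",
    files={"*.fastq": "raw reads (hypothetical file pattern)"},
    software=["R", "Bioconductor"],
    contact="tobin.magle@colostate.edu",
)
print(readme_text)
```

The output is still unstructured text, so it keeps the trade-off noted above: easy for people to read, hard for machines to parse.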
Codebooks
• Define the variables and
their units
• Explain the formats for
dates, time, geographic
coordinates
• Define any coded values
and missing values
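A codebook can itself be a small CSV file with one row per variable. The sketch below writes one with Python's standard `csv` module. The column names in `CODEBOOK_FIELDS` and the two example variables are assumptions made for illustration; adapt them to the dataset at hand.

```python
import csv
import io

# Hypothetical column set: one row per variable in the dataset.
CODEBOOK_FIELDS = ["variable", "description", "units", "format", "missing_value"]


def write_codebook(rows, handle):
    """Write codebook rows (dicts keyed by CODEBOOK_FIELDS) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=CODEBOOK_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"variable": "collection_date", "description": "Date the sample was collected",
     "units": "", "format": "YYYY-MM-DD", "missing_value": "NA"},
    {"variable": "weight", "description": "Body weight at collection",
     "units": "g", "format": "decimal", "missing_value": "NA"},
]

buffer = io.StringIO()  # swap for open("codebook.csv", "w", newline="") to save to disk
write_codebook(rows, buffer)
print(buffer.getvalue())
```

Unlike a README, a tabular codebook is machine readable, so a script can check it against the data file's column names.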
Discipline specific metadata
• Specify pieces of information to include
• Specify format
• Not available in all fields
• Find standards
• http://www.dcc.ac.uk/resources/metadata-standards
• https://fairsharing.org/standards/?q=&selected_facets=type_exact:reporting%20guideline
Exercise 3:
• Think of the data you chose to work with in Exercise 2
• What information would you include in a README file? In a
codebook?
• Is there a metadata standard you could use?
Share
FAIR principles, repositories, discovery metadata
FAIR principles
• Findable: searchable and has a unique ID
• Accessible: in a repository
• Interoperable: Described in common standards for your field
• Reusable: Properly described and licensed for reuse
https://www.nature.com/articles/sdata201618
Reusable
• Licenses
• File formats (non-proprietary, community standards)
• Metadata (descriptive metadata, codebooks, readme)
• State your conditions for reuse
• Citation?
• Creative Commons licenses are a
good starting point
• CC0 for data
Interoperable
• Controlled vocabularies
• Metadata standards
Reusable
• Licenses
• File formats (non-proprietary, community standards)
• Metadata (descriptive metadata, codebooks, readme)
Controlled Vocabularies
• “Official” names for things
• Ontologies: include relationships
between terms
• Search for relevant ontologies:
https://fairsharing.org/standards/?q=&selected_facets=type_exact:terminology%20artifact
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/
Exercise 4:
• Think about the data from exercises
• Is there a standard ontology you could use to describe the
data?
Accessible
• Repositories
• Persistent identifiers
• Web interface
Interoperable
• Controlled vocabularies
• Metadata standards
Reusable
• Licenses
• File formats (non-proprietary, community standards)
• Metadata (descriptive metadata, codebooks, readme)
Repositories (aka databases) provide
• A place to put (meta)data
• Unique IDs for each dataset
• A search interface
• Metadata requirements
Finding Research Data Repositories
FAIRsharing
https://fairsharing.org/databases/
Registry of Research Data
Repositories
http://www.re3data.org/
CSU Digital Repository
• Over 100 Datasets
• Dublin core metadata
• Supports all* (meta)data types
• At no cost <1 TB
• $150/TB for 5 years
• $300/TB for >5 years
*that we know of
CSU Repository Deposit Steps
• Contact Tobin! – self deposit is in the works
• Prepare:
• Data files
• Metadata (Readme.txt, codebooks, etc)
• Any related additional files (like a license)
• Deposit agreement:
• Declare you have right to deposit
• Give permission to repository to perform preservation and access procedures
• Upload
Module 7: Archiving & Preservation
Stable identifiers
• URLs break
• Stable identifiers are
permanent in a database
• Some provide linking
capabilities
• DOI: https://doi.org/10.1109/5.771073
• Handle: http://hdl.handle.net/10217/177356
Findable
• Discovery metadata
• Search interface
Accessible
• Repositories
• Persistent identifiers
• Web interface
Interoperable
• Controlled vocabularies
• Metadata standards
Reusable
• Licenses
• File formats (non-proprietary, community standards)
• Metadata (descriptive metadata, codebooks, readme)
Discovery Metadata
• Make your data findable
• Metadata standards
• Defined by the repository
http://library.umassmed.edu/necdmc/necdmc_module3.pptx
Metadata standard: Dublin Core
• Can be applied to anything
• 15 core Elements:
• http://dublincore.org/documents/dces/
• CSU librarians help you write
this metadata
• Example:
http://hdl.handle.net/10217/180280
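To show what a Dublin Core record can look like, the sketch below serializes a few of the 15 core elements as XML using Python's standard library. The namespace URI is the Dublin Core elements 1.1 namespace. The wrapper element `metadata`, the function name and all field values are illustrative assumptions; a repository will define its own record layout.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc:" prefix


def dublin_core_record(fields: dict) -> str:
    """Serialize {element name: value} pairs as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; the identifier is left as a placeholder.
record = dublin_core_record({
    "title": "miRNA sequencing data from 64 mouse strains",
    "creator": "Magle, C. Tobin",
    "format": "FASTQ",
    "identifier": "handle to be assigned",
    "rights": "CC0",
})
print(record)
```

Each key must be one of the 15 element names (title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights). This sketch does not validate them.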
Exercise 5:
• How FAIR is your data?
• Findable?
• Accessible?
• Interoperable?
• Reusable?
Preservation topics not in DMPs
• Licensing
• Controlled vocabularies/Ontologies
• Discovery vs. descriptive metadata
Summary
• Most of the work for sharing is part of Preservation
- Think about it BEFORE you start your project (DMP)
- Do the work as you go
• Make preservation and sharing easier with a trusted
repository
• It’s complicated! Ask for help
Other resources to learn
• The Medical Library Association Guide to Data Management for
Librarians
• The DART project: https://osf.io/kh2y6/
• New England Collaborative Data Management Curriculum
• Data and donuts: http://libguides.colostate.edu/data-and-donuts

Contenu connexe

Tendances

pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
Gregor Hagedorn
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
Sherry Lake
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycle
Sherry Lake
 

Tendances (20)

Data and Donuts: Data cleaning with OpenRefine
Data and Donuts: Data cleaning with OpenRefineData and Donuts: Data cleaning with OpenRefine
Data and Donuts: Data cleaning with OpenRefine
 
Data and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data ManagementData and Donuts: The Impact of Data Management
Data and Donuts: The Impact of Data Management
 
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAnalyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index Designs
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinarPistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinar
 
Data sharing as part of the research workflow
Data sharing as part of the research workflowData sharing as part of the research workflow
Data sharing as part of the research workflow
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openly
 
Data Management for Graduate Students
Data Management for Graduate StudentsData Management for Graduate Students
Data Management for Graduate Students
 
Workflows for Publishing Data; Scientific Data's experience as an early adopter
Workflows for Publishing Data; Scientific Data's experience as an early adopterWorkflows for Publishing Data; Scientific Data's experience as an early adopter
Workflows for Publishing Data; Scientific Data's experience as an early adopter
 
Arakno
AraknoArakno
Arakno
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycle
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM Bootcamp
 
TAIR ICAR 2010 Presentation
TAIR ICAR 2010 PresentationTAIR ICAR 2010 Presentation
TAIR ICAR 2010 Presentation
 

Similaire à Data Management for librarians

Similaire à Data Management for librarians (20)

Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016
 
Getting to grips with Research Data Management
Getting to grips with Research Data ManagementGetting to grips with Research Data Management
Getting to grips with Research Data Management
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management Plan
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
Educause 2015 RDM Maturity
Educause 2015 RDM Maturity Educause 2015 RDM Maturity
Educause 2015 RDM Maturity
 
Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Management
 
Managing your research data
Managing your research dataManaging your research data
Managing your research data
 
DC101 UWE
DC101 UWEDC101 UWE
DC101 UWE
 
Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for Engineers
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto University
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production process
 
Data Management Planning in the arts
Data Management Planning in the artsData Management Planning in the arts
Data Management Planning in the arts
 
Creating a Data Management Plan for your Research
Creating a Data Management Plan for your ResearchCreating a Data Management Plan for your Research
Creating a Data Management Plan for your Research
 
Creating and Maintaining a Sustainable Research Data Management Service: Wher...
Creating and Maintaining a Sustainable Research Data Management Service: Wher...Creating and Maintaining a Sustainable Research Data Management Service: Wher...
Creating and Maintaining a Sustainable Research Data Management Service: Wher...
 
Research Data Management Plan: How to Write One - 2017-02-01 - University of ...
Research Data Management Plan: How to Write One - 2017-02-01 - University of ...Research Data Management Plan: How to Write One - 2017-02-01 - University of ...
Research Data Management Plan: How to Write One - 2017-02-01 - University of ...
 

Data Management for librarians

  • 1. Data Management for librarians C. Tobin Magle, PhD Cyberinfrastructure facilitator University Libraries Colorado State University
  • 2. My background • Not technically a librarian • 10 years of biomedical research experience • HSL: Bioinformationist • CSU: Data Management Specialist • Now: cyberinfrastructure specialist CT Magle et al Infect Immun. 2014 82(2):618-25. doi: 10.1128/IAI.00444-13.
  • 3. Outline • Why data management services? • Making a plan: campus survey and communication strategy • Services to provide: workshops and consultations • Deep dive into two topics • Data management plans • Data preservation and sharing
  • 5. What PhD students learn 2 years of classwork ~5 years of bench work Write a dissertation + research reports Congrats Dr.!
  • 7. Researchers are human Data Management does not come naturally to most researchers. + Librarians are service oriented and good at organization = Data Management services in libraries
  • 8. What is data management? The policies, practices and procedures needed to manage the storage, access and preservation of data produced from a research project
  • 9. data management != data sharing • but the same principles apply to both
  • 10. Why should researchers care about data management? Rinehart, AK. “Getting emotional about data” College & Research Libraries News September 2015 vol. 76 no. 8 437-440
  • 11. *ok not everything, but most things
  • 14. [Figure panels: Working email; Response (if email working); Status of data (if response); Data are extant (if status known)] doi:10.1016/j.cub.2013.11.014
  • 15. We are losing vast amounts of data
  • 16. Research funding is tight http://www.bu.edu/research/articles/funding-for-scientific-research/
  • 17. Federal agencies advocate OA https://obamawhitehouse.archives.gov/blog/2017/01/09/making-federal-research-results-available-all
  • 18. Private funders require sharing http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
  • 19. It’s good for science • Improves research reproducibility • Improves efficiency • Spurs innovation
  • 20. It’s good for researchers • “You are the future data user” • Data gets used (and cited) • Exposure to collaborators • More competitive grants
  • 21. Where does data management fit into research? Throughout the whole research cycle
  • 32. Outline • Why data management services? • Making a plan: campus survey and communication strategy • Services to provide: workshops and consultations • Deep dive into two topics • Data management plans • Data preservation and sharing
  • 33. Where do I start? •Who are your patrons? •Which ones need DM services? •Does anyone else provide these services?
  • 34. Who are your patrons? Affiliates Partners
  • 35. Focus your efforts • Bioengineering • Biomedical Sciences • Biostatistics • Cancer Biology • Cell Biology, Stem Cells, and Development • Clinical Science • Computational Bioscience • Epidemiology • Health Services Research (CSPH collaborative) • Human Medical Genetics • Immunology • Integrated Physiology Program • Pathology • Pharmacology • Physiology and Biophysics • Medical Scientist Training Program • Microbiology • Molecular and Cellular Pharmacology • Molecular Biology • Neuroscience • Nursing • Pharmaceutical Sciences • Rehabilitation Science • Reproductive Sciences • Structural Biology and Biochemistry • Toxicology • Biochemistry and Molecular Genetics • Cell & Developmental Biology • Immunology/Microbiology
  • 36. Who already provides services?
  • 37. Your Campus survey 1. What units are on your campus? 2. Which ones produce research data? 3. Who already provides data management services?
  • 38. Communication • How do patrons find out about services? 1. Curated lists of department contacts 2. Listservs 3. Web presence • Leverage existing library strategies
  • 42. Advertising 1. Curated lists of department contacts • Departments and Graduate programs • Faculty and administrative staff 2. Data management listserv • Populated from workshop attendees 3. Web presence
  • 43. Start your communications strategy now! • Make a list of 5 departments you want to contact • Who is the department chair? Who is the administrative contact? • Can you get your own listserv? If yes, how? • Where can information about DM services live on the web?
  • 44. Outline • Why data management services? • Making a plan: campus survey and communication strategy • Services to provide: workshops and consultations • Deep dive into two topics • Data management plans • Data preservation and sharing
  • 45. Workshop • Set a regular time and date: Build in breaks • Fill speaker slots: Doesn’t have to be you • ADVERTISE: web, lists • Keep attendance records: they feed your communications lists • Evaluate content: improve
  • 46. ORDER FOOD • Scientists are hungry. • Will attend seminars for food. • (That’s why data and donuts exists) http://arcticdragonwolf97.deviantart.com/
  • 47. Evaluate content • Set learning objectives • Survey after class • Ask questions about things you’re not sure of
  • 48. Ask about learning objectives
  • 55. Exercise: Workshops • Does your library have space to hold workshops? Do you have access? • What times of year are patrons available for workshops? • How will you evaluate workshop content? • What parts of holding a workshop are you the least sure about?
  • 56. Consultations* • Librarians already do this for literature searches, EndNote, etc. • Choose topics you’re comfortable with • Make sure people know about them through above strategies • Create a clear mechanism to ask for help *Some researchers assume consultations are fee-based. Make sure to tell them they’re free.
  • 57. Exercises: Consultation • Does your library already have a way to ask to consult with a librarian? • What topics do you feel comfortable (or will become comfortable) giving consultations on? • How will people find out about consultation services?
  • 58. Outline • Why data management services? • Making a plan: campus survey and communication strategy • Services to provide: workshops and consultations • Deep dive into two topics • Data management plans • Data preservation and sharing
  • 60. What is a data management plan? A description of how you plan to describe, preserve and share your research data. Often required by funding agencies
  • 61. What is research data? • “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings” - White House Office of Management and Budget • Reality: anything that is a (digital) product of your research
  • 62. Successful DMPs include • A data inventory, including type(s) and size • A strategy for describing the data • A plan for preserving the data long term • A method for access to the data Always make sure to follow funder requirements
  • 63. DMPTool • Review requirements from different agencies • https://dmptool.org/guidance • Create new DMPs based on funding agency templates • Search public DMPs
  • 64. Exercise: DMPs • What funders do researchers use on your campus? • What DMP requirements do they have? • Can you find an example DMP in DMPTool?
  • 65. Data inventory • What type of data are you going to collect? • What file type will be produced? • What size will these files be? How many files? • What other research outputs will be produced? • Code/Software? • Templates/protocols?
  • 66. Data inventory (example) • What type of data are you going to collect? miRNA sequences • What file type will be produced? FASTQ files • What size will these files be? How many files? 1 GB per file x 64 strains x 3 replicates = 192 GB (~200 GB) • What other research outputs will be produced? • Code/Software: R scripts for analysis and visualization • Templates/protocols: Data use tutorials
  • 67. Data formats • Avoid proprietary formats • Know what software can read your data • Proprietary format -> alternative format: • Excel (.xls, .xlsx) -> Comma Separated Values (.csv) • Word (.doc, .docx) -> plain text (.txt) • PowerPoint (.ppt, .pptx) -> PDF/A (.pdf) • Photoshop (.psd) -> TIFF (.tif, .tiff) • QuickTime (.mov) -> MPEG-4 (.mp4) • MPEG 4 Protected audio (.m4p) -> MP3 (.mp3)
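The size estimate in the data inventory is simple multiplication, and a short script makes it easy to redo when the study design changes. This is a minimal sketch, not part of the original slides: the function name `estimate_storage_gb` is hypothetical, and multiplying by three copies is an assumption added here for illustration.

```python
def estimate_storage_gb(gb_per_file, *counts):
    """Multiply the per-file size by every count (strains, replicates, ...)."""
    total = gb_per_file
    for count in counts:
        total *= count
    return total


# Slide example: 1 GB per FASTQ file x 64 strains x 3 replicates.
raw_gb = estimate_storage_gb(1.0, 64, 3)

# Assumption (not from the inventory slide): budget for three copies of the data.
with_copies_gb = raw_gb * 3

print(f"Raw data: {raw_gb:.0f} GB; with 3 copies: {with_copies_gb:.0f} GB")
```

The raw figure is 192 GB, which the slide rounds to ~200 GB. Rounding up is sensible in a DMP because the inventory also has to cover scripts, metadata, and derived files.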
  • 68. Exercise: Data Inventory What kind of data are you going to collect? What file type will be produced? What size will these files be? How many files? What other research outputs will be produced?
  • 69. Exercise: Data Inventory • A researcher comes to you with an Excel file. What file format would you recommend for data preservation? • A researcher comes to you to figure out how much storage they need for their data. They’re planning on producing image files for 3 different types of cells. Each cell type will have 12 images, and each file is about 50 megabytes. How much storage will they need at minimum?
  • 70. A strategy for describing the data • Metadata: Relevant information for re-creation and re-use • Contact info • How data was collected • Details about collection • Date, location of collection • Units • Can be as simple as a text file
  • 71. Genomics example (README) This project contains next-generation miRNA sequencing data from 64 mouse strains. Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit. They were run on an Illumina MiSeq sequencer. The FASTQ files produced were analyzed in R using Bioconductor. The data and descriptive metadata will be made available on NCBI in the bioproject (PRJXXXX). The scripts used to analyze the data are available on github (URL). Tutorials for data use will be made available in the Digital Collections of Colorado (handle). Contact Tobin Magle (tobin.magle@colostate.edu) for more information. http://orcid.org/0000-0003-3185-7034
  • 72. Metadata standards • Dublin Core: http://dublincore.org/documents/dcmi-terms/ • Can be applied to anything • Many discipline specific metadata standards • EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html • MIAME: http://fged.org/projects/miame/ • Search for other standards: • http://www.dcc.ac.uk/resources/metadata-standards • https://biosharing.org/standards/
  • 74. Exercise: Describe your data What do people need to know to reuse your data? Are there any discipline-specific metadata standards? What format will you describe your data in (text, XML, tabular)? What fields will you include (author, date, format, identifier?)
  • 75. Exercise: Metadata • A researcher comes to you with a microarray dataset. What type of metadata standard would you recommend? • A researcher thinks there might be a metadata standard for their type of data, but isn’t sure where to find it. Where would you have them look for one? • You help this researcher look for a standard, but there isn’t one. How would you help them document their research?
  • 76. A plan for preserving the data long term • What will you do to ensure data are properly stored and preserved? • Include metadata and other products needed for reuse • Might change over course of the project
  • 77. Preservation questions • What will you store? • Who will be in charge? • How long will you store it? • Where will you store it? • Multiple copies
  • 78. Recommendations for backing up data • Store in geographically distinct locations • Automation: Will you remember to do it manually? • Security: Are you working with PHI?
  • 79. Exercise: Preservation plan What will you store? Who will be responsible for the data (person or position)? How long will you store it? Where will you store it? How will you back it up?
  • 80. Exercise: Preservation • Does your campus have data storage options? Who handles it? Departments? The university? Individual researchers? • How are these data storage solutions backed up? • Do these storage solutions meet preservation best practices? If not, how can they be improved?
  • 81. A method to access the data • Important to funding agencies • Reproduce existing research • Promote further research • Must be easily available: • No “by request only” • Embargoes are “ok” • Data security: consider privacy and IP issues before sharing
  • 82. Data access and sharing best practices • Non-proprietary formats • Include metadata • Proper storage • Stable identifier • Licensing: conditions for reuse
  • 83. Trusted Repositories: store and share • Discipline specific repositories • Search: http://service.re3data.org/browse/by-subject/ • Generic: • Figshare - https://figshare.com/ • Dryad - http://datadryad.org/ • CSU Digital Repository: • http://lib.colostate.edu/digital-collections/ http://67.media.tumblr.com/6228cbe58a9652f1a85e8ab1ed08d715/tumblr_inline_n6oukhNlZW1qf11bs.png
  • 84. Data archiving service • Finished products for sharing • CSU Digital Repository • Over 100 Datasets • Satisfy requirements for manuscripts and grants • At no cost <1 TB • $150/TB for 5 years • $300/TB for >5 years
  • 85. Stable identifiers • URLs break • Stable identifiers are permanent in a database • Some provide linking capabilities • DOI – https://doi.org/10.1109/5.771073 • Handle- http://hdl.handle.net/10217/177356
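Both identifier types on this slide resolve through a fixed resolver prefix, which is why they keep working when the underlying page moves. The sketch below turns a bare DOI or Handle into its resolver URL. The function name `resolver_url` is hypothetical, and telling the two apart by the `10.` prefix relies on the fact that DOIs are Handles whose prefix starts with `10.`.

```python
DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "http://hdl.handle.net/"


def resolver_url(identifier):
    """Return a resolvable URL for a bare DOI or Handle string."""
    identifier = identifier.strip()
    # Accept the common "doi:" label, as in "doi:10.1016/j.cub.2013.11.014".
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:].strip()
    # DOI prefixes begin with "10."; anything else is treated as a Handle.
    if identifier.startswith("10."):
        return DOI_RESOLVER + identifier
    return HANDLE_RESOLVER + identifier


print(resolver_url("doi:10.1109/5.771073"))
print(resolver_url("10217/177356"))
```

This only builds the URL; it does not check that the identifier is actually registered.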
  • 86. Licensing • State your conditions for reuse • Paper citation? • Disclaimers • Must justify limitations, describe how you’ll advertise them • Creative Commons licenses are a good starting point
  • 87. Exercise: Access methods Where will people be able to access the data? Does your discipline have a repository? What kind of stable identifier will it have? What are the conditions for reuse? Are there any limitations to use of these data? Why?
  • 88. Exercise: Data Access • Does your institution have a digital repository? • Do you currently accept datasets? • If not, where could researchers put a dataset? • A researcher wants to put his dataset in a discipline specific repository. Where would you tell him to find one?
  • 89. Outline • Why data management services? • Making a plan: campus survey and communication strategy • Services to provide: workshops and consultations • Deep dive into two topics • Data management plans • Data preservation and sharing
  • 90. Data preservation and sharing • Preserve: File formats and storage • Describe: Metadata standards and standard languages • Share: FAIR principles and repositories
  • 92. Digital data preservation Short Term • During the project • Frequent changes • Your responsibility Long term • After the project is over • Little to no changes • Can be outsourced
  • 93. Preservation best practices • Back up your data! • Save in archival formats • Include metadata
  • 94. Data backup • Make 3 copies • Protects against natural disasters • Example • Computer HD • External hard drive • Cloud
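Copying is only half of a backup: each copy should also be checked against the original. The sketch below copies one file to several destinations and compares SHA-256 checksums. The helper names `sha256_of` and `back_up` and the folder names are hypothetical, and the temporary directory stands in for a real external drive or cloud-synced folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Demo in a temporary directory; the folder names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "counts.csv"
    original.write_text("strain,count\nA,10\n")
    made = back_up(original, [root / "external_drive", root / "cloud_sync"])
    print(f"1 original + {len(made)} verified copies")
```

Running a script like this on a schedule addresses the automation point raised earlier in the deck. Keeping the destinations in geographically distinct locations still has to be arranged separately.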
  • 95. Data formats • Avoid proprietary formats • Use common data standards in your field • Find standards: https://fairsharing.org/standards/?q=&selected_facets=type_exact:model/format
  • 96. Proprietary formats and alternatives • Proprietary format -> alternative format: • Excel (.xlsx) -> Comma Separated Values (.csv) • Word (.docx) -> plain text (.txt) or PDF/A (.pdf) • PowerPoint (.pptx) -> PDF/A (.pdf) • Photoshop (.psd) -> TIFF (.tif, .tiff) • QuickTime (.mov) -> MPEG-4 (.mp4) • MPEG 4 (.m4p) -> .mp3 (compressed) or .wav (uncompressed) https://www.loc.gov/preservation/digital/formats/content/content_categories.shtml
  • 97. Excel guidelines • One table per sheet • Sheet -> .csv • No formatting • No images • No formulas • Make your data tidy
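"Tidy" here is taken in the usual sense of one variable per column and one observation per row. The sketch below reshapes a wide table, with one column per replicate, into that form using only the standard `csv` module. The sample data, column names, and the function name `wide_to_tidy` are made up for illustration.

```python
import csv
import io

# Hypothetical wide table: one row per strain, one column per replicate.
WIDE = """strain,replicate_1,replicate_2,replicate_3
A,10,12,11
B,7,9,8
"""


def wide_to_tidy(wide_csv, id_column, variable, value):
    """Reshape wide CSV text into tidy CSV text (one observation per row)."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, variable, value])
    for row in reader:
        for column, cell in row.items():
            if column != id_column:
                writer.writerow([row[id_column], column, cell])
    return out.getvalue()


tidy = wide_to_tidy(WIDE, "strain", "replicate", "count")
print(tidy)
```

The result has three columns (`strain`, `replicate`, `count`) and six data rows, with no formatting, images, or formulas to lose on export.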
  • 98. Excel Archival Tool • Input: Excel File • Output (per tab): • One .csv • One .txt for formulas • One HTML visualization • https://github.com/mcgrory/ExcelArchivalTool
  • 99. Exercise 2: Formats Think of one type of data that you produce: • What format is it in? • Is it proprietary? If so, what alternative format? • Is it the most commonly used data format for that type of data in your field?
  • 100. Describe Metadata (README files, codebooks, Metadata standards)
  • 101. Metadata • Relevant information for discovery, re-creation and re-use • Descriptive – using data • Discovery – finding data http://library.umassmed.edu/necdmc/necdmc_module3.pptx
  • 102. Descriptive Metadata • Provides context: everything you need to know to interpret and reuse the data • Examples • Readme files • Code books http://library.umassmed.edu/necdmc/necdmc_module3.pptx
  • 103. README files • Describe the contents of data files • List software necessary to interpret the data • Unstructured format: • (+) human readable • (-) not machine readable
  • 104. Codebooks • Define the variables and their units • Explain the formats for dates, time, geographic coordinates • Define any coded values and missing values
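A codebook can itself be a small table saved next to the data. The sketch below stores one entry per variable, writes the entries out as CSV, and flags data columns that nobody has documented yet. The variable names, units, missing-value codes, and function names are all hypothetical.

```python
import csv
import io

# Hypothetical codebook entries; names, units, and codes are illustrative only.
CODEBOOK = [
    {"variable": "strain", "description": "Mouse strain identifier",
     "units": "", "missing_code": "NA"},
    {"variable": "collection_date", "description": "Harvest date, formatted YYYY-MM-DD",
     "units": "", "missing_code": "NA"},
    {"variable": "rna_conc", "description": "RNA concentration",
     "units": "ng/uL", "missing_code": "-999"},
]


def codebook_to_csv(entries):
    """Serialize codebook entries to CSV text, one row per variable."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["variable", "description", "units", "missing_code"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return out.getvalue()


def undocumented_columns(data_csv, entries):
    """Return the data columns that have no codebook entry."""
    header = next(csv.reader(io.StringIO(data_csv)))
    documented = {entry["variable"] for entry in entries}
    return [name for name in header if name not in documented]


data = "strain,collection_date,rna_conc,notes\nA,2017-01-15,250,ok\n"
print(codebook_to_csv(CODEBOOK))
print("Undocumented columns:", undocumented_columns(data, CODEBOOK))
```

Running the check before deposit catches columns, such as `notes` in this made-up file, that were added during the project and never described.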
  • 105. Discipline specific metadata • Specify pieces of information to include • Specify format • Not available in all fields • Find standards • http://www.dcc.ac.uk/resources/metadata-standards • https://fairsharing.org/standards/?q=&selected_facets=type_exact:reporting%20guideline
  • 106. Exercise 3: • Think of the data you chose to work with in Exercise 2 • What information would you include in a README file? A codebook? • Is there a metadata standard you could use?
  • 108. FAIR principles • Findable: searchable and has a unique ID • Accessible: in a repository • Interoperable: Described in common standards for your field • Reusable: Properly described and licensed for reuse https://www.nature.com/articles/sdata201618
  • 110. Licensing • State your conditions for reuse • Citation? • Creative Commons licenses are a good starting point • CC-0 for data
  • 111. [Diagram] FAIR components covered so far: Interoperable, Reusable. Labels: Controlled Vocabularies, Licenses, File formats, Metadata, Descriptive Metadata, Codebooks, Metadata standards, Readme, Community Standards, Non-proprietary
  • 112. Controlled Vocabularies • “Official” names for things • Ontologies: include relationships between terms • Search for relevant ontologies: https://fairsharing.org/standards/?q=&selected_facets=type_exact:terminology%20artifact https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/
  • 113. Exercise 4: • Think about the data from the previous exercises • Is there a standard ontology you could use to describe the data?
  • 114. [Diagram] FAIR components covered so far: Accessible, Interoperable, Reusable. Labels: Controlled Vocabularies, Repositories, Licenses, File formats, Persistent identifiers, Metadata, Descriptive Metadata, Codebooks, Web interface, Metadata standards, Readme, Community Standards, Non-proprietary
  • 115. Repositories (aka databases) provide • A place to put (meta)data • Unique IDs for each dataset • A search interface • Metadata requirements
  • 116. Finding Research Data Repositories • FAIRSharing https://fairsharing.org/databases/ • Registry of Research Data Repositories http://www.re3data.org/
  • 117. CSU Digital Repository • Over 100 Datasets • Dublin core metadata • Supports all* (meta) data types • At no cost <1 TB • $150/TB for 5 years • $300/TB for >5 years *that we know of
  • 118. CSU Repository Deposit Steps • Contact Tobin! – self deposit is in the works • Prepare: • Data files • Metadata (Readme.txt, codebooks, etc) • Any related additional files (like a license) • Deposit agreement: • Declare you have right to deposit • Give permission to repository to perform preservation and access procedures • Upload Module 7: Archiving & Preservation
  • 119. Stable identifiers • URLs break • Stable identifiers are permanent in a database • Some provide linking capabilities • DOI – https://doi.org/10.1109/5.771073 • Handle- http://hdl.handle.net/10217/177356
  • 120. [Diagram] FAIR components: Findable, Accessible, Interoperable, Reusable. Labels: Controlled Vocabularies, Repositories, Licenses, File formats, Persistent identifiers, Metadata, Descriptive Metadata, Discovery Metadata, Codebooks, Search Interface, Web interface, Metadata standards, Readme, Community Standards, Non-proprietary
  • 121. Discovery Metadata • Make your data findable • Metadata standards • Defined by the repository http://library.umassmed.edu/necdmc/necdmc_module3.pptx
  • 122. Metadata standard: Dublin Core • Can be applied to anything • 15 core Elements: • http://dublincore.org/documents/dces/ • CSU librarians help you write this metadata • Example: http://hdl.handle.net/10217/180280
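To make the 15 elements concrete, the sketch below builds a small Dublin Core record as XML with the standard library. The element names and the `http://purl.org/dc/elements/1.1/` namespace come from the Dublin Core Metadata Element Set. The `metadata` wrapper element, the function name, and every field value are assumptions for illustration, and this is not necessarily the format the CSU repository ingests.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML string with one dc: element per value in fields."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"Not a Dublin Core element: {name}")
        values = [value] if isinstance(value, str) else value
        for item in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, loosely based on the README example earlier in the deck.
record = dublin_core_record({
    "title": "miRNA sequencing data from 64 mouse strains",
    "creator": "Magle, C. Tobin",
    "subject": ["miRNA", "mouse"],
    "type": "Dataset",
    "rights": "CC0",
})
print(record)
```

Every element is optional and repeatable, which is why the function accepts either a single string or a list of values for each field.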
  • 123. Exercise 5: • How FAIR is your data? • Findable? • Accessible? • Interoperable? • Reusable?
  • 124. Preservation topics not in DMPs • Licensing • Controlled vocabularies/Ontologies • Discovery vs. descriptive metadata
  • 125. Summary • Most of the work for sharing is part of Preservation - Think about it BEFORE you start your project (DMP) - Do the work as you go • Make preservation and sharing easier with a trusted repository • It’s complicated! Ask for help
  • 126. Other resources to learn • The Medical Library Association Guide to Data Management for Librarians • The DART project: https://osf.io/kh2y6/ • New England Collaborative Data Management Curriculum • Data and donuts: http://libguides.colostate.edu/data-and-donuts