Data Management for
Graduate Students
Marriott Library Graduate Student Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
September 27, 2016
In the next hour…
• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions
Name
Major
Research Project
What is data management?
Activities and practices that support long-term
preservation, access, and use of data
What are data?
“The recorded factual material
commonly accepted in the research
community as necessary to validate
research findings.”
- U.S. OMB Circular A-110
Data are diverse
Data are messy
We manage data first and
foremost for ourselves
Why else manage data?
• Meet grant and journal
requirements
• Promote reproducible research
• Enable new discoveries from
your data
We are trying to avoid
this scenario…
Two bears’ data
management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data were in a format that could only be read by
outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee
Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?
Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or
security restrictions
• Data archiving and responsibility
• Data management costs
DMPTool – CDL
Data organization
File naming
MyData.xls
MeetingNotes.doc
Presentation.ppt
Assignment1.pdf
File naming best practices
1. Be descriptive, not
generic
2. Appropriate length
(about 25 chars or less)
3. Be consistent
4. Think critically about
your file names
File naming best practices
• Files should include only letters,
numbers, and underscores/dashes.
• No special characters.
• No spaces; Use dashes, underscores, or
camel case (likeThis).
• Avoid case dependency. Assume this,
THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions
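The rules above could be checked automatically before files are saved or shared. The following is a minimal sketch, not part of the workshop: the function name `check_file_name`, the exact regular expression, and the choice to apply the 25-character guideline to the name without its extension are all assumptions made for illustration.

```python
import re
from pathlib import PurePath

# Assumed limit, taken from the "about 25 chars or less" guideline.
MAX_STEM_LENGTH = 25
# Letters, numbers, underscores, and dashes only.
ALLOWED_STEM = re.compile(r"^[A-Za-z0-9_-]+$")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    path = PurePath(name)
    stem, suffix = path.stem, path.suffix
    problems = []
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not ALLOWED_STEM.match(stem.replace(" ", "")):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("longer than 25 characters")
    if not suffix:
        problems.append("missing file extension")
    return problems


print(check_file_name("20140724_SoilSamples_v01.csv"))
print(check_file_name("July 24 2014_SoilSamples%_v6"))
```

A check like this only catches mechanical problems. Whether a name is descriptive and consistent still takes human judgment.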
One potential strategy
Version Control - Numbering
With leading zeros, files sort in order: 001, 002, 003, 009, 010, 099
Without leading zeros, files sort out of order: 1, 10, 2, 3, 9, 99
Use leading zeros for
scalability
Bonus Tip: Use whole numbers (v1, v2, v3) for major version
changes and decimals for minor changes (v1.1, v2.6)
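The effect of leading zeros on sort order can be reproduced in a few lines. This is a small sketch; the helper `versioned_name` and the `_v` separator are hypothetical choices, not a convention from the workshop.

```python
def versioned_name(base: str, version: int, width: int = 3) -> str:
    """Build a name such as 'SoilSamples_v007' with a zero-padded version."""
    return f"{base}_v{version:0{width}d}"


versions = [1, 2, 3, 9, 10, 99]

# Without padding, alphabetical sorting puts 10 before 2.
unpadded = sorted(str(v) for v in versions)

# With padding, alphabetical order matches numeric order.
padded = sorted(f"{v:03d}" for v in versions)

print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(padded)    # ['001', '002', '003', '009', '010', '099']
print(versioned_name("SoilSamples", 7))  # SoilSamples_v007
```

A width of three digits leaves room for 999 versions. Choose the width at the start of a project, since changing it later breaks the sort order.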
Version Control - Dates
If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too
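Date stamps in this format can be generated rather than typed by hand. The sketch below uses Python's standard `datetime` module; the function names `date_stamp` and `from_us_date` are made up for this example.

```python
from datetime import date, datetime


def date_stamp(day: date, dashes: bool = False) -> str:
    """Format a date as YYYYMMDD, or YYYY-MM-DD when dashes=True."""
    return day.strftime("%Y-%m-%d" if dashes else "%Y%m%d")


def from_us_date(text: str) -> str:
    """Convert a MM-DD-YYYY string such as 06-18-2015 to YYYYMMDD."""
    return datetime.strptime(text, "%m-%d-%Y").strftime("%Y%m%d")


print(date_stamp(date(2015, 6, 18)))               # 20150618
print(date_stamp(date(2015, 6, 18), dashes=True))  # 2015-06-18
print(from_us_date("06-18-2015"))                  # 20150618
```

Because the year comes first, names stamped this way sort chronologically when sorted alphabetically.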
From a DMP…
“Each file name, for all types of data, will
contain the project acronym PUCCUK; a
reference to the file content (survey,
interview, media) and the date of an event
(such as the date of an interview).”
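A convention like the one quoted can be turned into a small helper so every file name is built the same way. This is a sketch under assumptions: the quoted DMP does not specify the order of the parts, the separator, or the date format, so the underscore separator and YYYYMMDD date here are illustrative choices, and `dmp_file_name` is a hypothetical name.

```python
from datetime import date

# Content types named in the quoted DMP.
ALLOWED_CONTENT = {"survey", "interview", "media"}


def dmp_file_name(content: str, event_date: date, extension: str,
                  acronym: str = "PUCCUK") -> str:
    """Build a file name as acronym_content_YYYYMMDD.extension (assumed layout)."""
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"unknown content type: {content}")
    return f"{acronym}_{content}_{event_date:%Y%m%d}.{extension}"


print(dmp_file_name("interview", date(2014, 7, 24), "txt"))
# PUCCUK_interview_20140724.txt
```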
Who filed better?
• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL
Structuring folders and files
• Consider all the types of files you will handle during the course
of your project.
• Develop a nested folder structure that makes sense for your
project and your team’s retrieval needs.
• Name folders clearly, without special characters.
• Use a standard folder structure for each project or subproject
(including making folders for files not yet created)
• Create a reference document (README file) that notes the
purpose of each folder.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
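A standard folder structure with a README can be created by a short script and reused for each project. The following is a minimal sketch: the folder names, their descriptions, and the function `create_project` are hypothetical and should be adapted to the project and the team's retrieval needs.

```python
from pathlib import Path

# Hypothetical folder names and purposes; adapt to your own project.
FOLDERS = {
    "data_raw": "Original data, never edited",
    "data_processed": "Cleaned or derived data",
    "documentation": "Codebooks, protocols, approvals",
    "analysis": "Scripts and code",
    "outputs": "Figures, tables, drafts",
}


def create_project(root: Path, project_name: str) -> Path:
    """Create a standard folder structure with a README describing it."""
    project = root / project_name
    lines = [f"Project: {project_name}", "", "Folders:"]
    for folder, purpose in FOLDERS.items():
        # Folders are created up front, even if no files exist for them yet.
        (project / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"- {folder}: {purpose}")
    (project / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project
```

Calling `create_project(Path.home() / "research", "PLPP")` would create the five folders and a `README.txt` listing the purpose of each.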
README files
File organization exercise
Describing data
Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how
insignificant or buggy)
IJ?
XVAR?
FNAME?
What goes in a codebook?
• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other
variables
• Null values
• Anything else someone
needs to better understand
the data
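A codebook can be as simple as a table with one row per variable, saved next to the data. The sketch below writes one as CSV with Python's standard `csv` module. The column names cover only some of the items listed above, and the two variables are invented examples, not real data.

```python
import csv
import io

CODEBOOK_FIELDS = ["name", "meaning", "data_type", "units", "null_value", "known_issues"]

# Hypothetical variables, for illustration only.
codebook = [
    {"name": "soil_ph", "meaning": "Soil pH measured in the field",
     "data_type": "float", "units": "pH", "null_value": "-999",
     "known_issues": ""},
    {"name": "sample_date", "meaning": "Date the sample was collected",
     "data_type": "date (YYYYMMDD)", "units": "", "null_value": "",
     "known_issues": ""},
]


def codebook_to_csv(rows: list[dict]) -> str:
    """Serialize codebook rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CODEBOOK_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(codebook_to_csv(codebook))
```

Descriptive names such as `soil_ph` reduce how much the codebook has to explain, compared with names like IJ or XVAR.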
Metadata
Unstructured Data
There was a study put out by Dr. Gary Bradshaw from
the University of Nebraska Medical Center in 1982
called “Growth of Rodent Kidney Cells in Serum
Media and the Effect of Viral Transformation On
Growth”. It concerns the cytology of kidney cells.
Structured Data
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
At the very least…
• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical
information to understand
or cite the data.
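The minimum fields above can be kept as a structured record and checked for gaps. This sketch reuses the Bradshaw example from the earlier slide. The lowercase field names, the function `missing_fields`, and the choice of JSON are assumptions for illustration, not a metadata standard.

```python
import json

REQUIRED_FIELDS = ["title", "creator", "description", "date", "type",
                   "publisher", "format", "identifier", "rights"]


def missing_fields(record: dict) -> list[str]:
    """List the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Growth of rodent kidney cells in serum media and the effect "
             "of viral transformations on growth",
    "creator": "Gary Bradshaw",
    "description": "Concerns the cytology of kidney cells",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
}

print(json.dumps(record, indent=2))
print(missing_fields(record))  # ['type', 'format', 'identifier', 'rights']
```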
Data ownership
Data Storage
LOCKSS (Lots of
Copies Keeps
Stuff Safe)
Options for data
storage
• Personal computers or laptops
• Networked drives
• External storage devices
3-2-1 Backup Rule
Have 3 copies of your data
On 2 different media
In more than 1 physical location
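The 3-2-1 rule is simple enough to express as a check. The following sketch assumes each copy is described by the medium it is on and where it physically lives. The `Copy` class, the function name, and the example media and locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str    # e.g. "laptop", "external drive", "cloud"
    location: str  # e.g. "office", "home", "offsite"


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are 3+ copies, on 2+ media, in more than 1 location."""
    media = {copy.medium for copy in copies}
    locations = {copy.location for copy in copies}
    return len(copies) >= 3 and len(media) >= 2 and len(locations) >= 2


usb_only = [Copy("usb drive", "office")]
spread_out = [
    Copy("laptop", "office"),
    Copy("external drive", "office"),
    Copy("cloud", "offsite"),
]

print(satisfies_3_2_1(usb_only))    # False
print(satisfies_3_2_1(spread_out))  # True
```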
Ubox – box.utah.edu
Language from a DMP
“All data files will be stored on the University server that is backed
up nightly. The University's computing network is protected from
viruses by a firewall and anti-virus software. Digital recordings will
be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the
office. Interview recordings and transcripts, which may contain
personal information, will be password protected at file-level and
stored on the server.
Original versions of the files will always be kept on the server. If
copies of files are held on a laptop and edits made, their file names
will be changed.”
Thinking long-term
Archiving options
• Domain-specific repository
• General Purpose Data Repository
• Institutional repository
When you archive…
• Save the data in both its proprietary and non-proprietary
format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent,
privacy, etc.)
• When possible/mandated/desired, share your data online
with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get
credit for your data
• Link your data to your publications as often as possible
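Before depositing, it can help to check that every proprietary file has a non-proprietary counterpart, as the first point above recommends. This is a rough sketch: the extension mapping covers only the Excel and Word examples given above, `.txt` stands in for plain ASCII text, and the function name is hypothetical.

```python
from pathlib import PurePath

# Proprietary extension -> open counterpart (assumed mapping).
OPEN_COUNTERPART = {".xls": ".csv", ".xlsx": ".csv", ".doc": ".txt", ".docx": ".txt"}


def missing_open_copies(file_names: list[str]) -> list[str]:
    """Return proprietary files that have no open-format copy alongside them."""
    present = {name.lower() for name in file_names}
    missing = []
    for name in file_names:
        path = PurePath(name)
        target = OPEN_COUNTERPART.get(path.suffix.lower())
        if target and str(path.with_suffix(target)).lower() not in present:
            missing.append(name)
    return missing


print(missing_open_copies(["survey.xlsx", "survey.csv", "notes.docx"]))
# ['notes.docx']
```

The check only confirms that a counterpart file exists. It does not verify that the two copies hold the same content.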
Your data librarians
Daureen Nesdill, Research Data Management Librarian, Sciences
Darell Schmick, Research Librarian, Health Sciences
Rebekah Cummings, Research Data Management Librarian, Social Sciences & Humanities
Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project
Questions?
Rebekah Cummings
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y
…or ask now!

Contenu connexe

Tendances

Best practices data management
Best practices data managementBest practices data management
Best practices data managementSherry Lake
 
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAnalyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAleatha Parker-Wood
 
Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Mojtaba Lotfaliany
 
Basics of Research Data Management
Basics of Research Data ManagementBasics of Research Data Management
Basics of Research Data ManagementOpenAIRE
 
Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Rebekah Cummings
 
NPA Data science: Progression pathway topics
NPA Data science: Progression pathway topicsNPA Data science: Progression pathway topics
NPA Data science: Progression pathway topicsKate Farrell
 
Data Citation and DOIs
Data Citation and DOIsData Citation and DOIs
Data Citation and DOIsARDC
 
Top (10) challenging problems in data mining
Top (10) challenging problems  in data miningTop (10) challenging problems  in data mining
Top (10) challenging problems in data miningAhmedasbasb
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)mhb120
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycleSherry Lake
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
Best Practices for Managing Your Data
Best Practices for Managing Your DataBest Practices for Managing Your Data
Best Practices for Managing Your DataElaine Martin
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverseMerce Crosas
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management PlanKristin Briney
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planC. Tobin Magle
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDMMarieke Guy
 
Data challenges for researchers
Data challenges for researchersData challenges for researchers
Data challenges for researchersMichael Hoffman
 
Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librariansC. Tobin Magle
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and LibrariansJohann van Wyk
 

Tendances (20)

Best practices data management
Best practices data managementBest practices data management
Best practices data management
 
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAnalyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index Designs
 
Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing Best Practice in Data Management and Sharing
Best Practice in Data Management and Sharing
 
Basics of Research Data Management
Basics of Research Data ManagementBasics of Research Data Management
Basics of Research Data Management
 
Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...
 
NPA Data science: Progression pathway topics
NPA Data science: Progression pathway topicsNPA Data science: Progression pathway topics
NPA Data science: Progression pathway topics
 
Data Citation and DOIs
Data Citation and DOIsData Citation and DOIs
Data Citation and DOIs
 
Top (10) challenging problems in data mining
Top (10) challenging problems  in data miningTop (10) challenging problems  in data mining
Top (10) challenging problems in data mining
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)
 
Managing the research life cycle
Managing the research life cycleManaging the research life cycle
Managing the research life cycle
 
Creating dmp
Creating dmpCreating dmp
Creating dmp
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
Best Practices for Managing Your Data
Best Practices for Managing Your DataBest Practices for Managing Your Data
Best Practices for Managing Your Data
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverse
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management Plan
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
 
Data challenges for researchers
Data challenges for researchersData challenges for researchers
Data challenges for researchers
 
Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and Librarians
 

En vedette

Textile Technology - Diplomo Certificate
Textile Technology - Diplomo CertificateTextile Technology - Diplomo Certificate
Textile Technology - Diplomo CertificateEaswarlal Juttu
 
Compuestos Orgánicos
Compuestos Orgánicos Compuestos Orgánicos
Compuestos Orgánicos lorenscristina
 
Микола Зеров. Життя та творчість. Поезії.
Микола Зеров. Життя та творчість. Поезії.Микола Зеров. Життя та творчість. Поезії.
Микола Зеров. Життя та творчість. Поезії.Elena Pritula
 
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...Mari Smith
 
19th February 2017 - What is a living Sacrifice?
19th February 2017 - What is a living Sacrifice?19th February 2017 - What is a living Sacrifice?
19th February 2017 - What is a living Sacrifice?Thorn Group Pvt Ltd
 

En vedette (8)

Textile Technology - Diplomo Certificate
Textile Technology - Diplomo CertificateTextile Technology - Diplomo Certificate
Textile Technology - Diplomo Certificate
 
Compuestos Orgánicos
Compuestos Orgánicos Compuestos Orgánicos
Compuestos Orgánicos
 
Mapa conceptual
Mapa conceptualMapa conceptual
Mapa conceptual
 
Hr management
Hr managementHr management
Hr management
 
Микола Зеров. Життя та творчість. Поезії.
Микола Зеров. Життя та творчість. Поезії.Микола Зеров. Життя та творчість. Поезії.
Микола Зеров. Життя та творчість. Поезії.
 
CV Rima Kusuma Dewi
CV Rima Kusuma DewiCV Rima Kusuma Dewi
CV Rima Kusuma Dewi
 
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...
Facebook Marketing - What is Hot and What is Not - Mari Smith - Social Media ...
 
19th February 2017 - What is a living Sacrifice?
19th February 2017 - What is a living Sacrifice?19th February 2017 - What is a living Sacrifice?
19th February 2017 - What is a living Sacrifice?
 

Similaire à Data Management for Graduate Students

Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016Rebecca Raworth, MLIS
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217lyarmey
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA'saaroncollie
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchersSarah Jones
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Managementdancrane_open
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016IzzyChad
 
Managing your data paget
Managing your data pagetManaging your data paget
Managing your data pagetTERN Australia
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collectionSherry Lake
 
Data management (newest version)
Data management (newest version)Data management (newest version)
Data management (newest version)Graça Gabriel
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Managementdancrane_open
 
Planning for Research Data Managment
Planning for Research Data ManagmentPlanning for Research Data Managment
Planning for Research Data ManagmentDaniel Crane
 
Conquering Chaos in the Age of Networked Science: Research Data Management
Conquering Chaos in the Age of Networked Science: Research Data ManagementConquering Chaos in the Age of Networked Science: Research Data Management
Conquering Chaos in the Age of Networked Science: Research Data ManagementKathryn Houk
 
Datat and donuts: how to write a data management plan
Datat and donuts: how to write a data management planDatat and donuts: how to write a data management plan
Datat and donuts: how to write a data management planC. Tobin Magle
 
Preventing data loss
Preventing data lossPreventing data loss
Preventing data lossIUPUI
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data ManagementC. Tobin Magle
 

Similaire à Data Management for Graduate Students (20)

Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
 
Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016
 
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchers
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Management
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016
 
Managing your data paget
Managing your data pagetManaging your data paget
Managing your data paget
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
 
What is-rdm
What is-rdmWhat is-rdm
What is-rdm
 
Data management (newest version)
Data management (newest version)Data management (newest version)
Data management (newest version)
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Management
 
Planning for Research Data Managment
Planning for Research Data ManagmentPlanning for Research Data Managment
Planning for Research Data Managment
 
Conquering Chaos in the Age of Networked Science: Research Data Management
Conquering Chaos in the Age of Networked Science: Research Data ManagementConquering Chaos in the Age of Networked Science: Research Data Management
Conquering Chaos in the Age of Networked Science: Research Data Management
 
Datat and donuts: how to write a data management plan
Datat and donuts: how to write a data management planDatat and donuts: how to write a data management plan
Datat and donuts: how to write a data management plan
 
Preventing data loss
Preventing data lossPreventing data loss
Preventing data loss
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
DC101 UWE
DC101 UWEDC101 UWE
DC101 UWE
 

Dernier

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Data Management for Graduate Students

  • 1. Data Management for Graduate Students Marriott Library Graduate Student Workshop Series Rebekah Cummings, Research Data Management Librarian J. Willard Marriott Library, University of Utah September 27, 2016
  • 2. • Introductions • What are data? • Why manage data? • Data Management Plans • Data Organization • Metadata • Storage and Archiving • Questions In the next hour…
  • 4. What is data management? Activities and practices that support long- term preservation, access, and use of data
  • 5. What are data? “The recorded factual material commonly accepted in the research community as necessary to validate research findings.” - U.S. OMB Circular A-110
  • 8. We manage data first and foremost for ourselves
  • 9. Why else manage data? • Meet grant and journal requirements • Promote reproducible research • Enable new discoveries from your data
  • 10. We are trying to avoid this scenario…
  • 11. Two bears data management problems 1. Didn’t know where he stored the data 2. Saved one copy of the data on a USB drive 3. Data was in a format that could only be read by outdated, proprietary software 4. No codebook to explain the variable names 5. Variable names were not descriptive 6. No contact information for the co-author Sam Lee
  • 12. Data Management Plans • What data are generated by your research? • What is your plan for managing the data? • How will your data be shared?
  • 13. Elements of a DMP • Types of data, including file formats • Data description • Data storage • Data sharing, including confidentiality or security restrictions • Data archiving and responsibility • Data management costs
  • 18. File naming best practices 1. Be descriptive not generic 2. Appropriate length (about 25 chars or less) 3. Be consistent 4. Think critically about your file names
  • 19. File naming best practices • Files should include only letters, numbers, and underscores/dashes. • No special characters. • No spaces; use dashes, underscores, or camel case (likeThis). • Avoid case dependency. Assume this, THIS, and tHiS are the same. • Have a strategy for version control. • Don’t overwrite file extensions.
  • 21. Version Control - Numbering Use leading zeros for scalability: 001 002 003 009 010 099 sort in order, whereas unpadded numbers sort as 1 10 2 3 9 99. Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
  • 22. Version Control - Dates If using dates, use YYYYMMDD. June2015 = BAD! 06-18-2015 = BAD! 20150618 = GREAT! 2015-06-18 = This is fine too
  • 23. From a DMP… “Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
  • 24. Who filed better? • PLPP_EvaluationData_Workshop2_2014.xlsx • MyData.xlsx • publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
  • 25. Who filed better? • July 24 2014_SoilSamples%_v6 • 20140724_NSF_SoilSamples_Cummings • SoilSamples_FINAL
  • 26. Structuring folders and files • Consider all the types of files you will handle during the course of your project. • Develop a nested folder structure that makes sense for your project and your team’s retrieval needs. • Name folders clearly, without special characters. • Use a standard folder structure for each project or subproject (including making folders for files not yet created). • Create a reference document (README file) that notes the purpose of each folder. University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
  • 30. Research Documentation • Grant proposals and related reports • Applications and approvals (e.g. IRB) • Codebooks, data dictionaries • Consent forms • Surveys, questionnaires, interview protocols • Transcripts, hard copies of audio and video files • Any software or code you used (no matter how insignificant or buggy)
  • 32. What goes in a codebook? • Variable name • Variable meaning • Variable data types • Precision of data • Units • Known issues with the data • Relationships to other variables • Null values • Anything else someone needs to better understand the data
  • 33. Metadata • Unstructured Data: There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells. • Structured Data: Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth. Author: Gary Bradshaw. Date: 1982. Publisher: University of Nebraska Medical Center. Subject: Kidney -- Cytology.
  • 34. At the very least… • Title • Creator • Description • Date • Type • Publisher • Format • Identifier (DOI) • Rights • Any other critical information to understand or cite the data.
  • 37. LOCKSS (Lots of Copies Keeps Stuff Safe)
  • 38. Options for data storage • Personal computers or laptops • Networked drives • External storage devices
  • 39. 3-2-1 Backup Rule Have 3 copies of your data On 2 different media In more than 1 physical location
  • 41. Language from a DMP “All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews. Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server. Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”
  • 43. Archiving options • Domain-specific repository • General Purpose Data Repository • Institutional repository
  • 44. When you archive… • Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII) • Consider any restrictions on your data (copyright, patent, privacy, etc.) • When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK) • Include a data citation and state how you want to get credit for your data • Link your data to your publications as often as possible
  • 45. Your data librarians Daureen Nesdill Research Data Management Librarian, Sciences Darell Schmick Research Librarian, Health Sciences Rebekah Cummings Research Data Management Librarian, Social Sciences & Humanities
  • 46. Major takeaways • Data management starts at the beginning of a project • Document your data so that someone else could understand it • Have more than one copy of your data • Consider archiving options when you are done with your project
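The file naming and version control rules from slides 18 to 22 are mechanical enough to check in code. The following is a minimal Python sketch, not part of the original workshop: the function names `check_file_name` and `versioned_name`, the 25-character soft limit, and the `PROJECT_Content_YYYYMMDD_v003.ext` pattern are all illustrative assumptions drawn from the slides. It cannot judge whether a name is descriptive; that still takes a human.

```python
import re
from datetime import date

# Letters, digits, underscores, and dashes only, followed by one extension (slide 19).
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def check_file_name(name, max_stem=25):
    """Return a list of problems; an empty list means the name passes these checks.

    max_stem reflects the "about 25 chars or less" guideline and is a soft,
    assumed limit that can be raised per project.
    """
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if not SAFE_NAME.match(name):
        problems.append("contains special characters or lacks an extension")
    stem = name.rsplit(".", 1)[0]
    if len(stem) > max_stem:
        problems.append(f"stem longer than {max_stem} characters")
    return problems


def versioned_name(project, content, day, version, extension):
    """Build a name such as PLPP_Soil_20140724_v006.csv (hypothetical pattern).

    The date is written as YYYYMMDD and the version is zero-padded to three
    digits so that names sort correctly as plain text.
    """
    return f"{project}_{content}_{day:%Y%m%d}_v{version:03d}.{extension}"


if __name__ == "__main__":
    for candidate in ["MyData.xlsx", "July 24 2014_SoilSamples%_v6"]:
        print(candidate, "->", check_file_name(candidate) or "passes")
    print(versioned_name("PLPP", "Soil", date(2014, 7, 24), 6, "csv"))
```

Note that `MyData.xlsx` passes every mechanical check even though the slides list it as a poor name, which is why rule 1 (be descriptive, not generic) has to be enforced by convention rather than by a script.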

Editor's notes

  1. Specifically, we are going to be talking about data management of your research data, but some of the principles will help you when thinking about the organization of any digital materials: your notes, your PowerPoints, your grocery lists…. Most of these concepts are pretty straightforward and almost seem like common sense, but the reality is that very few people manage their data well, and if you do, you will be at a big advantage.
  2. Overview of what we will be covering in this session.
  3. Introductions Name Major Are you working on a research project?
  4. Data management refers to activities throughout the data lifecycle. These activities are part of being a responsible researcher, and they happen both during the research and after the research is completed.
  5. This is the most commonly cited definition when someone wants to pin a definition on data, which is surprisingly difficult to do. What data really is is evidence. Or as Michael Buckland puts it “alleged evidence”. It’s what you are putting forth as evidence for your research findings. “We’ve looked at all this stuff” using these methods and here are our conclusions. Research papers often give methods and conclusions but what they don’t usually contain is the underlying data or evidence. So what is data – EVIDENCE FOR YOUR RESEARCH
  6. One of the characteristics of data is that it tends to be incredibly diverse. Scientific data – observations, computational models, lab notebooks. Social sciences – results of surveys, video recordings, field notes. Humanities – text mining, newspapers, records of human history. Each field tends to have its own practices around data collection, analysis, sharing, etc.
  7. Another attribute of data is that it tends to get messy. Most of us just don’t realize this because our messy, disorganized files are locked up in a neat little box called your computer. Don’t believe me? How long would it take you to find a photo from five years ago on your computer? Here is a hint: if your image files start with DSC_ or IMG_ followed by some number, it will probably take you a very long time. If most people’s digital files were analog, this is exactly what they would look like.
  8. Why manage data? The main reason you should manage your data is for yourself and for your own research team. Data management is one of those essential skills you need to acquire, just like learning how to manage citations or understand research methods. It can feel a bit boring, like filing, but six months later, when you want to locate a file or even understand it, your future self will thank you. Plus, unlike research methods and managing citations, this is something that even seasoned scientists are not very good at, so you will have something to offer your research team even as a young scientist. USE THE “DOING YOUR TAXES” ANALOGY: if you have been managing your receipts and compiling spreadsheets throughout the year, you will be in much better shape in April. Can you scramble for information at the end? Of course! But you are not making the best use of your time and resources, and you are likely not getting the returns you should. Sometimes the documentation isn’t available later. With good records, you won’t have to make guesses.
  9. NSF is now starting to look at DMPs as part of its post-award assessment, checking to see whether researchers did what they said they were going to do with their data.
  10. https://www.youtube.com/watch?v=N2zK3sAtr-4
  11. The most important thing you can do is to have and follow a data management plan. Next we are going to move on and talk a little bit about these data management plans that funding agencies are requiring (and I am promoting as a good idea in general!!). Your DMP should answer three main questions…
  12. Mention that in the UK (ESRC) your data management plan has to show that you’ve already looked for existing data. Email me and I would be happy to send you more examples.
  13. We’ve talked in broad strokes about data management, but now we are going to focus in on some of the more specific aspects of managing data well. One of the simplest things that you can do is to be more consistent with file naming, version control, and folder structures. This section has a lot to do with organizing and naming your research materials so that you can find them later and so they will open in any environment.
  14. We’ve talked about data management at a high level. What is data? Why should you manage it well? Now we are going to talk about some of the nuts and bolts of data management, starting with file naming. How do you currently name files? Do you have a system? To some extent we are all guilty of bad file naming, but when it comes to your research it is important to create a system that makes sense not just to you, but to other people as well.
  15. Here are some examples of bad file names because they aren’t descriptive and don’t help us find the file later, and also because there is a possibility that these files will be overwritten the next time you name a file the same thing.
  16. File names should reflect the contents of a file and include enough information to uniquely identify the data file without getting way too long. Don’t be generic in your file names. Be consistent!!!! Your file name may include project acronym, location, investigator, date of data collection, data type, and version number: whatever will help you or someone else uniquely identify that file in the future. Think about what can be added and what can be omitted in your file names. If you are the only person on a project, you probably don’t need your name. If there are going to be multiple versions of a file, make sure you add a version number or a date to differentiate.
  17. Here are some file naming best practices that will make sure your file will open in any environment with any operating system. Special characters can have special meaning in certain programming languages and operating systems and can be misinterpreted in file names. Uppercase lettering can affect numbering. Ex: $ marks the beginning of a variable name in PHP. A backslash designates file path locations in the Windows operating system. Spaces make things easier for humans to read, but some browsers and software don’t know how to interpret spaces. Sometimes a program only reads a file name up to the space, which can cause problems.
  18. There are also best practices around version control and numbering. Version control is often achieved by using dates or a standard numbering system
  19. January, June, and July are going to line up next to each other. April and August are going to be together. December is going to come before June, etc., and all your Januarys from every year are going to line up.
  20. #1 is the best one. Descriptive. Not too long, not too short.
  21. #2 is the best choice here. The first example has spaces, irregular dates that won’t line up in order, and special characters. The third example may not be descriptive enough for a secondary user. Also, beware of using “FINAL” as opposed to a standardized numbering system.
  22. That is how to name an individual file. What about your whole file structure? All your research materials need to be in one folder. The top level folder should include the project title and year. If it is multiple year, include the first and last year in the title. The substructures should have a clear and consistent naming convention that is documented in a README file.
  23. Exercise!! Possible solutions: organize by type of file (all transcripts in one folder, all audio recordings in another) or organize by person (have a Cliff Barrett folder and a Robert Bennett folder). Problems with file names: dates are not standardized; special characters and spaces; file type in the file name, which is unnecessary; unnecessary information in the file name (the “found on Internet, think okay, better than mine” picture); NO consistency in file naming.
  24. Next we are going to talk about data description. A third characteristic of data is that it often needs context in order to be understandable. If you have a spreadsheet of survey responses, you need to have the survey to understand the responses. You also need the codebook that explains your variable names, the values that you used, and how you cleaned your data. Once again, try to think about how a secondary user would interpret your data. When we say metadata we are really talking about two things: human-readable documentation and machine-readable metadata. The importance of documenting your data throughout your research project cannot be overestimated. Document your data with a certain level of reuse in mind. Replication? Verification? Inspection?
  25. First and foremost, metadata includes any surrounding documentation you may need to make sense of your data. An Excel spreadsheet of survey responses is fairly useless if you haven’t kept the survey that generated those responses.
  26. If you are working with variables, you must make a codebook and include it in your documentation.
  27. Metadata is very important for other people looking to use your project. Human readable vs. machine readable
  28. https://schema.labs.datacite.org/meta/kernel-4.0/doc/DataCite-MetadataKernel_v4.0.pdf
  29. Most researchers are very protective of their data. You work hard to collect it and you have a huge intellectual investment in it. Also, since most people have never been asked to hand over or even share their data, the assumption is often that the researcher is the one who owns the data. The truth, however, is more complicated than that. If you are an employee of the University, your data belongs to the University. If you move your research, you can request to take a copy of your data with you. UCSD/USC court case (database). Usually the PI is responsible for the data (data governance).
  30. Through the course of your research your data needs to be stored securely, backed up, and maintained regularly. Once again this sounds like common sense, but you will be happy when you pay some attention to it. (e.g. when your laptop crashes or is stolen.). I’m going to play a short video clip that has nothing to do with research data, but I think it perfectly captures the way we approach the storage aspect of data management. https://www.youtube.com/watch?v=QyMgNZHtdk8
  31. #1 rule of data storage – never keep your data on just one device. You are one dropped computer, one spilled glass of water, one unscrupulous thief away from losing all of your data. Every single day I go to Mom’s Café and see people leave their computers at their table while they go to the bathroom or grab a cup of coffee. LOCKSS – There should never be just one copy of your data. Do you back up your data? It is the most important data management task. No fewer than two, preferably three, copies of research data. How well are you covered against unexpected loss? Make sure that when disaster strikes, it isn’t a disaster.
  32. There are three options for storing your data. Personal computers and laptops – Convenient for storing your data while in use. Should not be used for storing master copies of your data. Networked drives – Highly recommended. You can share data. Your data is stored in a single place and backed up regularly. Available to you from any place at any time. If you use a department drive or Box, your data is stored securely, thereby minimizing the risk of loss, theft, or unauthorized access. BEST!!! External storage devices – thumb drives, flash drives, external hard drives. Cheap, easy to store and pass around. You feel better knowing it’s in your hands where you can see it. Not recommended for the long-term storage of your data.
  33. 3-2-1 – 3 copies of your data, on 2 different media, in more than 1 physical location.
  34. 1 TB free storage and an additional 50 GB if you are on a sponsored project. Free! Secure! When you leave you can take a copy with you or create a new account
  35. This is an example of social science research where the data are interview recordings and transcripts.
  36. Another area of data management that you will have to consider is data archiving. Archiving is not the same thing as storage. Archiving adds additional value to your data: long-term preservation; metadata; sharable, usually through a persistent identifier; makes data citable.
  37. There are lots of archiving options for your data. Some people choose to put their data on their website which is an option, but not a best practice.