Data Management 101
Kristin Briney, PhD
Data Services Librarian
Do You Still Have Your Data?
• What if your hard drive crashes?
• What if you are accused of fraud?
• What if your collaborator abruptly quits?
• What if the building burns down?
• What if you need to use your old data?
• What if your backup fails?
• What if your computer gets stolen?
• What if…
What Are Data?
• Observational
– Sensor data, telemetry, survey data, sample data,
images
• Experimental
– Gene sequences, chromatograms, toroid magnetic
field data
• Simulation
– Climate models, economic models
• Derived or compiled
– Text and data mining, compiled database, 3D models,
data gathered from public documents
Why Data Management?
• Don’t lose data
• Find data more easily
– Especially if you need older data
• Easier to analyze organized, documented data
• Avoid accusations of fraud & misconduct
• Get credit for your data
• Don’t drown in irrelevant data
For each minute of planning at
beginning of a project, you will save
10 minutes of headache later
What This Session Covers
• Introduction to a few topics in data
management
– File organization conventions
– Documentation
– Storage and backups
What This Session Covers
• Hands-on exercises in each topic
• My goal is to offer practical, usable solutions
– Recognize that I can’t cover everything
Introduce Yourself!
• Name
• Department
• Most common data format
– Text, Excel, SPSS, Google Docs, etc.
FILE ORGANIZATION CONVENTIONS
File Naming Conventions
• Make it easier to find files
• Avoid many duplicates
– Especially when you’re not sure which is the latest
or most correct!
• Make it easier to wrap up a project because
you know which files belong to it!
File Naming Conventions
• Files should be named consistently
• File names should be descriptive but short
(<25 characters)
• Use underscores instead of spaces
• Avoid these characters: " / : * ? ' < > + & $
• Use the file dating convention: YYYY-MM-DD
– This works well with a lab notebook
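The rules above can be sketched as a small helper. This is only an illustration in Python, not part of the original session: the function name `make_filename`, the date-first layout, and the choice to apply the 25-character limit to the name without its extension are all assumptions.

```python
import re
from datetime import date

# Characters the slide says to avoid. Spaces are handled separately below.
FORBIDDEN = set('"/:*?\'<>+&$')
# Assumption: the "<25 characters" guideline applies to the name without its extension.
MAX_STEM_LENGTH = 25


def make_filename(description: str, day: date, extension: str) -> str:
    """Build a name such as 2013-04-15_breast_ctrl.jpg (hypothetical layout)."""
    # Use underscores instead of spaces, then drop the forbidden characters.
    cleaned = "".join(
        ch for ch in description.replace(" ", "_") if ch not in FORBIDDEN
    )
    # Collapse runs of underscores left behind by the cleanup.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    # date.isoformat() yields YYYY-MM-DD, which also sorts chronologically.
    stem = f"{day.isoformat()}_{cleaned}"
    if len(stem) >= MAX_STEM_LENGTH:
        raise ValueError(f"{stem!r} has {len(stem)} characters; shorten the description")
    return f"{stem}.{extension.lstrip('.')}"
```

For example, `make_filename("breast ctrl", date(2013, 4, 15), "jpg")` returns `2013-04-15_breast_ctrl.jpg`. Raising an error on long names, rather than truncating silently, keeps the person naming the file in charge of what gets abbreviated.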
File Versioning
• Why?
– If you only have one copy and you make a
mistake…
– If your data is stored in multiple locations
File Versioning
• For analyzed data, use version numbers
• Save files often to a new version
• Label the final version FINAL
• For code, consider GIT or SVN
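One way to keep version numbers consistent is to let a small script work out the next file name. The sketch below assumes a `_vNN` suffix before the extension (for example `analysis_v01.csv`); that suffix style and the function name `next_version` are illustrative choices, not something the session prescribes.

```python
import re

# Assumed convention: <base>_v<number><optional extension>, e.g. analysis_v01.csv
VERSION_PATTERN = re.compile(r"^(?P<base>.+)_v(?P<num>\d+)(?P<ext>\.[^.]+)?$")


def next_version(filename: str) -> str:
    """Return the file name for the next version of `filename`."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        # No version suffix yet: start numbering at v01 (an assumption).
        stem, dot, ext = filename.rpartition(".")
        if not dot:
            return f"{filename}_v01"
        return f"{stem}_v01.{ext}"
    base = match.group("base")
    ext = match.group("ext") or ""
    width = len(match.group("num"))  # keep the zero padding already in use
    number = int(match.group("num")) + 1
    return f"{base}_v{number:0{width}d}{ext}"
```

Copy the current file to the name this returns before editing, so the previous version stays untouched. For code, a version control system such as Git does this job better than numbered copies.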
File Organization
• Any system is better than none
• Possibilities
– One project, one folder (for small projects)
– Separate folders for data or project stages
– Separate folders for different types of data
– Date-based folders (pairs well with a lab
notebook!)
What To Avoid
• One-person data hoards
• Data scattered across several machines
– Not backups! Backups are fine
• Storage that doesn’t mirror “ownership”
– If it’s communal, it belongs in a communal place
– If data collection happens on an individual’s
machine, that doesn’t mean the data should stay
there!
Document Your Conventions
• No point in having a system without
documentation
– README.txt
• Use .txt over .doc because it’s more durable
– Front cover of research notebook
– A printout by the computer
– Etc.
Document Your Conventions
• In project-wide README.txt
– Basic project information
• Title
• Contributors
• Grant info
• etc.
– Contact information for at least one person
– All locations where data live, including backups
– Useful information about the files and how
they’re organized
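A project-wide README.txt with the fields listed above can be generated from a template so that every project records the same information. This is a minimal sketch: the function name `build_readme`, the layout, and every value in the usage note are placeholders, not a required format.

```python
def build_readme(title, contributors, grant_info, contact,
                 data_locations, organization_notes):
    """Return the text of a plain-text project README."""
    lines = [
        "PROJECT README",
        "",
        f"Title: {title}",
        f"Contributors: {', '.join(contributors)}",
        f"Grant info: {grant_info}",
        f"Contact: {contact}",
        "",
        "Data locations (including backups):",
    ]
    # One line per storage location, so nothing gets forgotten at project wrap-up.
    lines += [f"  - {location}" for location in data_locations]
    lines += ["", "File organization:", f"  {organization_notes}"]
    return "\n".join(lines) + "\n"
```

To save the result, something like `pathlib.Path("README.txt").write_text(text, encoding="utf-8")` works, where `text` is the returned string. Writing plain .txt follows the advice above about durability.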
Exercise: File Naming Conventions
• Develop a file naming convention for your
most common data type
DOCUMENTATION
What would someone unfamiliar
with your data need in order to find,
evaluate, understand, and reuse
them?
Documentation
• Consider the differences between
– someone inside your lab
– someone outside your lab but in your field
– someone outside your field
• Two parts: metadata and methods
Documentation
Methods
• How the data were
gathered
• How the data should be
interpreted
• What you did
– Limitations on what you did
• …build trust in your data
Metadata
• What you’re looking at
• Who made it and when
• How it got there
• What it means and
• What you can do with it
• …before you even look at
the file
Methods
• Examples of methods to document
– Code
– Survey
– Codebook
– Data dictionary
– Anything that lets someone reproduce your results
• Don’t forget the units!
Metadata
• Informal and formal description of data
• An informal standard can fit your unique research
• Benefits of a formal standard
– Completeness
– Aids in sharing
– Often required for deposit into a repository
• May be required by your funder
Metadata
• Tons of formal standards available across
many, many disciplines
• Consult
– Disciplinary repository
– Your peers
– Subject librarian
– Data Services Librarian
Metadata
• Decide on a metadata standard before you
collect the data!
– Easier to record metadata when collecting data
than to convert later
• Standard or no, keep metadata CONSISTENTLY
and COMPUTABLY whenever you can
Metadata Standard: Dublin Core
• contributor
• coverage
• creator
• date
• description
• format
• identifier
• language
• publisher
• relation
• rights
• source
• subject
• title
• type
Metadata Example
• Contributor
– Jane Collaborator
• Creator
– Kristin Briney
• Date
– 2013 Apr 15
• Description
– A microscopy image of
cancerous breast tissues
under 20x zoom. This image is
my control, so it has only the
standard staining described on
2013 Feb 2 in my notebook.
• Format
– JPEG
• Identifier
– IMG00057.jpg
• Relation
– Same sample as images
IMG00056.jpg and
IMG00055.jpg
• Subject
– Breast cancer
• Title
– Cancerous breast tissue
control
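To keep a record like this one consistent and computable, it can be stored as structured text next to the image. The sketch below uses Python and JSON, which is an assumption (Dublin Core is often serialized as XML). The helper name `make_record` is hypothetical, the date is rewritten in the YYYY-MM-DD form recommended earlier, and the description is abbreviated.

```python
import json

# The fifteen Dublin Core elements listed on the earlier slide.
DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def make_record(**fields):
    """Return a metadata dict, rejecting anything that is not a Dublin Core element."""
    unknown = sorted(set(fields) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {unknown}")
    # Always emit elements in the same order so records are easy to compare.
    return {name: fields[name] for name in DUBLIN_CORE_ELEMENTS if name in fields}


example = make_record(
    contributor="Jane Collaborator",
    creator="Kristin Briney",
    date="2013-04-15",
    description="Microscopy image of cancerous breast tissue under 20x zoom (control).",
    format="JPEG",
    identifier="IMG00057.jpg",
    relation="Same sample as images IMG00056.jpg and IMG00055.jpg",
    subject="Breast cancer",
    title="Cancerous breast tissue control",
)
example_json = json.dumps(example, indent=2)
```

Saving `example_json` as, say, `IMG00057.json` beside the image keeps the description with the data. The file name is only a suggestion.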
Exercise: Documentation
• For your most common data type, make a list
of the most important information to record
for each dataset
STORAGE AND BACKUPS
A Note on Security
• Does your data fall under the following?
– HIPAA
• Health information
– FERPA
• Student information
– FISMA
• Government subcontractor
– Human subject research, etc.
• Ask for help!
A Note on Security
• Secure storage
• Controlled access
• De-identification of personal information
• Security training
UWM Security Resources
• UWM Information Security Office
– Visit: https://www4.uwm.edu/itsecurity/
– Email: infosec@uwm.edu
• Certificate in Information Security
• HIPAA
– https://www4.uwm.edu/legal/hipaa/index.cfm
• FERPA
– http://www4.uwm.edu/academics/ferpa.cfm
Storage
• Library motto: Lots of Copies Keeps Stuff Safe!
• Rule of 3: 2 onsite, 1 offsite
• Storage run by experts is more reliable than
storage you run yourself
– It costs more, but that’s for a reason
Storage Options
• Computer
• USB/flash drive
• CDs/DVDs
• External hard drive
• Shared drives/servers
• Tape backup
• Cloud storage
Your Computer
• You’re using it, but should you keep data on it?
– What happens if you lose it?
– What happens if it is stolen?
– What happens if it breaks?
– Will the data stay there as long as you are required to
keep them?
• Don’t be disorganized
• Don’t keep sensitive data here
• Verdict: By itself it is not enough
USB/Flash Drive
• Pros
– Small, convenient package
– Big enough for a wide variety of datasets
• Cons
– Will you remember to back your data up onto it?
– Easy to lose
– Easy to perpetuate out-of-date copies
• Verdict: good for data transport, but not for
backup
CD-ROMs/DVD-ROMs
• Pros
– More reliable (but if one does fail, you won’t know
until it’s too late)
– Portable
• Cons
– Will you remember to back your data up onto it?
– Hassle to deal with
– Slow to write to
– Difficult to keep track of old copies
• Verdict: Not good for quick backup, and just okay
for periodic offsite backup
External Hard Drive
• Pros
– Relatively cheap
– Large storage capacity
• Cons
– You have to set up, maintain, and audit it yourself
– Some brands are less reliable
– Disorganization can be a problem
• Verdict: Coupled with automatic-backup
software, an okay choice for onsite backup
– You’ll still want a second backup offsite
Shared Drives/Servers
• Pros
– Keeps data off your easily-stolen laptop
– Not your problem to manage
– Shared costs typically mean lower costs
• Cons
– Who’s managing the thing? Are they competent?
– Can have storage quotas
– Can be hard to get to outside the lab or the office
• Verdict: If well-managed, a good choice for regular use,
onsite, or offsite backup
– Beware the dusty Linux box under the desk!
Tape Backup
• Pros
– Can happen near-invisibly
– Highly reliable
– Tolerably secure (not always on network)
• Cons
– Can be hard or slow to get data back
– Not always audited as often as they should be
• Verdict: Good for onsite or offsite backup, if
somebody else is running them and you know
they’re regularly audited
Cloud Storage
• Pros
– Convenient syncing
– Cheap
– If client-side encryption is involved, decently secure
• Cons
– Requires a network connection
• Possible security risks and inconvenience if off-network
– Ongoing (and out of your control) costs
– Your backup is hostage to their business risks
– Reliability, security, privacy not guaranteed
• Verdict: For savvy shoppers, fine for offsite backup. A
little risky for your only backup.
Exercise: Storage
• Conduct a quick inventory of your data
– What datasets do you have?
– How big are they?
• Inventory where your files are currently
stored, including backups. How safe are your
data?
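For files that live on a computer or a mounted drive, a short script can produce the first pass of this inventory. This is a sketch under assumptions: the function names are hypothetical, and it only sees the folder you point it at, so data on other machines or in notebooks still has to be listed by hand.

```python
from pathlib import Path


def inventory(root):
    """Map each file under `root` (as a relative path) to its size in bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def summarize(sizes):
    """Return (number of files, total size in bytes) for an inventory."""
    return len(sizes), sum(sizes.values())
```

Run `inventory` once per storage location (computer, external drive, shared drive) and compare the results to see which datasets exist in only one place.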
Backups
• Any backup is better than none
• Automatic backup is better than manual
• Your research is only as safe as your backup
plan
– Lots of horror stories here
Ideal Backup Characteristics
• Low effort
• High reliability
• As secure as necessary
– Tradeoffs between security and convenience
• As open as possible to collaborators
• Well organized
Check Your Backups
• Backups are only as good as your ability to recover data
• Test your backups periodically
– Preferably a fixed schedule
– 1 or 2 times a year may be enough
– Bigger/more complex data should be checked
more often
• Test your backup whenever you change things
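One simple test is to compare checksums of the working files against the backup copy. The sketch below assumes both are reachable as folders (for example a mounted external or shared drive) and uses SHA-256 from Python's standard library; the function names are hypothetical. It is a spot check, not a replacement for a full restore test.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source, backup):
    """Compare every file under `source` with the same relative path under `backup`."""
    source, backup = Path(source), Path(backup)
    missing, changed = [], []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            missing.append(relative.as_posix())
        elif sha256_of(original) != sha256_of(copy):
            # Either the backup is damaged or the original was edited after the backup ran.
            changed.append(relative.as_posix())
    return {"missing": missing, "changed": changed}
```

An empty report means every working file has an identical copy in the backup. Anything listed under `missing` or `changed` is worth investigating before you need to restore.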
A Final Note
• Must retain data at least 3 years post-project
per OMB Circular A-110
– Better to retain for >6 years
• Consider letting someone else worry about
this
– A disciplinary repository
– The UWM Digital Commons
Exercise: Backups
• Sketch out your ideal backup system, and
identify the first step in getting there from
your current system.
WHERE TO GO FROM HERE
Where to Go from Here
• Talk to your coworkers
– …but be aware you might not be able to change
things
– Discuss
• Common schemes for metadata and file naming
• Centralized documentation
• Robust backup
• Use good practices and be a model for others
UWM Resources
• Data management resources
– dataplan.uwm.edu
• Information Security Office
– www4.uwm.edu/itsecurity/
• Data Services Librarian
– Kristin Briney, briney@uwm.edu
Thank You!
• This presentation available under a Creative Commons
Attribution (CC-BY) license
• Some content courtesy of Dorothea Salo
– http://dsalo.info/
– http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/

 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 

Data Management 101

  • 1. Data Management 101 Kristin Briney, PhD Data Services Librarian
  • 2. Do You Still Have Your Data? • What if your hard drive crashes? • What if you are accused of fraud? • What if your collaborator abruptly quits? • What if the building burns down? • What if you need to use your old data? • What if your backup fails? • What if your computer gets stolen? • What if…
  • 3. What Are Data? • Observational – Sensor data, telemetry, survey data, sample data, images • Experimental – Gene sequences, chromatograms, toroid magnetic field data • Simulation – Climate models, economic models • Derived or compiled – Text and data mining, compiled database, 3D models, data gathered from public documents
  • 4. Why Data Management? • Don’t lose data • Find data more easily – Especially if you need older data • Easier to analyze organized, documented data • Avoid accusations of fraud & misconduct • Get credit for your data • Don’t drown in irrelevant data
  • 5. For each minute of planning at the beginning of a project, you will save 10 minutes of headache later
  • 6. What This Session Covers • Introduction to a few topics in data management – File organization conventions – Documentation – Storage and backups
  • 7. What This Session Covers • Hands-on exercises in each topic • My goal is to offer practical, usable solutions – Recognize that I can’t cover everything
  • 8. Introduce Yourself! • Name • Department • Most common data format – Text, Excel, SPSS, Google Docs, etc.
  • 10. File Naming Conventions • Make it easier to find files • Avoid many duplicates – Especially when you’re not sure which is the latest or most correct! • Make it easier to wrap up a project because you know which files belong to it!
  • 11. File Naming Conventions • Files should be named consistently • File names should be descriptive but short (<25 characters) • Use underscores instead of spaces • Avoid these characters: “ / : * ? ‘ < > * + & $ • Use the file dating convention: YYYY-MM-DD – This works well with a lab notebook
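The naming rules on this slide can be checked mechanically. The following is a minimal sketch, not part of the original presentation. It assumes the date goes at the start of the name, that the length limit applies to the name without its extension, and that the curly quotes on the slide stand for the straight quote characters. The function names `check_file_name` and `make_file_name` are hypothetical.

```python
import re
from datetime import date

# Characters the slide says to avoid, plus the space (use underscores instead).
# Assumption: the slide's curly quotes stand for the straight " and ' characters.
FORBIDDEN = set("\"'/:*?<>+&$ ")
MAX_LENGTH = 25  # the slide recommends names shorter than 25 characters


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    stem = name.rsplit(".", 1)[0]  # assumption: the extension does not count
    if len(stem) >= MAX_LENGTH:
        problems.append("too long")
    if any(ch in FORBIDDEN for ch in name):
        problems.append("forbidden character")
    # Assumption: the YYYY-MM-DD date is a prefix followed by an underscore.
    # This checks the shape only, not whether the date is a real calendar date.
    if not re.match(r"^\d{4}-\d{2}-\d{2}_", name):
        problems.append("missing YYYY-MM-DD date prefix")
    return problems


def make_file_name(day: date, description: str, extension: str) -> str:
    """Build a name such as 2013-04-15_control_20x.jpg."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return f"{day.isoformat()}_{cleaned}.{extension}"
```

For example, `make_file_name(date(2013, 4, 15), "control 20x", "jpg")` builds a name that passes `check_file_name`, while a name such as `my data: final?.xls` is flagged for a forbidden character and a missing date.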
  • 12. File Versioning • Why? – If you only have one copy and you make a mistake… – If your data is stored in multiple locations
  • 13. File Versioning • For analyzed data, use version numbers • Save files often to a new version • Label the final version FINAL • For code, consider Git or SVN
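Saving to a new version is easier when the next name is generated rather than typed. The sketch below is an illustration only: the `_vNN` suffix is an assumed convention (the slide does not prescribe a format), and `next_version` is a hypothetical helper.

```python
import re


def next_version(name: str) -> str:
    """Return the file name with its _vNN version suffix increased by one.

    A name with no version suffix gets _v01 added before the extension.
    """
    match = re.search(r"_v(\d+)(\.[^.]+)?$", name)
    if match is None:
        stem, dot, extension = name.rpartition(".")
        if not dot:  # no extension at all
            return f"{name}_v01"
        return f"{stem}_v01.{extension}"
    digits = match.group(1)
    suffix = match.group(2) or ""
    number = int(digits) + 1
    # Keep the zero padding of the existing number (v03 becomes v04).
    return f"{name[:match.start()]}_v{number:0{len(digits)}d}{suffix}"
```

Zero padding keeps versions in order when files are sorted by name, which is why `peaks_v09.csv` is followed by `peaks_v10.csv` rather than `peaks_v10` sorting before `peaks_v9`.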
  • 14. File Organization • Any system is better than none • Possibilities – One project, one folder (for small projects) – Separate folders for data or project stages – Separate folders for different types of data – Date-based folders (pairs well with a lab notebook!)
  • 15. What To Avoid • One-person data hoards • Data scattered across several machines – Not backups! Backups are fine • Storage that doesn’t mirror “ownership” – If it’s communal, it belongs in a communal place – If data collection happens on an individual’s machine, that doesn’t mean the data should stay there!
  • 16. Document Your Conventions • No point in having a system without documentation – README.txt • Use .txt over .doc because it’s more durable – Front cover of research notebook – A printout by the computer – Etc.
  • 17. Document Your Conventions • In project-wide README.txt – Basic project information • Title • Contributors • Grant info • etc. – Contact information for at least one person – All locations where data live, including backups – Useful information about the files and how they’re organized
  • 18. Exercise: File Naming Conventions • Develop a file naming convention for your most common data type
  • 20. What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
  • 21. Documentation • Consider the differences between – someone inside your lab – someone outside your lab but in your field – someone outside your field • Two parts: metadata and methods
  • 22. Documentation Methods • How the data were gathered • How the data should be interpreted • What you did – Limitations on what you did • …build trust in your data Metadata • What you’re looking at • Who made it and when • How it got there • What it means and • What you can do with it • …before you even look at the file
  • 23. Methods • Examples of methods to document – Code – Survey – Codebook – Data dictionary – Anything that lets someone reproduce your results • Don’t forget the units!
  • 24. Metadata • Informal and formal description of data • Informal standard can fit your unique research • Benefits of a formal standard – Completeness – Aids in sharing – Often required for deposit into a repository • May be required by your funder
  • 25. Metadata • Tons of formal standards available across many, many disciplines • Consult – Disciplinary repository – Your peers – Subject librarian – Data Services Librarian
  • 26. Metadata • Decide on a metadata standard before you collect the data! – Easier to record metadata when collecting data than to convert later • Standard or no, keep metadata CONSISTENTLY and COMPUTABLY whenever you can
  • 27. Metadata Standard: Dublin Core • contributor • coverage • creator • date • description • format • identifier • language • publisher • relation • rights • source • subject • title • type
  • 28. Metadata Example • Contributor – Jane Collaborator • Creator – Kristin Briney • Date – 2013 Apr 15 • Description – A microscopy image of cancerous breast tissues under 20x zoom. This image is my control, so it has only the standard staining described on 2013 Feb 2 in my notebook. • Format – JPEG • Identifier – IMG00057.jpg • Relation – Same sample as images IMG00056.jpg and IMG00055.jpg • Subject – Breast cancer • Title – Cancerous breast tissue control
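One way to keep metadata consistent and computable is to store each record as structured text next to the data file. The sketch below is an illustration under stated assumptions, not a method from the presentation: it uses JSON as the storage format, rewrites the slide's date as YYYY-MM-DD, and introduces the hypothetical helpers `unknown_elements` and `to_sidecar_text`.

```python
import json

# The 15 Dublin Core elements listed on the earlier slide.
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def unknown_elements(record: dict) -> set:
    """Return any keys in the record that are not Dublin Core elements."""
    return set(record) - DUBLIN_CORE


def to_sidecar_text(record: dict) -> str:
    """Serialize a record as JSON, refusing keys outside Dublin Core."""
    extra = unknown_elements(record)
    if extra:
        raise ValueError(f"not Dublin Core elements: {sorted(extra)}")
    return json.dumps(record, indent=2, sort_keys=True)


# The example record from the slide (date rewritten as YYYY-MM-DD).
record = {
    "contributor": "Jane Collaborator",
    "creator": "Kristin Briney",
    "date": "2013-04-15",
    "description": "A microscopy image of cancerous breast tissues under "
                   "20x zoom. Control image with standard staining only.",
    "format": "JPEG",
    "identifier": "IMG00057.jpg",
    "relation": "Same sample as images IMG00056.jpg and IMG00055.jpg",
    "subject": "Breast cancer",
    "title": "Cancerous breast tissue control",
}
```

The text returned by `to_sidecar_text(record)` could be saved as, say, `IMG00057.json` beside the image (a hypothetical naming choice), so the description travels with the file and can be read by a script later.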
  • 29. Exercise: Documentation • For your most common data type, make a list of the most important information to record for each dataset
  • 31. A Note on Security • Does your data fall under the following? – HIPAA • Health information – FERPA • Student information – FISMA • Government subcontractor – Human subject research, etc. • Ask for help!
  • 32. A Note on Security • Secure storage • Controlled access • De-identification of personal information • Security training
  • 33. UWM Security Resources • UWM Information Security Office – Visit: https://www4.uwm.edu/itsecurity/ – Email: infosec@uwm.edu • Certificate in Information Security • HIPAA – https://www4.uwm.edu/legal/hipaa/index.cfm • FERPA – http://www4.uwm.edu/academics/ferpa.cfm
  • 34. Storage • Library motto: Lots of Copies Keeps Stuff Safe! • Rule of 3: 2 onsite, 1 offsite • Storage run by experts is more reliable than storage you run yourself – It costs more, but that’s for a reason
  • 35. Storage Options • Computer • USB/flash drive • CDs/DVDs • External hard drive • Shared drives/servers • Tape backup • Cloud storage
  • 36. Your Computer • You’re using it, but should you keep data on it? – What happens if you lose it? – What happens if it is stolen? – What happens if it breaks? – Will the data stay there as long as you are required to keep them? • Don’t be disorganized • Don’t keep sensitive data here • Verdict: By itself it is not enough
  • 37. USB/Flash Drive • Pros – Small, convenient package – Big enough for a wide variety of datasets • Cons – Will you remember to back your data up onto it? – Easy to lose – Easy to perpetuate out-of-date copies • Verdict: good for data transport, but not for backup
  • 38. CD-ROMs/DVD-ROMs • Pros – More reliable (but if one does fail, you won’t know until it’s too late) – Portable • Cons – Will you remember to back your data up onto it? – Hassle to deal with – Slow to write to – Difficult to keep track of old copies • Verdict: Not good for quick backup, and just okay for periodic offsite backup
  • 39. External Hard Drive • Pros – Relatively cheap – Large storage capacity • Cons – You have to set up, maintain, and audit it yourself – Some brands are less reliable – Disorganization a problem • Verdict: Coupled with automatic-backup software, an okay choice for onsite backup – You’ll still want a second backup offsite
  • 40. Shared Drives/Servers • Pros – Keeps data off your easily-stolen laptop – Not your problem to manage – Shared costs typically mean lower costs • Cons – Who’s managing the thing? Are they competent? – Can have storage quotas – Can be hard to get to outside the lab or the office • Verdict: If well-managed, a good choice for regular use, onsite, or offsite backup – Beware the dusty Linux box under the desk!
  • 41. Tape Backup • Pros – Can happen near-invisibly – Highly reliable – Tolerably secure (not always on network) • Cons – Can be hard or slow to get data back – Not always audited as often as they should be • Verdict: Good for onsite or offsite backup, if somebody else is running them and you know they’re regularly audited
  • 42. Cloud Storage • Pros – Convenient syncing – Cheap – If client-side encryption is involved, decently secure • Cons – Requires a network connection • Possible security risks and inconvenience if off-network – Ongoing (and out of your control) costs – Your backup is hostage to their business risks – Reliability, security, privacy not guaranteed • Verdict: For savvy shoppers, fine for offsite backup. A little risky for your only backup.
  • 43. Exercise: Storage • Conduct a quick inventory of your data – What datasets do you have? – How big are they? • Inventory where your files are currently stored, including backups. How safe are your data?
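For the inventory exercise, a short script can answer the "how big are they?" question. This is a minimal sketch, assuming each dataset lives in its own top-level file or folder under one project directory. The function name `inventory` is hypothetical.

```python
from pathlib import Path


def inventory(root: str) -> dict[str, int]:
    """Map each top-level entry under root to its total size in bytes."""
    totals = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_file():
            totals[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            # Sum every file below this folder, however deeply nested.
            totals[entry.name] = sum(
                p.stat().st_size for p in entry.rglob("*") if p.is_file()
            )
    return totals
```

Running it on each storage location in turn (computer, external drive, shared drive) gives a quick picture of which datasets exist where and whether the copies are the same size.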
  • 44. Backups • Any backup is better than none • Automatic backup is better than manual • Your research is only as safe as your backup plan – Lots of horror stories here
  • 45. Ideal Backup Characteristics • Low effort • High reliability • As secure as necessary – Tradeoffs between security and convenience • As open as possible to collaborators • Well organized
  • 46. Check Your Backups • Backups are only as good as your ability to recover data • Test your backups periodically – Preferably a fixed schedule – 1 or 2 times a year may be enough – Bigger/more complex data should be checked more often • Test your backup whenever you change things
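One possible way to test a backup is to restore it to a scratch folder and compare checksums against the working copy. The sketch below is an assumption about how such a check might look, not a procedure from the presentation. It uses SHA-256 from Python's standard library, and the helper names `sha256_of` and `compare_trees` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original: str, restored: str) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    problems = []
    original_root, restored_root = Path(original), Path(restored)
    for source in sorted(original_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_root)
        copy = restored_root / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file in the working copy was recovered intact. Files edited since the backup ran will be reported as differing, so the check is most meaningful right after a backup completes.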
  • 47. A Final Note • Must retain data at least 3 years post-project per OMB Circular A-110 – Better to retain for >6 years • Consider letting someone else worry about this – A disciplinary repository – The UWM Digital Commons
  • 48. Exercise: Backups • Sketch out your ideal backup system, and identify the first step in getting there from your current system.
  • 49. WHERE TO GO FROM HERE
  • 50. Where to Go from Here • Talk to your coworkers – …but be aware you might not be able to change things – Discuss • Common schemes for metadata and file naming • Centralized documentation • Robust backup • Use good practices and be a model for others
  • 51. UWM Resources • Data management resources – dataplan.uwm.edu • Information Security Office – www4.uwm.edu/itsecurity/ • Data Services Librarian – Kristin Briney, briney@uwm.edu
  • 52. Thank You! • This presentation available under a Creative Commons Attribution (CC-BY) license • Some content courtesy of Dorothea Salo – http://dsalo.info/ – http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/

Editor's notes

  1. PantherFILE, but limited space