Escaping
Datageddon
                                 Dorothea Salo
                                 Ryan Schryver
                        Graduate Support Series




 Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
Why are you here?

• You’re managing data (your own or your lab’s)
• Or you think you maybe should be
• You’re not sure why it matters
• You’re not sure how best to do it
• You’d like to know whether you’re on the right
  track
            Adapted from Graham et al. “Managing Research Data 101.”
            http://libraries.mit.edu/guides/subjects/data-management/Managing_Research_Data_101_IAP_2010.pdf



                                                             Photo: Jaysin, http://www.flickr.com/photos/orijinal/3539418133/
Why manage data?
• To make your research easier!
• Because somebody else said so
  • Your lab PI
  • Your lab PI’s funder
• In case you need it later
• To avoid accusations of fraud or bad science
• To share it for others to use and learn from
• To get credit for producing it
• To keep from drowning in irrelevant stuff
  • ... especially at grant/project end
                           Photo: Shashi Bellamkonda, http://www.flickr.com/photos/drbeachvacation/2874078655/
Research is changing...
• Research datasets were second-class citizens.
  • Publications were all that mattered!
  • And publishing data in print was uneconomical even when possible.
  • So nobody saw anybody’s data.
• Data are now digital. The game changes!
  • Data are shared more, and more openly! Open Source, Open Access,
    Open Data.
  • There’s a lot still to be worked out about how to share, cite, credit, and
    license digital data.
  • But data will unquestionably matter to your research careers, more
    than they do to your advisors’ generation.
• Learn good data habits now! You’ll need
  them later.
                           Photo: Karl-Ludwig Poggemann, http://www.flickr.com/photos/hinkelstone/2435823037/
Did you know?
• Gene expression microarray data: “Publicly
  available data was significantly (p=0.006)
  associated with a 69% increase in citations,
  independently of journal impact factor, date
  of publication, and author country of origin.”
  • Piwowar, Heather et al. “Sharing detailed research data is associated
    with increased citation rate.” PLoS One 2007.
    DOI: 10.1371/journal.pone.0000308
• Maybe there’s an advantage here!


                                        Photo: ynse, http://www.flickr.com/photos/ynse/2341095044/
Did you see?
How to plan
to keep data



 Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
Step 1: Inventory

• What data are you collecting or making?
  • Observational, experimental, simulation? Raw, derived, compiled?
  • Can it be recreated? How much would that cost?
• How much of it? How fast is it growing? Does it change?
• What file format(s)?
• What’s your infrastructure for data collection and
  storage like?
  • How do you find it, or find what you’re looking for in it?
  • How easy is it to get new people up to speed? Or share data with others?
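
One way to start answering the "how much" and "what file format(s)" questions is to let a short script count what is already on disk. This is only a rough sketch: the function name `inventory` and the folder name `my_project_data` are placeholders, not part of any existing tool, and file extensions are only a hint about the real format.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Tally file count and total bytes per file extension under root."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are lowercased so .CSV and .csv are counted together.
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes


if __name__ == "__main__":
    # "my_project_data" is a placeholder; point this at your own data folder.
    counts, sizes = inventory("my_project_data")
    for ext, n in counts.most_common():
        print(f"{ext}: {n} files, {sizes[ext]} bytes")
```

Running it again a month later and comparing the totals gives a crude answer to "how fast is it growing?"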


                                           Photo: Anssi Koskinen, http://www.flickr.com/photos/ansik/304526237/
Step 2: Needs
• Who are the audiences for your data?
  • You (including Future You), your lab colleagues (including future ones), your PIs
  • Disciplinary colleagues, at your institution or at others
  • Colleagues in allied disciplines
  • The world!
• What are your obligations to others?
  • Funder requirements
  • Confidentiality issues
  • IP questions
  • Security
• How long do you need to keep your data?

                              Photo: Celeste “Vitamin C9000,” http://www.flickr.com/photos/celestemarie/2193327230/
Step 3: Process planning
• How do you and your lab get from where
  you are to where you need to be?
• Document, document, document all decisions
  and all processes!
• Secret sauce: the more you strategize up front,
  the less angst and panic later.
  • “Make it up as you go along” is very bad practice!
  • But the best-laid plans go agley... so be flexible.
  • And watch your field! Best practices are still in flux.


                                     Photo: Kevin Utting, http://www.flickr.com/photos/tallkev/256810217/
Things to
think about



Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
File formats
• Will anybody be able to read these files at
  the end of your time horizon?
• Where possible, prefer file formats that are:
  • Open, standardized
  • Documented
  • In wide use
  • Easy to data-mine, transform, recast
• If you need to transform data for durability,
  do it now, not later.


                                   Photo: Bart Everson, http://www.flickr.com/photos/editor/859824333/
Documentation

• Fundamental question: What would someone
  unfamiliar with your data need in order to
  find, evaluate, understand, and reuse them?
  • Consider the differences between someone inside your lab, someone
    outside your lab but in your field, and someone outside your field.
• Two parts: metadata and methods




                                  Photo: “striatic,” http://www.flickr.com/photos/striatic/2144933705/
Metadata

• About the project
  • Title, people, key dates, funders and grants
• About the data
  • Title, key dates, creator(s), subjects, rights, included files, format(s),
    versions, checksums
• Keep this with the data.
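
A minimal sketch of "keep this with the data": write one small metadata file, including a checksum for every file, into the data folder itself. The JSON layout, the file name `metadata.json`, the function names, and the choice of SHA-256 are all assumptions made for illustration. If your field has a metadata standard, use that instead.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(data_dir, project, dataset, out_name="metadata.json"):
    """Write project and dataset metadata, plus per-file checksums, into data_dir."""
    data_dir = Path(data_dir)
    files = [
        {
            "path": p.relative_to(data_dir).as_posix(),
            "format": p.suffix.lower(),
            "sha256": sha256_of(p),
        }
        for p in sorted(data_dir.rglob("*"))
        # Skip the metadata file itself so reruns stay stable.
        if p.is_file() and p.name != out_name
    ]
    record = {"project": project, "dataset": dataset, "files": files}
    out_path = data_dir / out_name
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path
```

The `project` and `dataset` arguments are plain dictionaries, so they can hold the items listed above (title, people, key dates, funders, rights, versions) in whatever shape suits the project.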



                                        Photo: Paul Downey, http://www.flickr.com/photos/psd/422206144/
Methods
• Reason #1 for not reusing someone else’s
  data: “I don’t know enough about how it was
  gathered to trust it.”
• Document what you did. (A published article
  may or may not be enough.)
• Document any limitations of what you did.
• If you ran code on the data, document the
  code and keep it with the data.
• Need a codebook? Or a data dictionary?
  • If I can’t identify at sight what each bit of your dataset means, yes, you
    do need a codebook or data dictionary.
  • DO NOT FORGET UNITS!
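
A data dictionary can be as simple as a small table with one row per column of the dataset. The sketch below assumes a hypothetical codebook layout with three fields (`column`, `description`, `unit`) and reports any data column that lacks a description or a unit. The function name and the sample data are invented for illustration.

```python
import csv
import io


def undocumented_columns(data_csv, codebook_csv):
    """Return data columns that lack a description or a unit in the codebook."""
    header = next(csv.reader(io.StringIO(data_csv)))
    documented = {
        row["column"]
        for row in csv.DictReader(io.StringIO(codebook_csv))
        if row["description"].strip() and row["unit"].strip()
    }
    return [name for name in header if name not in documented]


if __name__ == "__main__":
    # Invented sample data: "depth" has a description but no unit.
    data = "site_id,temp,depth\nA1,12.5,3.0\n"
    codebook = (
        "column,description,unit\n"
        "site_id,Sampling site identifier,none\n"
        "temp,Water temperature,degrees Celsius\n"
        "depth,Sampling depth,\n"
    )
    print(undocumented_columns(data, codebook))
```

Writing `none` for columns that genuinely have no unit keeps "no unit" distinct from "forgot the unit."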
                                  Photo: Joe Sullivan, http://www.flickr.com/photos/skycaptaintwo/90415435/
Standards

• Why reinvent the wheel? If there’s a standard
  format for your data or how to describe it,
  use that!
• The tricky part is finding the right standard.
  • Standards are like toothbrushes...
  • But using standards is good hygiene!
  • Your librarian can often help you find relevant standards.



                                    Photo: Kenneth Lu, http://www.flickr.com/photos/toasty/412580888/
Where to put
  your data



 Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
Storage, short-term
• Your own drive (PC, server, flash drive, etc.)
  • And if you lose it? Or it breaks?
• Somebody else’s drive
  • Departmental drive
  • “Cloud” drive
  • Do they care as much about your data as you do?
• What about versioning?
• Library motto: Lots Of Copies Keeps Stuff Safe.
  • Two onsite copies, one offsite copy.
  • Keep confidentiality and security requirements in mind, of course.
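
Lots of copies only keep stuff safe if the copies still match. As a rough sketch, the script below compares an original folder against one copy and lists files that are missing or that differ. The function names are placeholders and SHA-256 is an assumed choice. Each file is read whole into memory, so very large files would need chunked reading.

```python
import hashlib
from pathlib import Path


def checksums(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(original, copy):
    """Return (missing, changed): paths absent from the copy, and paths that differ."""
    a, b = checksums(original), checksums(copy)
    missing = sorted(set(a) - set(b))
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return missing, changed
```

A difference does not say which side is wrong, only that the two copies no longer agree. That is one reason to keep more than two copies.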

                            Photo: Vadim Molochnikov, http://www.flickr.com/photos/molotalk/3305001454/
Storage, long-term
• No, gold CD-ROMs don’t cut it.
• If data need to persist beyond project end, you
  have to deal with a new kind of risk:
  organizational risk.
  • Servers come and go. So do labs. So do entire departments.
  • In the churn, your data may well be lost or destroyed.
  • This is especially important if you share data! Don’t let it 404!
• You need to find a trustworthy partner.
  • On campus: try the library.
  • Off campus: look for a disciplinary data repository, or a journal that accepts
    data. (It’s a good idea to do this as part of your planning process.)
  • Let somebody else worry! You have new projects to get on with.
                                Photo: Simon Davison, http://www.flickr.com/photos/suzanneandsimon/84038024/
Summing up



Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
So, these “data
      management plans...”
• Here’s what MIT suggests should be in them:
  • name of the person responsible for data management within your research project
  • description of data to be collected
  • how data will be documented
  • data quality issues
  • backup procedures
  • how data will be made available for public use and potential secondary uses
  • preservation plans
  • any exceptional arrangements that might be needed to protect participant
    confidentiality
• Feel like common sense now? Good.
                                          Source: http://libraries.mit.edu/guides/subjects/data-management/
Help on campus

• “What’s Your Data Plan?” website:
  http://dataplan.wisc.edu/
  • Use the contact page!
• Your department’s liaison librarian
  • We can help you find how-tos, relevant standards, on- and off-campus
    archiving services, etc.
• MINDS@UW: http://minds.wisconsin.edu/
  • For data in final form that make sense as discrete files.


                                Photo: Jordan Pérez Nobody, http://www.flickr.com/photos/jp-/2548073841/
Thank you!
    This presentation is available under a
Creative Commons 3.0 Attribution license.

If you reuse it, please remember to credit
                 the included photographs.

    Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/

How to Troubleshoot Apps for the Modern Connected Worker
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Escaping Datageddon

  • 1. Escaping Datageddon Dorothea Salo Ryan Schryver Graduate Support Series Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
  • 2. Why are you here? • You’re managing data (your own or your lab’s) • Or you think you maybe should be • You’re not sure why it matters • You’re not sure how best to do it • You’d like to know whether you’re on the right track Adapted from Graham et al. “Managing Research Data 101.” http://libraries.mit.edu/guides/subjects/data-management/Managing_Research_Data_101_IAP_2010.pdf Photo: Jaysin, http://www.flickr.com/photos/orijinal/3539418133/
  • 3. Why manage data? • To make your research easier! • Because somebody else said so • Your lab PI • Your lab PI’s funder • In case you need it later • To avoid accusations of fraud or bad science • To share it for others to use and learn from • To get credit for producing it • To keep from drowning in irrelevant stuff • ... especially at grant/project end Photo: Shashi Bellamkonda, http://www.flickr.com/photos/drbeachvacation/2874078655/
  • 4. Research is changing... • Research datasets were second-class citizens. • Publications were all that mattered! • And publishing data in print was uneconomical even when possible. • So nobody saw anybody’s data. • Data are now digital. The game changes! • Data are shared more, and more openly! Open Source, Open Access, Open Data. • There’s a lot still to be worked out about how to share, cite, credit, and license digital data. • But data will unquestionably matter to your research careers, more than it does to your advisors’ generation. • Learn good data habits now! You’ll need them later. Photo: Karl-Ludwig Poggemann, http://www.flickr.com/photos/hinkelstone/2435823037/
  • 5. Did you know? • Gene expression microarray data: “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.” • Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” PLoS One 2007. DOI: 10.1371/journal.pone.0000308 • Maybe there’s an advantage here! Photo: ynse, http://www.flickr.com/photos/ynse/2341095044/
  • 9. How to plan to keep data Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
  • 10. Step 1: Inventory • What data are you collecting or making? • Observational, experimental, simulation? Raw, derived, compiled? • Can it be recreated? How much would that cost? • How much of it? How fast is it growing? Does it change? • What file format(s)? • What’s your infrastructure for data collection and storage like? • How do you find it, or find what you’re looking for in it? • How easy is it to get new people up to speed? Or share data with others? Photo: Anssi Koskinen, http://www.flickr.com/photos/ansik/304526237/
  • 11. Step 2: Needs • Who are the audiences for your data? • You (including Future You), your lab colleagues (including future ones), your PIs • Disciplinary colleagues, at your institution or at others • Colleagues in allied disciplines • The world! • What are your obligations to others? • Funder requirements • Confidentiality issues • IP questions • Security • How long do you need to keep your data? Photo: Celeste “Vitamin C9000,” http://www.flickr.com/photos/celestemarie/2193327230/
  • 12. Step 3: Process planning • How do you and your lab get from where you are to where you need to be? • Document, document, document all decisions and all processes! • Secret sauce: the more you strategize up-front, the less angst and panic later. • “Make it up as you go along” is very bad practice! • But the best-laid plans go agley... so be flexible. • And watch your field! Best practices are still in flux. Photo: Kevin Utting, http://www.flickr.com/photos/tallkev/256810217/
  • 13. Things to think about Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
  • 14. File formats • Will anybody be able to read these files at the end of your time horizon? • Where possible, prefer file formats that are: • Open, standardized • Documented • In wide use • Easy to data-mine, transform, recast • If you need to transform data for durability, do it now, not later. Photo: Bart Everson, http://www.flickr.com/photos/editor/859824333/
  • 15. Documentation • Fundamental question: What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them? • Consider the differences between someone inside your lab, someone outside your lab but in your field, and someone outside your field. • Two parts: metadata and methods Photo: “striatic,” http://www.flickr.com/photos/striatic/2144933705/
  • 16. Metadata • About the project • Title, people, key dates, funders and grants • About the data • Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums • Keep this with the data. Photo: Paul Downey, http://www.flickr.com/photos/psd/422206144/
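One way to keep metadata "with the data" is a small sidecar file that lists the included files and their checksums. The sketch below is only an illustration, not something the slides prescribe: the file name `metadata.json`, the field names, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(data_dir: Path, title: str, creators: list[str]) -> dict:
    """Describe every file under data_dir (field names are hypothetical)."""
    files = [
        {
            "path": p.relative_to(data_dir).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(data_dir.rglob("*"))
        # Skip the sidecar itself so re-running gives the same file list.
        if p.is_file() and p.name != "metadata.json"
    ]
    return {"title": title, "creators": creators, "files": files}


def write_metadata(data_dir: Path, title: str, creators: list[str]) -> Path:
    """Write metadata.json inside the data directory and return its path."""
    out = data_dir / "metadata.json"
    out.write_text(json.dumps(build_metadata(data_dir, title, creators), indent=2))
    return out
```

A real record would also carry the project-level items listed on the slide (key dates, funders and grants, rights, versions); they are left out here to keep the sketch short.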
  • 17. Methods • Reason #1 for not reusing someone else’s data: “I don’t know enough about how it was gathered to trust it.” • Document what you did. (A published article may or may not be enough.) • Document any limitations of what you did. • If you ran code on the data, document the code and keep it with the data. • Need a codebook? Or a data dictionary? • If I can’t identify at sight what each bit of your dataset means, yes, you do need a codebook or data dictionary. • DO NOT FORGET UNITS! Photo: Joe Sullivan, http://www.flickr.com/photos/skycaptaintwo/90415435/
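A data dictionary can be as simple as one entry per column, with units written out. The sketch below is a hedged illustration of that idea: the column names, descriptions, and units are invented for the example, and the check only confirms that each column in a CSV header has a description and units recorded.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, with units spelled out.
DATA_DICTIONARY = {
    "sample_id": {"description": "Lab-assigned sample identifier", "units": "none"},
    "temp": {"description": "Water temperature at collection", "units": "degrees Celsius"},
    "depth": {"description": "Depth below surface", "units": "metres"},
}


def undocumented_columns(csv_text: str, dictionary: dict) -> list[str]:
    """Return header columns that lack a description or units in the dictionary."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = []
    for name in header:
        entry = dictionary.get(name, {})
        if not entry.get("description") or not entry.get("units"):
            missing.append(name)
    return missing
```

Running a check like this before sharing a dataset catches the columns that nobody outside the lab could identify at sight.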
  • 18. Standards • Why reinvent the wheel? If there’s a standard format for your data or how to describe it, use that! • The tricky part is finding the right standard. • Standards are like toothbrushes... • But using standards is good hygiene! • Your librarian can often help you find relevant standards. Photo: Kenneth Lu, http://www.flickr.com/photos/toasty/412580888/
  • 19. Where to put your data Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
  • 20. Storage, short-term • Your own drive (PC, server, flash drive, etc.) • And if you lose it? Or it breaks? • Somebody else’s drive • Departmental drive • “Cloud” drive • Do they care as much about your data as you do? • What about versioning? • Library motto: Lots Of Copies Keeps Stuff Safe. • Two onsite copies, one offsite copy. • Keep confidentiality and security requirements in mind, of course. Photo: Vadim Molochnikov, http://www.flickr.com/photos/molotalk/3305001454/
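Lots of copies only keep stuff safe if the copies still match the original. As a rough sketch (the function names are made up for this example, and SHA-256 is an assumed choice), each copy can be compared against the original by checksum:

```python
import hashlib
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's relative path under root to its SHA-256 digest.

    Reads each file fully into memory, so this suits modest file sizes only.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copy(original: Path, copy: Path) -> list[str]:
    """List files that are missing from the copy or differ from the original."""
    expected = checksums(original)
    actual = checksums(copy)
    return [name for name, digest in expected.items() if actual.get(name) != digest]
```

An empty list from `compare_copy` for each onsite and offsite copy suggests the copies are intact; anything else names the files to re-copy.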
  • 21. Storage, long-term • No, gold CD-ROMs don’t cut it. • If data need to persist beyond project end, you have to deal with a new kind of risk: organizational risk. • Servers come and go. So do labs. So do entire departments. • In the churn, your data may well be lost or destroyed. • This is especially important if you share data! Don’t let it 404! • You need to find a trustworthy partner. • On campus: try the library. • Off campus: look for a disciplinary data repository, or a journal that accepts data. (It’s a good idea to do this as part of your planning process.) • Let somebody else worry! You have new projects to get on with. Photo: Simon Davison, http://www.flickr.com/photos/suzanneandsimon/84038024/
  • 22. Summing up Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/
  • 23. So, these “data management plans...” • Here’s what MIT suggests should be in them: • name of the person responsible for data management within your research project • description of data to be collected • how data will be documented • data quality issues • backup procedures • how data will be made available for public use and potential secondary uses • preservation plans • any exceptional arrangements that might be needed to protect participant confidentiality • Feel like common sense now? Good. Source: http://libraries.mit.edu/guides/subjects/data-management/
  • 24. Help on campus • “What’s Your Data Plan?” website: http://dataplan.wisc.edu/ • Use the contact page! • Your department’s liaison librarian • We can help you find how-tos, relevant standards, on- and off-campus archiving services, etc. • MINDS@UW: http://minds.wisconsin.edu/ • Data in final form that make sense as discrete files. Photo: Jordan Pérez Nobody, http://www.flickr.com/photos/jp-/2548073841/
  • 25. Thank you! This presentation is available under a Creative Commons 3.0 Attribution license. If you reuse it, please remember to credit the included photographs. Photo: Steve Punter, http://www.flickr.com/photos/spunter/2554405690/