Avoiding DATApocalypse!

        Laura Guy
       ENUG 2011
Overview
•   The What and Why of Research Data
•   A Data Sharing Revolution
•   Important Questions
•   Data Management
•   A Word (or Two) About Documentation
•   Avoiding DATApocalypse
THE WHAT AND WHY OF
RESEARCH DATA
Something’s happening here…

• Are you managing research data?

OR...

• Should you be managing research data?¹
 ¹ Because the NSF told you so
What it is ain’t exactly clear…
• What’s this all about?
• What’s the best way to do it?
• Are you doing it properly?
What are Research Data?
“The recorded factual material commonly
accepted in the scientific community as
necessary to validate research findings.”
(OMB Circular A-110)
One Possible Definition
“Research data means data in the form of facts,
observations, images, computer program results,
recordings, measurements or experiences on
which an argument, theory, test or hypothesis, or
another research output is based. Data may be
numerical, descriptive, visual or tactile. It may be
raw, cleaned or processed, and may be held in
any format or media.” (The Queensland University
of Technology)
What aren’t Research Data?
•   Preliminary data and analyses
•   Drafts of scientific papers
•   Plans for future research
•   Peer reviews
•   Communications with colleagues
•   Administrative data (treated independently)
•   Research publications (dealt with
    elsewhere)
Why Manage Research Data?
• Funding agency requirement (aka: NSF Data
  Management Plan)
• Cost effective
• Make things easier during the research project
• Data are fragile! Can be changed, corrupted, altered
• So they don’t go missing
• To avoid charges of fraud, bad science
• Share data with others
• Get proper credit for creating them
• Prevent chaos at the end of the project
A DATA SHARING REVOLUTION
The Times They Are a-Changin'

• Research data have always been valuable
• There has always been re-use (ICPSR,
  Census Bureau, etc.)
• The 2010 NSF Notice “Dissemination and
  Sharing of Research Results” upped the
  ante
• Other funders and sponsors are
  recognizing the importance of well-curated
  data and following suit
A (Digital) Revolution
• Advanced technologies make it easier, cheaper to share
  as do open data, open access, open source initiatives
• Publications are still important, but credit for producing
  data is also good!
• Cost effectiveness is the name of the game! (especially
  for the Feds, but private funders care, too)
• As funding money gets scarcer, reusable data become
  more and more valuable
• Besides, graduate students have always needed data for
  secondary analysis!
• Good data management habits at the start of a project
  will assist EVERYONE later
Data Sharing Rocks!
• Piwowar, Heather et al. “Sharing detailed
  research data is associated with increased
  citation rate.”
  http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308
• “Sharing of Data Leads to Progress on
  Alzheimer’s”
  http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html
• And then there's the Japan earthquake... (could
  prompt data sharing have helped?)
Data Sharing Sucks
• Recalcitrant Researchers
• Where’s the money going to come from for
  staff, technology?
• Need new policies, new procedures
• Who’s responsible?
• Sheer volume (est: 1.2 zettabytes in 2010)
• How many of these data sets are actually
  going to be reused? (And should you care?)
IMPORTANT QUESTIONS
A Fistful of Questions
• What research data are being collected?
• How many active researchers are on your
  campus? How many research projects?
• How much data are out there? How fast are
  they growing?
• Who owns the data?
• What types of data are being collected
  (simulations? surveys? experiments?
  derived/data-mined? Etc.)?
• What file formats are being used?
And a Few Questions More…
• If those data were to be lost, how expensive
  would it be to recreate them (if even possible)?
• What infrastructure is in place to: protect data
  during research projects, and
  secure/archive/preserve them after?
• What infrastructure is in place to collect,
  organize, describe and provide access to
  research data?
Who’s the Audience?
•   The original researcher!
•   His/her colleagues?
•   Other researchers in the field?
•   Cross-disciplinary use?
•   Policy makers?
•   Students?
•   The Press?
•   "Concerned Citizens"?
What are the Responsibilities?
•   Funder?
•   Audience?
•   Respondents (Confidentiality, Sensitivity)?
•   Security?
•   Copyright?
•   Intellectual Property?
•   Embargo?
•   Forever Dark?
What About Retention?
•   How long do data need to be retained?
•   Three years?
•   Five years?
•   One hundred years?
•   Forever? (And BTW, what is “forever”?)
•   By definition retention includes the secure
    destruction of data
DATA MANAGEMENT
Data Management Planning
• Do you have policies in place?
• What about money? Staff? Tech?
• What are the current best practices?
• What tools/resources are available (there
  are loads of them! Maybe too many!)
• Planning is important…
• …but so is staying flexible and scalable
• “On-the-fly” is probably not a good thing
What’s a Data Management Plan?
• Many sponsors (like the NSF) require Data
  Management Plans (DMP)
• A good DMP enables data to retain their
  value during and after the research project
• A DMP describes the data that will be
  created and how they will be managed
  and made accessible throughout their
  entire lifetime
DMP During a Research Project
• Who’s responsible for the data? The
  documentation?
• How are they being stored?
• What about versioning? Backups?
• Protections? Encryption? Firewalls?
• Who’s responsible for preparing data for
  sharing?
LOCKSS!
• Lots Of Copies Keeps Stuff Safe
• Need multiple copies and offsite copies
• Need to store the copies securely
• If data contain confidential or sensitive
  information, security becomes even more
  critical
• Basic truth: the best way to protect data is
  to limit access to them
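
One way to picture "multiple copies" in practice is the minimal Python sketch below. It copies a file into several directories and confirms each copy against a SHA-256 checksum of the original. The names sha256_of and replicate, the survey.csv file and the three directories are made up for illustration; real copies would live on separate, secured systems, at least one of them offsite. This is not the LOCKSS software itself, only the idea behind the slogan.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy source into each destination directory and verify every copy."""
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 returns the path of the new file inside the directory
        copy = Path(shutil.copy2(source, directory))
        results[str(copy)] = sha256_of(copy) == expected
    return results


if __name__ == "__main__":
    # Temporary directories stand in for local, second-site and offsite storage.
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        original = base / "survey.csv"  # hypothetical data file
        original.write_text("id,answer\n1,yes\n2,no\n")
        report = replicate(
            original, [base / "copy_a", base / "copy_b", base / "offsite"]
        )
        for location, ok in report.items():
            print(location, "OK" if ok else "MISMATCH")
```

Re-running the checksum comparison on a schedule, not only at copy time, is what would catch a copy that is silently changed or corrupted later.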
DMP After a Project Ends
• Preparation of data, metadata
• Long-term preservation and accessibility
• Curators, I.T. Professionals, and
  Researchers all work together
• Partners should be identified:
  – Library/Campus I.T., Institutional Repository
  – Disciplinary Data Repository where like data are
    stored together (e.g., ICPSR for social science
    data, GenBank for genetic sequencing,
    DataONE for Earth observational data)
Data Ownership
• Sharing involves making reuse rights
  clear. If they are ambiguous, who’d want
  to use them?
• Ownership, possession and right to
  publish can be complicated issues
• Many datasets aren’t copyrightable
• Europe does things differently!
• Get the details hashed out early
• Work with your legal folks
Durable Data
• When possible, use common formats,
  non-proprietary systems, migratable
  standards
• The best are open, standardized,
  documented, in wide use and easy to work
  with (analyze, transform, etc.)
• What is best for your potential audience?
• File formats can change!
• You need to think about storage media,
  too
A WORD (OR TWO) ABOUT
DOCUMENTATION
Data Documentation
• WHAT is required for someone to identify,
  evaluate, understand and reuse the data?
  – Data content (Codebook, Data Dictionary)
  – Data collection methods, frequency,
    instrumentation
  – Data limitations
  – Dataset provenance
  – Methods used for derived data creation
Minimal Metadata Requirements
• About the project:
   – Title, people, key dates, funders and grants
• About the data:
   – Title, key dates, creator(s), subjects, rights,
     included files, format(s), versions, checksums
• Interpretive aids:
   – Codebooks, data dictionaries, algorithms,
     code
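
The three groups above could be captured in a small machine-readable record. The Python sketch below is one possible shape, not a standard: the field names, the build_record and file_entry helpers, and every value in the example are placeholders invented for illustration. An established schema would normally be preferable to a home-grown layout like this one.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_entry(path: Path) -> dict:
    """Describe one data file: name, format (from its suffix), size, checksum."""
    return {
        "name": path.name,
        "format": path.suffix.lstrip(".") or "unknown",
        "bytes": path.stat().st_size,
        # Reads the whole file at once; very large files should be hashed in chunks.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def build_record(project: dict, dataset: dict, files: list[Path], aids: list[str]) -> dict:
    """Assemble the project, data and interpretive-aid groups into one record."""
    return {
        "project": project,
        "data": {**dataset, "files": [file_entry(p) for p in sorted(files)]},
        "interpretive_aids": aids,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        data_file = Path(root) / "responses.csv"  # hypothetical data file
        data_file.write_text("id,answer\n1,yes\n")
        record = build_record(
            # All values below are placeholders, not a real project.
            project={"title": "Example project", "people": ["A. Researcher"],
                     "key_dates": {"start": "YYYY-MM-DD"},
                     "funders_and_grants": ["Example funder, grant number"]},
            dataset={"title": "Example dataset", "creators": ["A. Researcher"],
                     "subjects": ["example subject"],
                     "rights": "to be agreed", "version": "1"},
            files=[data_file],
            aids=["codebook.pdf", "data_dictionary.csv"],
        )
        print(json.dumps(record, indent=2))
```

Storing the printed JSON next to the data files keeps the description and the checksums with the data they describe.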
Metadata Schema
There are many metadata schemas already out there.
They'll save you time and effort!

•   Astronomy Visualization Metadata Standard
•   Content Standard for Digital Geospatial Metadata
•   Darwin Core
•   Data Documentation Initiative
•   Dublin Core
•   Ecological Metadata Language
•   Directory Interchange Format
AVOIDING DATAPOCALYPSE
Avoiding DATApocalypse
• Start Data Management Planning
  – Do it soon
  – Use Common Sense
  – Talk to and get buy-in from your stakeholders
  – Keep it simple
  – Keep it flexible and scalable
  – Lots of examples out there; you needn’t reinvent
    the wheel
  – Remember the “Virtual Team Model”
•   Definition of Research Data
•   Description of project (purpose of research, staff)
•   Description of data (type, format, methodology)
•   Applicable format, metadata, etc. standards
•   Short-term storage, backup, security plan
•   Legal and ethical issues (confidentiality,
    intellectual property, etc.)
•   Access policies and provisions (restrictions)
•   Long-term archiving and preservation
•   Retention period
•   Parties responsible for data management during
    the project, after the project ends, and who is
    responsible for disposing of the data if necessary
A Few Good Resources
•   ICPSR
•   CIESIN
•   ARL
•   DataONE
•   Digital Curation Centre
•   UK Data Archive
•   Australian National University / Data Service
•   MIT, Cornell, UCSD, etc.
NSF Dissemination and Access
“Investigators are expected to share with
other researchers, at no more than
incremental cost and within a reasonable
time, the primary data, samples, physical
collections and other supporting materials
created or gathered in the course of work
under NSF grants. Grantees are expected to
encourage and facilitate such sharing.”
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 

Guy avoiding-dat apocalypse

  • 1. Avoiding DATApocalypse! Laura Guy ENUG 2011
  • 2. Overview • The What and Why of Research Data • A Data Sharing Revolution • Important Questions • Data Management • A Word (or Two) About Documentation • Avoiding DATApocalypse
  • 3. THE WHAT AND WHY OF RESEARCH DATA
  • 4. Something’s happening here… • Are you managing research data? OR... • Should you be managing research data? (Because the NSF told you so)
  • 5. What it is ain’t exactly clear… • What’s this all about? • What’s the best way to do it? • Are you doing it properly?
  • 6. What are Research Data? “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” (OMB Circular A-110)
  • 7. One Possible Definition “Research data means data in the form of facts, observations, images, computer program results, recordings, measurements or experiences on which an argument, theory, test or hypothesis, or another research output is based. Data may be numerical, descriptive, visual or tactile. It may be raw, cleaned or processed, and may be held in any format or media.” (The Queensland University of Technology)
  • 8. What aren’t Research Data? • Preliminary data and analyses • Drafts of scientific papers • Plans for future research • Peer reviews • Communications with colleagues • Administrative data (treated independently) • Research publications (dealt with elsewhere)
  • 9. Why Manage Research Data? • Funding agency requirement (aka: NSF Data Management Plan) • Cost effective • Make things easier during the research project • Data are fragile! Can be changed, corrupted, altered • So it doesn’t go missing • To avoid charges of fraud, bad science • Share data with others • Get proper credit for creating them • Prevent chaos at the end of the project
  • 10. A DATA SHARING REVOLUTION
  • 11. The Times They Are a-Changin' • Research data have always been valuable • There has always been re-use (ICPSR, Census Bureau, etc.) • The 2010 NSF Notice “Dissemination and Sharing of Research Results” upped the ante • Other funders and sponsors are recognizing the importance of well-curated data and following suit
  • 12. A (Digital) Revolution • Advanced technologies make it easier, cheaper to share as do open data, open access, open source initiatives • Publications are still important, but credit for producing data is also good! • Cost effectiveness is the name of the game! (especially for the Feds, but private funders care, too) • As funding money gets scarcer, reusable data become more and more valuable • Besides, graduate students have always needed data for secondary analysis! • Good data management habits at the start of a project will assist EVERYONE later
  • 13. Data Sharing Rocks! • Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308 • “Sharing of Data Leads to Progress on Alzheimer’s” http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html • And then there's the Japan earthquake... (could prompt data sharing have helped?)
  • 14. Data Sharing Sucks • Recalcitrant Researchers • Where’s the money going to come from for staff, technology? • Need new policies, new procedures • Who’s responsible? • Sheer volume (est: 1.2 zettabytes in 2010) • How many of these data sets are actually going to be reused? (And should you care?)
  • 16. A Fistful of Questions • What research data are being collected? • How many active researchers are on your campus? How many research projects? • How much data are out there? How fast are they growing? • Who owns the data? • What types of data are being collected (simulations? surveys? experiments? derived/data-mined? Etc.)? • What file formats are being used?
  • 17. And a Few Questions More… • If those data were to be lost, how expensive would it be to recreate them (if even possible)? • What infrastructure is in place to: protect data during research projects, and secure/archive/preserve them after? • What infrastructure is in place to collect, organize, describe and provide access to research data?
  • 18. Who’s the Audience? • The original researcher! • His/her colleagues? • Other researchers in the field? • Cross-disciplinary use? • Policy makers? • Students? • The Press? • "Concerned Citizens"?
  • 19. What are the Responsibilities? • Funder? • Audience? • Respondents (Confidentiality, Sensitivity)? • Security? • Copyright? • Intellectual Property? • Embargo? • Forever Dark?
  • 20. What About Retention? • How long do data need to be retained? • Three years? • Five years? • One hundred years? • Forever? (And BTW, what is “forever”?) • By definition retention includes the secure destruction of data
  • 22. Data Management Planning • Do you have policies in place? • What about money? Staff? Tech? • What are the current best practices? • What tools/resources are available (there are loads of them! Maybe too many!) • Planning is important… • …but so is staying flexible and scalable • “On-the-fly” is probably not a good thing
  • 23. What’s a Data Management Plan? • Many sponsors (like the NSF) require Data Management Plans (DMP) • A good DMP enables data to retain their value during and after the research project • A DMP describes the data that will be created and how they will be managed and made accessible throughout their entire lifetime
  • 24. DMP During a Research Project • Who’s responsible for the data? The documentation? • How are they being stored? • What about versioning? Backups? • Protections? Encryption? Firewalls? • Who’s responsible for preparing data for sharing?
  • 25. LOCKSS! • Lots Of Copies Keeps Stuff Safe • Need multiple copies and offsite copies • Need to store the copies securely • If data contain confidential or sensitive information, security becomes even more critical • Basic truth: the best way to protect data is to limit access to it
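Slide 25's "lots of copies" only protects data if the copies stay identical over time. The following is a minimal Python sketch of one way to check that, assuming the copies are ordinary files reachable by path. The function names `sha256_of` and `group_copies_by_digest` are illustrative and do not come from the talk.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_copies_by_digest(paths):
    """Group copies by digest; a single group means every copy agrees."""
    groups = {}
    for path in paths:
        groups.setdefault(sha256_of(path), []).append(str(path))
    return groups
```

If `group_copies_by_digest` returns more than one group, at least one copy has been changed or corrupted. Deciding which copy is authoritative still requires a checksum recorded when the data were first deposited.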
  • 26. DMP After a Project Ends • Preparation of data, metadata • Long-term preservation and accessibility • Curators, I.T. Professionals, and Researchers all work together • Partners should be identified: – Library/Campus I.T., Institutional Repository – Disciplinary Data Repository where like data are stored together (e.g., ICPSR for social science data, GenBank for genetic sequencing, DataONE for Earth observational data)
  • 27. Data Ownership • Sharing involves making reuse rights clear. If they are ambiguous, who’d want to use them? • Ownership, possession and right to publish can be complicated issues • Many datasets aren’t copyrightable • Europe does things differently! • Get the details hashed out early • Work with your legal folks
  • 28. Durable Data • When possible, use common formats, non-proprietary systems, migratable standards • The best are open, standardized, documented, in wide use and easy to work with (analyze, transform, etc.) • What is best for your potential audience? • File formats can change! • You need to think about storage media, too
  • 29. A WORD (OR TWO) ABOUT DOCUMENTATION
  • 30. Data Documentation • WHAT is required for someone to identify, evaluate, understand and reuse the data? – Data content (Codebook, Data Dictionary) – Data collection methods, frequency, instrumentation – Data limitations – Dataset provenance – Methods used for derived data creation
  • 31. Minimal Metadata Requirements • About the project: – Title, people, key dates, funders and grants • About the data: – Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums • Interpretive aids: – Codebooks, data dictionaries, algorithms, code
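The minimal metadata on slide 31 can be captured in a small machine-readable record stored next to the data files. The sketch below is one possible shape, not a published schema: the class name `DatasetRecord`, its field names, and the JSON layout are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    # About the project (field names are hypothetical)
    project_title: str
    people: list
    funders: list
    # About the data
    data_title: str
    creators: list
    rights: str
    key_dates: dict = field(default_factory=dict)
    subjects: list = field(default_factory=list)
    version: str = "1"
    files: list = field(default_factory=list)

    def add_file(self, name, content, file_format):
        """Register a file with its format and the SHA-256 checksum of its bytes."""
        self.files.append({
            "name": name,
            "format": file_format,
            "sha256": hashlib.sha256(content).hexdigest(),
        })

    def to_json(self):
        """Serialize the record as indented JSON with stable key order."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

In practice, mapping these fields onto an established schema such as Dublin Core or the Data Documentation Initiative is likely a better long-term choice than a home-grown format.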
  • 32. Metadata Schema There are many metadata schemas already out there. They'll save you time and effort! • Astronomy Visualization Metadata Standard • Content Standard for Digital Geospatial Metadata • Darwin Core • Data Documentation Initiative • Dublin Core • Ecological Metadata Language • Directory Interchange Format
  • 34. Avoiding DATApocalypse • Start Data Management Planning – Do it soon – Use Common Sense – Talk to and get buy-in from your stakeholders – Keep it simple – Keep it flexible and scalable – Lots of examples out there; You needn’t re-invent the wheel – Remember the “Virtual Team Model”
  • 35. Definition of Research Data • Description of project (purpose of research, staff) • Description of data (type, format, methodology) • Applicable format, metadata, etc. standards • Short-term storage, backup, security plan • Legal and ethical issues (confidentiality, intellectual property, etc.) • Access policies and provisions (restrictions) • Long-term archiving and preservation • Retention period • Parties responsible for data management during the project, after the project ends, and who is responsible for disposing of the data if necessary
  • 36. A Few Good Resources • ICPSR • CIESIN • ARL • DataONE • Digital Curation Centre • UK Data Archive • Australian National University / Data Service • MIT, Cornell, UCSD, etc.
  • 37. NSF Dissemination and Access “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”