Research Data Curation
Data documentation, organization, storage and sharing
Aaron Collie
Digital Curation Librarian
collie@msu.edu
Data Management. Isn’t that… trivial?
• Not so much. Data is a primary output of research; it is very expensive to produce high-quality data. Data may be collected in nanoseconds, but it takes the expert application of research protocol and design to generate quality data.
CC-BY-SA-3.0 Rob Lavinsky
• To put that into perspective, consider data as the product of an industry. Data is the output of a process that generates higher orders of understanding.
Data → Information → Knowledge → Wisdom
Understanding is hierarchical! (Russell Ackoff)
Data Industries
• In the academic sector that industry is called scholarly communication.
• In the private sector that industry is called research & development.
Data → New Product
Data → Research Article
Industry is changing
Multiauthor Papers: Onward and Upward - ScienceWatch Newsletter. (n.d.). Retrieved October 4, 2013, from http://archive.sciencewatch.com/newsletter/2012/201207/multiauthor_papers/
The demise of the lone author : Article : History of the Journal Nature. (n.d.). Retrieved October 4, 2013, from http://www.nature.com/nature/history/full/nature06243.html
Science is always changing
• Thousand years ago:
science was empirical
describing natural phenomena
• Last few hundred years:
theoretical branch
using models, generalizations
• Last few decades:
a computational branch
simulating complex phenomena
• Today:
data exploration (eScience)
unify theory, experiment, and simulation
– Data captured by instruments
or generated by simulator
– Processed by software
– Information/Knowledge stored in computer
– Scientist analyzes database / files
using data management and statistics
Slide credit: Gray, J. & Szalay, A. (11 January 2007). eScience Talk at NRC-CSTB meeting. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
Research is now a team sport
(cc) SpoiltCat
This has been noticed.
NASA “promotes the full and open sharing of all data”
“…requires that data…be submitted to and archived by
designated national data centers.”
“…expects the timely release and sharing of final research data”
“IMLS encourages sharing of research data.”
“…should describe how the project team will manage and
disseminate data generated by the project”
“…must include a supplementary document of no more
than two pages labeled ‘Data Management Plan’.”
But why are we really here?
• Impetus: NSF has mandated that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan”
• Effect: The original NSF mandate has had a domino effect, and many funders now require or state guidelines for data management of grant funded research
• Response: Data management has not traditionally received a full treatment in (many) graduate and doctoral curricula; intervention is necessary
Positive reinforcement…
• National Science Foundation Data Management Plan mandate (January 18, 2011)
• Presidential Memorandum on Managing Government Records (August 24, 2012)
– Managing Government Records Directive: All permanent electronic records in Federal agencies will be managed electronically to the fullest extent possible for eventual transfer and accessioning by NARA in an electronic format.
Positive reinforcement… (cont.)
• White House policy memo (February 22, 2013)
– Increasing Access to the Results of Federally Funded Scientific Research: Federal agencies with more than $100M in R&D expenditures must develop plans to make the published results of federally funded research freely available to the public within one year of publication.
• OSTP policy memo (March 20, 2014)
– Improving the Management of and Access to Scientific Collections: directs each Federal agency that owns, maintains, or otherwise financially supports permanent scientific collections to develop a draft scientific-collections management and access policy within six months.
Curation responsibilities (Carlson, The Chronicle, 2006)
“Data from Big Science is … easier to handle, understand and archive.
Small Science is horribly heterogeneous and far more vast. In time Small
Science will generate 2-3 times more data than Big Science.”
big science data
small science data
institution?
domain?
MacColl, John (2010). The Role of libraries in data curation. RLG Partnership Annual Meeting, Chicago. June 2010
This is the engine of the academic industry…
So, things can get a little messy.
The scientific method “is often
misrepresented as a fixed
sequence of steps,” rather than
being seen for what it truly is,
“a highly variable and creative
process” (AAAS 2000:18).
Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)
The Research Depth Chart
Scientific Method
Research Design
Research Method
Research Tasks
More Specific / More Generic
Problem Identification
Study Concept
Literature Review
Environmental Scan
Funding & Proposal
Research Design
Research Methodology
Research Workflow
Hypothesis Formation
Design Validation
Research Activity
Data Management
Data Organization
Data Storage
Data Description
Data Sharing
Scholarly Communication
Report Findings
Publish
Peer Review
How does this apply to you?
• Data management is now an expected job skill.
– Especially in the research fields (“RDM”).
• Studies show that data management is not typically a significant part of undergraduate or graduate curricula.
• We have a causality dilemma!
What’s in it for you?
• Better organization for your classes
– Course Management: Angel / Desire2Learn
– Bibliographic Management: Zotero / EndNote / Mendeley
– File Management: Google Drive / Git / File-system
• Direct application to your career
– Data management is an “unnamed practice”
– Start now so you can list this skill on your résumé or CV
– Academia is changing: big data is here
Course Management
http://help.d2l.msu.edu/
Bibliographic Management
http://classes.lib.msu.edu/
File Management
http://tech.msu.edu/storage/
RDM Systems
File Storage
File System
File Format
File Content
• File Systems
– Hierarchical
• Database Systems
– Hierarchical, Relational, or Object-Oriented
• Asset Management Systems
– Combination of Database and File System
Data Management
Storage Architecture
o Storage Options
o Single points of failure
o Backup Strategy
File Management
o File Organization
o File Naming
o File Formats
Documentation Practices
o Project Documentation
o Process Documentation
o Data Documentation
Access Management
o Sharing Data
o Publishing Data
o Archiving Data
(cc) Alan Cleaver (cc) Will Scullin
Storage Architecture
o Storage Options
o Single points of failure
o Backup Strategy
Storage Options
Optical Storage
• CD-ROM
• DVD-ROM
• Blu-ray Discs
Solid-State Storage
• USB Flash Drives
• Memory Cards
• “Internal Device Storage”
Magnetic Storage
• Internal Hard Drives
• External Hard Drives
• Tape Drives
Networked Storage
• Server and Web Storage
• Managed Networked Storage
• “Cloud Storage”
• Tape Libraries
Good practices for avoiding single points of failure:
• Use managed networked storage whenever possible
• Move data off of portable media
• Never rely on one copy of data
• Do not rely on CD or DVD copies to be readable
• Be wary of software lifespans (e.g. Angel)
Limited “Task” Term
• Optical Media: CD, DVD, Blu-ray
• Portable Flash Media: USB Flash Drives, Memory Cards, Internal Memory
Short “Project” Term
• Magnetic Storage: Internal HD, External HD
• Networked Storage: Server/Web Space, Cloud Storage
Long “Life” Term
• Networked Storage: Managed Network
• Magnetic Storage: Tape Drives
Good practices for creating a backup strategy:
• Make 3 copies
– E.g. original + external/local + external/remote
– E.g. original + 2 formats on 2 drives in 2 locations
• Geographically distribute and secure
– Local vs. remote, depending on needed recovery time
• Know what resources are available to you: a personal computer, external hard drives, and departmental or university servers may all be used
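The "make 3 copies" practice above can be sketched in a few lines of Python. This is a minimal illustration, not a full backup tool: the function names `sha256_of` and `back_up` are hypothetical, and the backup directories stand in for whatever local and remote storage is actually available to you. Each copy is checked against the original's SHA-256 checksum so that an unreadable copy is noticed right away.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy `original` into each backup directory and verify every copy."""
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it
        copy = Path(shutil.copy2(original, directory))
        if sha256_of(copy) != expected:
            raise IOError(f"Backup at {copy} does not match the original")
        copies.append(copy)
    return copies
```

Calling `back_up(Path("fieldNotes.txt"), [Path("/mnt/external"), Path("/mnt/remote")])` with two hypothetical mount points would leave the original plus two verified copies, which matches the "original + external/local + external/remote" example.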
File Management
o File Organization
o File Naming
o File Formats
Create a file plan
✓ Better chance you will use a standard method when the time comes
✓ Simple organization is intuitive to team members and colleagues
✓ Reduces unsynchronized copies in personal drives and email attachments
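One way to make a file plan concrete is to create the folder skeleton before any data arrives. The sketch below assumes a simple layout; the folder names in `FILE_PLAN` and the function `create_file_plan` are hypothetical and should be adapted to your own project.

```python
from pathlib import Path

# Hypothetical folder names; adapt them to your own project.
FILE_PLAN = [
    "docs",
    "data/raw",
    "data/processed",
    "scripts",
    "results",
]


def create_file_plan(project_root: Path, folders=FILE_PLAN) -> list[Path]:
    """Create each planned folder under project_root and return the paths."""
    created = []
    for folder in folders:
        path = project_root / folder
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created
```

Because `exist_ok=True` makes the call repeatable, every team member can run it on a fresh machine and end up with the same structure.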
Utilize a file naming convention
✓ Create logical sequences for sorting through many files and versions
✓ Lead with a primary term so you can identify what you’re searching for by file name
✓ If not using a version control system, implement simple versioning
✓ It’s sort of like a tweet
✓ Should not exceed 255 characters for most modern operating systems
Example file names using simple version control:
lakeLansing_waltM_fieldNotes_20091012_v002.doc (primary term: location)
OrgChart2009_petersK_20090101_d001.svg (primary term: content)
20110117_sharpeW_krillMicrograph_backscatter3_v002.tif (primary term: date)
borgesJ_collocation_20080414.xml (primary term: person)
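A naming convention is easier to follow when a small helper builds the names. The sketch below reproduces the pattern of the first example above (descriptive parts, a YYYYMMDD date, a zero-padded version). The function `build_file_name` is hypothetical, and restricting name parts to letters and digits is an added assumption, not a rule from the slides.

```python
import re
from datetime import date

MAX_NAME_LENGTH = 255  # limit cited above for most modern operating systems


def build_file_name(parts, day, version, extension):
    """Join descriptive parts, a YYYYMMDD date, and a zero-padded version."""
    for part in parts:
        # Assumption: letters and digits only, so names sort and parse cleanly.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"Use only letters and digits in name parts: {part!r}")
    stem = "_".join([*parts, day.strftime("%Y%m%d"), f"v{version:03d}"])
    name = f"{stem}.{extension}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("File name exceeds 255 characters")
    return name


print(build_file_name(["lakeLansing", "waltM", "fieldNotes"], date(2009, 10, 12), 2, "doc"))
# prints: lakeLansing_waltM_fieldNotes_20091012_v002.doc
```

Putting the primary term first in `parts` keeps related files together when a folder is sorted by name.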
Make an informed decision in selecting file formats
✓ It is important to choose platform and vendor-independent file formats to ensure the best chance for future compatibility
✓ “Open” formats are often (but not always) supported broadly by a community rather than individually by a company or vendor
Format Genre Great Not Bad Avoid
TEXT .txt; .odt; .xml; .html .pdf; .rtf; .docx .doc
AUDIO .flac; .wav .ogg; .mp3 .wma; .ra; .ram; compression
VIDEO .mp2/.mp4, MKV .wmv; .mov; .avi; compression
IMAGE .tif; .png; .svg; .jpg .gif; .psd; compression
DATA .sql; .csv; .xml .xlsx .xls; proprietary DB formats
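The table above can double as a quick check on files you already have. The sketch below transcribes only the TEXT, AUDIO, and DATA rows, where the three columns (Great, Not Bad, Avoid) are easy to tell apart; the names `RATINGS` and `rate_format` are hypothetical.

```python
from pathlib import Path

# Ratings transcribed from the TEXT, AUDIO, and DATA rows of the table above.
RATINGS = {
    "great": {".txt", ".odt", ".xml", ".html", ".flac", ".wav", ".sql", ".csv"},
    "not bad": {".pdf", ".rtf", ".docx", ".ogg", ".mp3", ".xlsx"},
    "avoid": {".doc", ".wma", ".ra", ".ram", ".xls"},
}


def rate_format(file_name: str) -> str:
    """Return the table's rating for a file's extension, or 'unknown'."""
    extension = Path(file_name).suffix.lower()
    for rating, extensions in RATINGS.items():
        if extension in extensions:
            return rating
    return "unknown"


print(rate_format("survey_results.csv"))   # prints: great
print(rate_format("fieldNotes_v002.DOC"))  # prints: avoid
```

An "unknown" result is a prompt to look the format up, not a verdict on it.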
Documentation Practices
o Project Documentation
o Process Documentation
o Data Documentation
Good practices for documenting project information:
• Oftentimes a team effort
• At minimum, store documentation in a readme.txt file
• Include name of project, people, roles & contact information
• Include executive summary or abstract for basic context
• Include an inventory of servers, directories, data, lab equipment, and other resources
• A great start for project documentation is a project charter
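The readme.txt items listed above can be captured with a small template so no field is forgotten. This is a sketch under assumptions: the function `write_readme` and the section labels it writes are hypothetical, not a standard readme format.

```python
from pathlib import Path


def write_readme(folder, project, summary, people, inventory):
    """Write a plain-text readme.txt with basic project documentation.

    people: iterable of (name, role, contact) tuples
    inventory: iterable of strings (servers, directories, data, equipment)
    """
    lines = [f"Project: {project}", "", "Summary:", summary, "", "People:"]
    lines += [f"- {name}, {role}, {contact}" for name, role, contact in people]
    lines += ["", "Inventory:"]
    lines += [f"- {item}" for item in inventory]
    readme = Path(folder) / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Keeping the file as plain text follows the file format advice given earlier in the deck: it stays readable without any particular software.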
Good practices for documenting processes:
• Sometimes an individual effort, sometimes collaborative
• Protocols, software or code settings, code commentary
• Workflow descriptions (text) or diagrams (image)
• Include example scripts, inputs, outputs if applicable
• A great start for process documentation is a lab notebook
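Example of R code commentary
# Cumulative normal distribution function (not the density), evaluated at -1.96, 0, 1.96
pnorm(c(-1.96, 0, 1.96))
# returns approximately 0.025, 0.500, 0.975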
Good practices for documenting data:
• Use standard methods of documentation where they exist
– Metrics/Measurements
– Code Book
– Metadata Standard
~1.57×10^7 K = Temperature of the sun (center)
measure/metric: ~1.57×10^7; unit: K; metadata: Temperature of the sun (center)
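The split between measure, unit, and metadata shown above is what a code book records for every variable. A minimal sketch follows, assuming a three-column layout; the column names, the variable name `core_temp`, and the function `code_book_csv` are hypothetical. Where a metadata standard exists for your discipline, prefer it over this ad hoc layout.

```python
import csv
import io

CODE_BOOK_FIELDS = ["variable", "description", "unit"]


def code_book_csv(entries):
    """Serialize code book entries (dicts) to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CODE_BOOK_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


entries = [
    # Hypothetical variable name for the measurement shown above.
    {"variable": "core_temp", "description": "Temperature of the sun (center)", "unit": "K"},
]
print(code_book_csv(entries))
```

Saving the result next to the data file means the units travel with the numbers they describe.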
Access Management
o Sharing Data
o Publishing Data
o Archiving Data
Good practices for sharing or distributing data:
✓ Basics
• Synchronization, Versioning, Access Restrictions (and logs)
• Collaborative tools can save time and effort (and help with scale)
✓ Intellectual property
• Data itself is not protected by copyright law in the U.S.
• Expressions of data (forms, reports, visuals) can be copyrightable
• Data can be licensed similarly to software
✓ Ethics
• Human subjects (e.g. IRB restrictions)
• Private/sensitive information
Good practices for publishing data:
• Not Publishing
• Self Publishing (Web Site)
– Create and add data citations to personal websites
• Journal (Supplementary Material)
– Publish data with a journal that will provide a persistent link to your dataset (e.g. DOI, handle)
• Archive/Repository
– Institutional (see above example)
– Disciplinary (e.g. article & data)
Good practices for archiving research data:
• LOCKSS! (Lots of Copies Keep Stuff Safe)
• Archive documentation with data
• Write costs for data management and archiving into your research budgets (and in some cases, proposals)
• Define access policies including restrictions or embargoes
• Understand requirements for submission of data prior to project completion
Questions?
✓ Store – Three Copies on Three Disks in Three Locations
✓ Organize – If you make a plan, you just might follow it.
✓ Document – What would my colleagues need to know to understand this data?
✓ Share – Data makes an impact
• Slides are HERE: http://tiny.cc/yvdpqw
Aaron Collie
Digital Curation Librarian
collie@msu.edu

Contenu connexe

Tendances

IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data ManagementAmanda Whitmire
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
RDM LIASA webinar
RDM LIASA webinarRDM LIASA webinar
RDM LIASA webinarSarah Jones
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
Data management (1)
Data management (1)Data management (1)
Data management (1)SM Lalon
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersJez Cope
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycleMarieke Guy
 
Research data management & planning: an introduction
Research data management & planning: an introductionResearch data management & planning: an introduction
Research data management & planning: an introductionMaggie Neilson
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practicesLeon Osinski
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementDaniel JACOB
 

Tendances (20)

IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 
Preparing Your Research Material for the Future - 2016-11-16 - Humanities Div...
Preparing Your Research Material for the Future - 2016-11-16 - Humanities Div...Preparing Your Research Material for the Future - 2016-11-16 - Humanities Div...
Preparing Your Research Material for the Future - 2016-11-16 - Humanities Div...
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
RDM LIASA webinar
RDM LIASA webinarRDM LIASA webinar
RDM LIASA webinar
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
Data management (1)
Data management (1)Data management (1)
Data management (1)
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy Issues
 
Preparing Your Research Material for the Future - 2017-02-22 - Humanities Div...
Preparing Your Research Material for the Future - 2017-02-22 - Humanities Div...Preparing Your Research Material for the Future - 2017-02-22 - Humanities Div...
Preparing Your Research Material for the Future - 2017-02-22 - Humanities Div...
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
 
What is-rdm
What is-rdmWhat is-rdm
What is-rdm
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
 
Research data management & planning: an introduction
Research data management & planning: an introductionResearch data management & planning: an introduction
Research data management & planning: an introduction
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practices
 
Writing a Research Data Management Plan - 2016-11-09 - University of Oxford
Writing a Research Data Management Plan - 2016-11-09 - University of OxfordWriting a Research Data Management Plan - 2016-11-09 - University of Oxford
Writing a Research Data Management Plan - 2016-11-09 - University of Oxford
 
Data Management Planning for Researchers - 2016-02-08 - University of Oxford
Data Management Planning for Researchers - 2016-02-08 - University of OxfordData Management Planning for Researchers - 2016-02-08 - University of Oxford
Data Management Planning for Researchers - 2016-02-08 - University of Oxford
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Working with Global Infrastructure at a National Level
Working with Global Infrastructure at a National LevelWorking with Global Infrastructure at a National Level
Working with Global Infrastructure at a National Level
 

Similaire à Research Data Curation _ Grad Humanities Class

Data management for TA's
Data management for TA'sData management for TA's
Data management for TA'saaroncollie
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)aaroncollie
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesRebekah Cummings
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementcunera
 
Adding valuethroughdatacuration
Adding valuethroughdatacurationAdding valuethroughdatacuration
Adding valuethroughdatacurationAPLICwebmaster
 
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...Aaron Collie
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementJamie Bisset
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing dataSarah Jones
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
 
Curation of Research Data
Curation of Research DataCuration of Research Data
Curation of Research DataMichael Day
 

Similaire à Research Data Curation _ Grad Humanities Class (20)

Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
 
Introduction to Data Management and Sharing
Introduction to Data Management and SharingIntroduction to Data Management and Sharing
Introduction to Data Management and Sharing
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and Humanities
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
Adding valuethroughdatacuration
Adding valuethroughdatacurationAdding valuethroughdatacuration
Adding valuethroughdatacuration
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...
New Grantsmanship: Digital Sustainability, Open Access, and Consortia Arrange...
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Johnston - How to Curate Research Data
Johnston - How to Curate Research DataJohnston - How to Curate Research Data
Johnston - How to Curate Research Data
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Data management plans
Data management plansData management plans
Data management plans
 
Data management
Data management Data management
Data management
 
METRO RDM Webinar
METRO RDM WebinarMETRO RDM Webinar
METRO RDM Webinar
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing data
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...
 
Introduction to RDM for trainee physicians
Introduction to RDM for trainee physiciansIntroduction to RDM for trainee physicians
Introduction to RDM for trainee physicians
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
Curation of Research Data
Curation of Research DataCuration of Research Data
Curation of Research Data
 

Dernier

Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Research Data Curation _ Grad Humanities Class

  • 1. Research Data Curation Data documentation, organization, storage and sharing Aaron Collie Digital Curation Librarian collie@msu.edu
  • 2. Data Management. Isn’t that… trivial? Not so much. Data is a primary output of research; it is very expensive to produce high quality data. Data may be collected in nanoseconds, but it takes the expert application of research protocol and design to generate quality data. (Image: CC-BY-SA-3.0 Rob Lavinsky)
  • 3. To put that into perspective, consider data as the product of an industry. Data is the output of a process that generates higher orders of understanding. Understanding is hierarchical! (Russell Ackoff) From top to bottom: Wisdom, Knowledge, Information, Data.
  • 4. Data Industries. In the academic sector that industry is called scholarly communication (Data → Research Article). In the private sector that industry is called research & development (Data → New Product).
  • 5. Industry is changing. Sources: Multiauthor Papers: Onward and Upward - ScienceWatch Newsletter. (n.d.). Retrieved October 4, 2013, from http://archive.sciencewatch.com/newsletter/2012/201207/multiauthor_papers/; The demise of the lone author : Article : History of the Journal Nature. (n.d.). Retrieved October 4, 2013, from http://www.nature.com/nature/history/full/nature06243.html
  • 6. Science is always changing. Thousand years ago: science was empirical, describing natural phenomena. Last few hundred years: theoretical branch, using models and generalizations. Last few decades: a computational branch, simulating complex phenomena. Today: data exploration (eScience), unifying theory, experiment, and simulation: data captured by instruments or generated by simulator; processed by software; information/knowledge stored in computer; scientist analyzes database/files using data management and statistics. Slide credit: Gray, J. & Szalay, A. (11 January 2007). eScience Talk at NRC-CSTB meeting. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  • 7. Research is now a team sport (cc) SpoiltCat
  • 8. This has been noticed. NASA “promotes the full and open sharing of all data”; “…requires that data…be submitted to and archived by designated national data centers”; “…expects the timely release and sharing of final research data”; “IMLS encourages sharing of research data”; “…should describe how the project team will manage and disseminate data generated by the project”; “…must include a supplementary document of no more than two pages labeled ‘Data Management Plan’.”
  • 9. But why are we really here? Impetus: NSF has mandated that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan”. Effect: The original NSF mandate has had a domino effect, and many funders now require or state guidelines for data management of grant funded research. Response: Data management has not traditionally received a full treatment in (many) graduate and doctoral curricula; intervention is necessary.
  • 10. Positive reinforcement… National Science Foundation Data Management Plan mandate (January 18, 2011). Presidential Memorandum on Managing Government Records (August 24, 2012): Managing Government Records Directive: All permanent electronic records in Federal agencies will be managed electronically to the fullest extent possible for eventual transfer and accessioning by NARA in an electronic format.
  • 11. Positive reinforcement… (cont.) White House policy memo (February 22, 2013): Increasing Access to the Results of Federally Funded Scientific Research: Federal agencies with more than $100M in R&D expenditures must develop plans to make the published results of federally funded research freely available to the public within one year of publication. OSTP policy memo (March 20, 2014): Improving the Management of and Access to Scientific Collections: directs each Federal agency that owns, maintains, or otherwise financially supports permanent scientific collections to develop a draft scientific-collections management and access policy within six months.
  • 12. Curation responsibilities (Carlson, The Chronicle, 2006) “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.” big science data small science data institution? domain? MacColl, John (2010). The Role of libraries in data curation. RLG Partnership Annual Meeting, Chicago. June 2010
  • 13. This is the engine of the academic industry…
  • 14.
  • 15. So, things can get a little messy.
  • 16. The scientific method “is often misrepresented as a fixed sequence of steps,” rather than being seen for what it truly is, “a highly variable and creative process” (AAAS 2000:18). Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)
  • 17.
  • 18. The Research Depth Chart, from more generic to more specific: Scientific Method, Research Design, Research Method, Research Tasks.
  • 21. How does this apply to you? Data management is now an expected job skill, especially in the research fields (“RDM”). Studies show that data management is not typically a significant part of undergraduate or graduate curricula. We have a causality dilemma!
  • 22. What’s in it for you? Better organization for your classes: Course Management (Angel / Desire2Learn); Bibliographic Management (Zotero / EndNote / Mendeley); File Management (Google Drive / Git / file system). Direct application to your career: data management is an “unnamed practice”; start now so you can list this skill on your resume or CV; academia is changing: big data is here.
  • 26. RDM Systems (layers: File Storage, File System, File Format, File Content). File Systems: hierarchical. Database Systems: hierarchical, relational, or object oriented. Asset Management Systems: combination of database and file system.
  • 27. Data Management overview. Storage Architecture: Storage Options, Single Points of Failure, Backup Strategy. File Management: File Organization, File Naming, File Formats. Documentation Practices: Project Documentation, Process Documentation, Data Documentation. Access Management: Sharing Data, Publishing Data, Archiving Data. (Images: (cc) Alan Cleaver, (cc) Will Scullin)
  • 28. Storage Architecture: Storage Options, Single Points of Failure, Backup Strategy. (Layers: File Storage, File System, File Format, File Content)
  • 29. Storage Architecture: Storage Options. Optical Storage: CD-ROM, DVD-ROM, Blu-ray Discs. Solid-State Storage: USB Flash Drives, Memory Cards, “Internal Device Storage”. Magnetic Storage: Internal Hard Drives, External Hard Drives, Tape Drives. Networked Storage: Server and Web Storage, Managed Networked Storage, “Cloud Storage”, Tape Libraries.
  • 30. Storage Architecture: Single Points of Failure. Good practices for avoiding single points of failure: use managed networked storage whenever possible; move data off of portable media; never rely on one copy of data; do not rely on CD or DVD copies to be readable; be wary of software lifespans (e.g. Angel). Storage by term: Limited “Task” Term: Optical Media (CD, DVD, Blu-ray), Portable Flash Media (USB Flash Drives, Memory Cards, Internal Memory). Short “Project” Term: Magnetic Storage (Internal HD, External HD), Networked Storage (Server/Web Space, Cloud Storage). Long “Life” Term: Networked Storage (Managed Network), Magnetic Storage (Tape Drives).
  • 31. Storage Architecture: Backup Strategy. Good practices for creating a backup strategy: make 3 copies (e.g. original + external/local + external/remote, or original + 2 formats on 2 drives in 2 locations); geographically distribute and secure (local vs. remote, depending on needed recovery time); know what resources are available to you: personal computer, external hard drives, departmental, or university servers may be used.
  • 32. Data Management overview (same diagram as slide 27).
  • 33. File Management: File Organization, File Naming, File Formats. (Layers: File Storage, File System, File Format, File Content)
  • 34. File Management: File Organization. Create a file plan: better chance you will use a standard method when the time comes; simple organization is intuitive to team members and colleagues; reduces unsynchronized copies in personal drives and email attachments.
  • 35. File Management: File Naming. Utilize a file naming convention: create logical sequences for sorting through many files and versions; identify what you’re searching for by filename by using a primary term; if not using a version control system, implement simple versioning; it’s sort of like a tweet; should not exceed 255 characters for most modern operating systems. Example file names using simple version control, with the primary term in parentheses: lakeLansing_waltM_fieldNotes_20091012_v002.doc (location); OrgChart2009_petersK_20090101_d001.svg (content); 20110117_sharpeW_krillMicrograph_backscatter3_v002.tif (date); borgesJ_collocation_20080414.xml (person).
  • 36. File Management: File Formats. Make an informed decision in selecting file formats: it is important to choose platform and vendor-independent file formats to ensure the best chance for future compatibility; “open” formats are often (but not always) supported broadly by a community rather than individually by a company or vendor. Format table (columns: Genre, Great, Not Bad, Avoid; flattened in this transcript): TEXT .txt; .odt; .xml; .html .pdf; .rtf; .docx .doc | AUDIO .flac; .wav .ogg; .mp3 .wma; .ra; .ram; compression | VIDEO .mp2/.mp4, MKV .wmv; .mov; .avi; compression | IMAGE .tif; .png; .svg; .jpg .gif; .psd; compression | DATA .sql; .csv; .xml .xlsx .xls; proprietary DB formats
  • 37. Data Management overview (same diagram as slide 27).
  • 38. Documentation Practices: Project Documentation, Process Documentation, Data Documentation. (Layers: File Storage, File System, File Format, File Content)
  • 39. Documentation Practices: Project Documentation. Good practice for documenting project information: oftentimes a team effort; at minimum, store documentation in a readme.txt file; include name of project, people, roles & contact information; include executive summary or abstract for basic context; include an inventory of servers, directories, data, lab equipment, and other resources; a great start for project documentation is a project charter.
  • 40. Documentation Practices: Process Documentation. Good practices for documenting processes: sometimes an individual effort, sometimes collaborative; protocols, software or code settings, code commentary; workflow descriptions (text) or diagrams (image); include example scripts, inputs, outputs if applicable; a great start for process documentation is a lab notebook. Example of R code commentary: the comment line “# Cumulative normal density” followed by the call pnorm(c(-1.96,0,1.96)).
  • 41. Documentation Practices: Data Documentation. Good practices for documenting data: use standard methods of documentation where they exist: metrics/measurements, code book, metadata standard. Example: ~1.57×10^7 K = Temperature of the sun (center), annotated on the slide with its unit, measure/metric, and metadata.
  • 42. Data Management overview (same diagram as slide 27).
  • 43. Access Management: Sharing Data, Publishing Data, Archiving Data. (Layers: File Storage, File System, File Format, File Content)
  • 44. Access Management: Sharing Data. Good practices for sharing or distributing data. Basics: synchronization, versioning, access restrictions (and logs); collaborative tools can save time and effort (and help with scale). Intellectual property: data itself not protected by copyright law in U.S.; expressions of data (forms, reports, visuals) can be copyrightable; data can be licensed similarly to software. Ethics: human subjects (e.g. IRB restrictions); private/sensitive information.
  • 45. Access Management: Publishing Data. Good practices for publishing data. Not Publishing. Self Publishing (Web Site): create and add data citations to personal websites. Journal (Supplementary Material): publish data with a journal that will provide a persistent link to your dataset (e.g. DOI, handle). Archive/Repository: institutional (see above example); disciplinary (e.g. article & data).
  • 46. Access Management: Archiving Data. Good practices for archiving research data: LOCKSS! Archive documentation with data; write costs for data management and archiving into your research budgets (and in some cases, proposals); define access policies including restrictions or embargoes; understand requirements for submission of data prior to project completion.
  • 47. Data Management overview (same diagram as slide 27).
  • 48. Questions? Store: Three Copies on Three Disks in Three Locations. Organize: If you make a plan, you just might follow it. Document: What would my colleagues need to know to understand this data? Share: Data makes an impact. Slides are HERE: http://tiny.cc/yvdpqw Aaron Collie, Digital Curation Librarian, collie@msu.edu
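
The "Store" advice (slide 31 and the summary on slide 48) can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names and the stand-in folders are hypothetical, and in practice the destinations would be something like an external drive and a departmental or university server. Each copy is checked against a SHA-256 checksum of the original so a silently corrupted copy is noticed right away.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `original` into each destination folder and verify every copy."""
    expected = sha256_of(original)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies


if __name__ == "__main__":
    # Hypothetical stand-ins for "external/local" and "external/remote" storage.
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        original = base / "working" / "fieldNotes_v001.txt"
        original.parent.mkdir(parents=True)
        original.write_text("sample observation\n")
        copies = back_up(original, [base / "external_drive", base / "remote_server"])
        print(f"original + {len(copies)} verified copies")
```

Copying to folders on the same disk, as the demo does, does not remove the single point of failure; the point of the sketch is the copy-then-verify routine, which only pays off when the destinations are on separate devices in separate locations.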
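
For the "Document" advice (slide 39), a minimal readme.txt could be generated along the following lines. This is a sketch under stated assumptions: the `build_readme` helper, the section headings, and all project details in the example are hypothetical placeholders; the slides only say which kinds of information to include (project name, people, roles and contact information, a summary, and an inventory of resources).

```python
from typing import Iterable, Tuple


def build_readme(
    project: str,
    summary: str,
    people: Iterable[Tuple[str, str, str]],
    resources: Iterable[str],
) -> str:
    """Assemble plain-text project documentation for a readme.txt file."""
    lines = [f"Project: {project}", "", "Summary", summary, "", "People"]
    for name, role, contact in people:
        lines.append(f"- {name} ({role}): {contact}")
    lines += ["", "Resources"]
    for resource in resources:
        lines.append(f"- {resource}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Every value below is a made-up placeholder.
    text = build_readme(
        project="Example Lake Survey",
        summary="Field notes and micrographs collected for a sample study.",
        people=[("A. Researcher", "Principal investigator", "a.researcher@example.edu")],
        resources=["Shared drive folder: /projects/example_lake_survey", "Lab notebook no. 1"],
    )
    print(text)
```

Saving the returned text next to the data (for example with `Path("readme.txt").write_text(text)`) keeps the context with the files if they are later shared or archived.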

Editor's notes

  1. National Oceanic and Atmospheric Administration (NOAA) IMLS encourages sharing of research data. Applications that develop digital products must fill out an additional form with ten questions focused on “Developing Data Management Plans for Research Projects. The federal government has the right to obtain, reproduce, publish or otherwise use the data first produced under an award and authorize others to do so for government purposes.” Ex: Digging Into Data
  2. HANDOUT: DMP (blue)
  3. Research is a scientific process, and we use an overarching model to describe that process at a high level. But this is a conceptual model, not a process model. It is also a pretty sterile model, and we know that because it is not prescriptive for all academic disciplines.
  4. In practice, research is a complicated process. It is a creative process as well as a scientific process.
  5. Research is hard, managing research is boring. So we want tips that make it easier.
  6. This has been noticed.
  7. You might think of the scientific method as a bit of an iceberg model. At the tip of the iceberg are these general activities, but research isn’t really conducted at this high of a level.
  8. Research is a thing that happens at many levels simultaneously. The more experience you gain with research, the more of the depth chart you develop expertise within.
  9. Data management is a subprocess of research. It is part of a holistic research method that includes a ton of other functions like funding, literature reviews, workflows and publication.
  10. Today we are going to focus on just one of these areas: data management.
  11. Layers: Interpretation; Content; Carrier/computer file; Network/file system; Hard drive. (Image credit: walknboston)
  12. A single point of failure occurs when it would only take one event to destroy all data on a device (e.g. dropped hard drive)
  13. Simple: file plan. Advanced: directory manifest; Git, Subversion; content management systems (CMS). Expert: data management systems (DMS).
  14. Choose a meaningful directory hierarchy, for example: Primary subject / Secondary subject / Tertiary subject; Investigator / Process / Date; Instrument / Date / Sample.
  15. Good practices for file naming: meaningful and descriptive, but short (255 character limit); capital letters or underscores differentiate between words; surname first, followed by initials of first name; decide on a simple “versioning” method (e.g. file_v001); use alphanumeric characters (e.g. abc123). More on handout. Example pattern: NameOfStudy_Location_Date_FG#_transcribedby_NameOfTranscriber_v###.DOCX
  16. Good choices for file formats: non-proprietary; open, documented standard; common usage by research community; standard representation (ASCII, Unicode); unencrypted; uncompressed.
  17. Simple: README.txt. Advanced: wikis; workflow diagrams. Expert: project management; metadata standards; ontologies.
  18. Shouldn’t I have already documented basic project information in an abstract or introduction in a paper or thesis? Yes, but this information is meant to be contextual information that can be used to better understand the data. It would accompany the data if shared. It is sometimes called a project charter. Wikis, Git, or other version control systems can really turn this simple charter into an authoritative record of the research.
  19. Why do I need to document the way I process and analyze data? Researchers will need detailed information to reuse or verify your data. Again, methodology sections are not comprehensive.
  20. Simple: email; website; collaboration tools. Advanced: networked storage. Expert: data repository.
  21. Scoop, not IRB approved, etc
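
The file naming practices in note 15 (and slide 35) can be sketched as a small helper. This is an illustration only: `build_file_name` and `next_version` are hypothetical names, and the field order (primary term, person, content, date, version) simply follows the first example file name on slide 35. Names are restricted to alphanumeric characters joined by underscores and checked against the 255-character limit mentioned in the slides.

```python
import re
from datetime import date

MAX_NAME_LENGTH = 255  # limit quoted on slide 35 for most modern operating systems


def _clean(part: str) -> str:
    """Keep only alphanumeric characters so underscores stay as separators."""
    return re.sub(r"[^A-Za-z0-9]", "", part)


def build_file_name(primary: str, person: str, content: str,
                    day: date, version: int, extension: str) -> str:
    """Build a name like primary_person_content_YYYYMMDD_v001.ext."""
    parts = [_clean(primary), _clean(person), _clean(content),
             day.strftime("%Y%m%d"), f"v{version:03d}"]
    name = "_".join(p for p in parts if p) + "." + extension.lstrip(".")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"File name exceeds {MAX_NAME_LENGTH} characters")
    return name


def next_version(name: str) -> str:
    """Increment the trailing _vNNN counter that precedes the extension."""
    match = re.search(r"_v(\d+)(?=\.[^.]+$)", name)
    if match is None:
        raise ValueError(f"No version suffix found in {name!r}")
    width = len(match.group(1))
    bumped = f"_v{int(match.group(1)) + 1:0{width}d}"
    return name[:match.start()] + bumped + name[match.end():]


if __name__ == "__main__":
    name = build_file_name("lakeLansing", "waltM", "fieldNotes",
                           date(2009, 10, 12), 2, "doc")
    print(name)                # lakeLansing_waltM_fieldNotes_20091012_v002.doc
    print(next_version(name))  # lakeLansing_waltM_fieldNotes_20091012_v003.doc
```

Zero-padding the version number and writing the date as YYYYMMDD keeps files in order when sorted alphabetically, which is the "logical sequences for sorting" point from slide 35.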