Data Matters 
Tips & Tools for Better Research 
Carly Strasser, California Digital Library 
carlystrasser@gmail.com 
AGU Student & Early Career Scientist Conference 
14 Dec 2014 
From Flickr by Lachlan Donald
Why are 
you here? 
Science: you’re (probably) 
doing it wrong
From Wikimedia Commons 
Back in the day… 
From ahswhg.wikispaces.com
Back in the day… 
Da Vinci 
Curie 
Newton 
classicalschool.blogspot.com 
Darwin
Research has 
changed 
Better
From wikimedia 
Such 
Internet! 
So many 
tools! 
From Flickr by John Jobby 
So much 
data!
Research has 
changed 
Worse
Digital data 
From Flickr by Flickmor 
From Flickr by DW0825 
From Flickr by US Army Environmental Command 
C. Strasser 
Courtesy of WHOI
From Flickr by deltaMike
Digital data 
+ 
Complex workflows
i.telegraph.co.uk
Scientists are bad at 
data management.
An embarrassing 
example… 
From Flickr by lincolnblues
?
From Flickr by ransomtech 
Didn’t share the data 
Didn’t document the data (metadata) 
Didn’t document provenance/workflow
Why should I care? 
From Flickr by johntrainor
Because reproducibility is one of 
the fundamental tenets of science. 
Because we need to be credible. 
Because Fox News, creationism, 
and the war on science.
“Help us identify grants that are wasteful 
or that you don’t think are a good use of 
taxpayer dollars.” 
Rep. Adrian Smith (R-Nebraska), a member of the House Committee on Science 
and Technology
Because it means faster progress.
Because you are a good person.
From Flickr by Redden-McAllister 
From Flickr by Ken Cowell 
From Flickr Brandi Jordan
Map of Scientific Collaborations 
flowingdata.com
Because you have to.
Journals 
Institutions 
Funders 
From Flickr by Eva Rinaldi Celebrity and Live Music Photographer
Feb 2013
… “Federal agencies investing in research and 
development (more than $100 million in annual 
expenditures) must have clear and coordinated 
policies for increasing public access to 
research products.”
From Flickr by Michael Tinkler
From Flickr by Big Swede Guy 
data management 
Best 
Practices
From Flickr by Mark Sardella 
Plan before data collection
Design sample naming scheme
Planning
• Create a key (data dictionary) 
• Make sure names are unique 
• Define codes 
From Flickr by zebbie
Design file naming scheme
Planning
Use descriptive file names 
• Unique 
• Reflect contents 
From R Cook, ESA Best Practices Workshop 2010
Bad: 
Mydata.xls 
2001_data.csv 
best version.txt 
Better: 
Eaffinis_nanaimo_2010_counts.xls 
(study organism _ site name _ year _ what was measured)
*Not for everyone
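The naming scheme above can be sketched in Python. This is only an illustration: the field order (organism, site, year, measurement) is read off the "Better" example, and `build_name`, `parse_name` and `NAME_PATTERN` are hypothetical names, not part of any tool mentioned in these slides.

```python
import re

# Hypothetical pattern: organism_site_year_measurement.ext, as in the
# "Better" example above. Adjust the fields to your own scheme.
NAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_(?P<year>\d{4})"
    r"_(?P<measured>[A-Za-z]+)\.(?P<ext>[a-z]+)$"
)


def build_name(organism, site, year, measured, ext="csv"):
    """Assemble a descriptive file name from its parts."""
    return f"{organism}_{site}_{year}_{measured}.{ext}"


def parse_name(filename):
    """Return the parts as a dict, or None if the name breaks the scheme."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


print(build_name("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
print(parse_name("Eaffinis_nanaimo_2010_counts.xls"))
print(parse_name("Mydata.xls"))  # does not follow the scheme
```

A check like `parse_name` can be run over a whole folder to flag files that drifted from the scheme.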
Design file organization
Planning
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
Consider… 
• Dependencies? 
• File formats? 
• Time of collection? 
• Order of analysis? 
From S. Hampton
Planning 
Design your spreadsheet 
Constrain entries 
Atomize 
Break down spreadsheets 
From Flickr by Ulleskelf
Consider a database
Planning
A relational database is 
A set of tables 
Relationships among the tables 
A language to specify & query the tables 
An RDB provides
Scalability: millions+ records 
Features for sub-setting, querying, sorting 
Reduced redundancy & entry errors 
From Mark Schildhauer
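A minimal sketch of the three ingredients listed above (tables, relationships, a query language), using Python's built-in `sqlite3` module. The table names, column names and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only

# Two tables; sample.site_id points at site.site_id (the relationship).
conn.executescript("""
    CREATE TABLE site (
        site_id   TEXT PRIMARY KEY,
        site_name TEXT NOT NULL
    );
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        site_id   TEXT NOT NULL REFERENCES site(site_id),
        year      INTEGER NOT NULL,
        n_counted INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [("NAN", "Nanaimo"), ("LAK", "Lake")],
)
conn.executemany(
    "INSERT INTO sample (site_id, year, n_counted) VALUES (?, ?, ?)",
    [("NAN", 2010, 42), ("NAN", 2011, 37), ("LAK", 2010, 12)],
)

# SQL is the language used to query across the tables.
rows = conn.execute("""
    SELECT site.site_name, SUM(sample.n_counted)
    FROM sample JOIN site ON sample.site_id = site.site_id
    GROUP BY site.site_name
    ORDER BY site.site_name
""").fetchall()
print(rows)
```

The site name is stored once and referenced by ID, which is where the reduced redundancy and fewer entry errors come from.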
Pick a data repository 
Store your data in a repository 
Institutional archive 
Discipline/specialty archive 
From Flickr by torkildr 
Planning 
Ask a librarian 
Repos of repos: 
databib.org 
re3data.org
Decide on preservation/backup 
From Flickr by sepa synod 
From Flickr by taberandrew 
From Flickr by withassociates 
What software? 
What hardware? 
What personnel? 
How often? 
Set up reminders! 
Test system 
Planning
…document that 
describes what you will 
do with your data 
throughout 
the research project 
From Flickr by Barbies Land 
Write a data 
management plan! 
Planning
Planning 
DMP components 
• What will be collected 
• Methods 
• Standards 
• Metadata 
• Sharing/access 
• Long-term storage 
But they all have different requirements and express them in different ways 
From Flickr by Barbies Land
dmptool.org 
Step-by-step wizard for generating DMP 
create | edit | re-use | share 
Free & open to community 
Planning
During Data Collection & Entry 
From Flickr by Julia Manzerova
Realistically: 
• Archive .csv version of raw data 
• Make a “raw” tab in working data file 
• Do all work on other tabs 
Keep raw data raw
During collection
Keep raw data raw 
Raw data as .csv 
During 
collection 
R script for processing & analysis 
Ideally: 
• Use scripts to process data 
• Save them with data
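The slide suggests an R script; the same idea is sketched here in Python. The script only reads the raw file and writes its output to a new file. The file names, the `temp_c` column and the values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path, out_path):
    """Read the raw file, drop rows with a missing value, write a new file."""
    with open(raw_path, newline="") as src:
        rows = list(csv.DictReader(src))
    kept = [row for row in rows if row["temp_c"] != ""]
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["site", "temp_c"])
        writer.writeheader()
        writer.writerows(kept)
    return len(rows), len(kept)


# Made-up raw data in a temporary folder so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_temperature.csv"
raw.write_text("site,temp_c\nA,12.1\nB,\nC,11.8\n")
raw_before = raw.read_text()

n_raw, n_kept = process(raw, workdir / "clean_temperature.csv")
raw_unchanged = raw.read_text() == raw_before  # the raw file is never edited
print(n_raw, n_kept, raw_unchanged)
```

Saving a script like this next to the data records exactly how the cleaned file was derived from the raw one.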
Document your workflow
During collection
Workflow: how you get from the raw data to the final 
products of your research 
Simple workflow: flow chart
Temperature data, Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production
During 
collection 
Workflow: how you get from the raw data to the final 
products of your research 
Commented script 
• R, SAS, MATLAB… 
• Well-documented code is 
Easier to review 
Easier to share 
Easier to use for repeat analysis 
# 
%$ 
& 
Document your workflow
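A commented script might look like the sketch below, which follows the temperature and salinity workflow from the flow chart slide. The readings and the plausible-value ranges are assumptions made up for illustration.

```python
import statistics

# Step 1: import. Values are made up; None marks a missing reading.
temperature_c = [12.1, 11.8, None, 12.4, 99.9]
salinity_psu = [35.0, 34.8, 35.1, None, 35.2]


# Step 2: quality control. Drop missing readings and values outside a
# plausible range (the ranges are assumptions, not recommendations).
def clean(values, low, high):
    return [v for v in values if v is not None and low <= v <= high]


clean_t = clean(temperature_c, -2.0, 40.0)
clean_s = clean(salinity_psu, 0.0, 45.0)

# Step 3: analysis. Mean and sample standard deviation of the clean data.
summary = {
    "temperature_c": (statistics.mean(clean_t), statistics.stdev(clean_t)),
    "salinity_psu": (statistics.mean(clean_s), statistics.stdev(clean_s)),
}

# Step 4: report. Graph production would follow here in a full workflow.
for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
```

Each comment names the workflow step it implements, so the script doubles as the workflow documentation.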
Constrain data entries 
• Excel lists 
• Data validation 
• Google docs forms 
Modified from K. Vanderbilt 
During 
collection
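Outside a spreadsheet, the same constraint can be applied with a small validation function. This is a sketch only: the allowed site codes and the `site` and `count` fields are hypothetical.

```python
ALLOWED_SITES = {"nanaimo", "lake", "grassland"}  # hypothetical code list


def validate_entry(entry):
    """Return a list of problems found in one data-entry record."""
    problems = []
    if entry.get("site") not in ALLOWED_SITES:
        problems.append(f"unknown site: {entry.get('site')!r}")
    try:
        count = int(entry.get("count", ""))
        if count < 0:
            problems.append("count must be non-negative")
    except (TypeError, ValueError):
        problems.append(f"count is not an integer: {entry.get('count')!r}")
    return problems


print(validate_entry({"site": "lake", "count": "12"}))
print(validate_entry({"site": "Lake ", "count": "twelve"}))
```

An empty list means the record passed; anything else says what to fix before the record is saved.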
Atomize 
During 
collection 
One piece of information per cell
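As an illustration, a cell that packs site, date and count together can be split into one value per field. The semicolon-separated layout is an assumption chosen for this sketch.

```python
def atomize(cell):
    """Split a combined 'site; date; count' cell into one value per field."""
    site, date, count = (part.strip() for part in cell.split(";"))
    return {"site": site, "date": date, "count": int(count)}


record = atomize("Nanaimo; 2010-07-14; 12")
print(record)
```

Once each value sits in its own column, it can be sorted, filtered and checked on its own.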
Break down spreadsheets
During collection
Fake a relational database 
Create parameter table 
From doi:10.3334/ORNLDAAC/777 
From doi:10.3334/ORNLDAAC/777 
From R Cook, ESA Best Practices Workshop 2010 
Create a site table
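A sketch of the idea in Python: one wide table that repeats site details on every row is split into a site table and an observation table linked by `site_id`. All column names and values are made up.

```python
# One wide spreadsheet: site details are repeated on every row.
wide_rows = [
    {"site_id": "NAN", "site_name": "Nanaimo", "habitat": "estuary", "year": 2010, "count": 42},
    {"site_id": "NAN", "site_name": "Nanaimo", "habitat": "estuary", "year": 2011, "count": 37},
    {"site_id": "LAK", "site_name": "Lake", "habitat": "freshwater", "year": 2010, "count": 12},
]

site_table = {}     # one entry per site
observations = []   # one entry per measurement, linked by site_id

for row in wide_rows:
    site_table[row["site_id"]] = {
        "site_name": row["site_name"],
        "habitat": row["habitat"],
    }
    observations.append(
        {"site_id": row["site_id"], "year": row["year"], "count": row["count"]}
    )

print(site_table)
print(observations)
```

A correction to a site's details now happens in one place instead of on every row.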
Metadata: data reporting 
WHO created the data? 
WHAT is the content 
of the data set? 
WHEN was it created? 
WHERE was it collected? 
HOW was it developed? 
WHY was it developed? 
From Flickr by //ichael Patric|{ 
Create metadata
During collection
Create metadata
During collection
Digital context 
• Name of the data set 
• The name(s) of the data file(s) in the data set 
• Date the data set was last modified 
• Example data file records for each data type file 
• Pertinent companion files 
• List of related or ancillary data sets 
• Software (including version number) used to prepare/read the data set 
• Data processing that was performed 
Personnel & stakeholders 
• Who collected 
• Who to contact with questions 
• Funders 
Scientific context 
• Scientific reason why the data were collected 
• What data were collected 
• What instruments (including model & serial number) were used 
• Environmental conditions during collection 
• Temporal & spatial resolution 
• Standards or calibrations used 
Information about parameters 
• How each was measured or produced 
• Units of measure 
• Format used in the data set 
• Precision & accuracy if known 
Information about data 
• Definitions of codes used 
• Quality assurance & control measures 
• Known problems that limit data use (e.g. uncertainty, sampling problems)
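A metadata record covering some of these items can be kept as a small structured file next to the data. The sketch below uses an ad hoc JSON layout, not a metadata standard; every name and value in it is a placeholder.

```python
import json

# Ad hoc record; all values are placeholders for illustration.
metadata = {
    "who": {"collected_by": "A. Researcher", "contact": "a.researcher@example.org"},
    "what": "Copepod counts per sample",
    "when": {"last_modified": "2010-09-01"},
    "where": "Nanaimo",
    "how": {"software": "Python 3", "processing": "rows with missing counts removed"},
    "why": "Hypothetical study used for illustration",
    "files": ["Eaffinis_nanaimo_2010_counts.csv"],
    "parameters": [
        {"name": "count", "units": "individuals per sample", "format": "integer"},
    ],
    "codes": {"NA": "not measured"},
}

# Check that the basic who/what/when/where/how/why questions are answered.
REQUIRED = ["who", "what", "when", "where", "how", "why"]
missing = [key for key in REQUIRED if not metadata.get(key)]

text = json.dumps(metadata, indent=2)
print(text)
print("missing:", missing)
```

For data headed to a repository, a formal standard is usually the better target than an ad hoc file like this one.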
Create metadata
During collection
What is metadata?
Metadata standards…
• Provide structure to describe data
  Common terms | definitions | language | structure
• Come in many flavors
  EML, FGDC, ISO19115, DarwinCore,…
• Can be met using software tools
  Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
Back up daily 
During 
collection 
From Flickr by lippo 
From Flickr by see phar 
Original 
Near 
Far
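A backup is only useful if the copies match the original. The sketch below copies a file to a "near" and a "far" folder and verifies each copy with a SHA-256 checksum. The folder names are stand-ins: in practice "far" would be another building or a remote service, not a local directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, destinations):
    """Copy a file into each destination folder and verify every copy."""
    expected = sha256(original)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder))
        results[str(folder)] = sha256(copy) == expected
    return results


# Temporary folders and made-up data keep the sketch self-contained.
root = Path(tempfile.mkdtemp())
original = root / "raw_data.csv"
original.write_text("site,count\nNAN,42\n")

status = back_up(original, [root / "near", root / "far"])
print(status)
```

Running the checksum comparison on a schedule is one way to test the backup system rather than trusting it.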
During 
collection 
From Flickr by Barbies Land 
Remember that data 
management plan? 
Revisit 
Review 
Revise
During 
collection 
Schedule a time each 
week or month 
Revisit 
Review 
Revise 
From Flickr by purplemattfish
From Flickr by celikins 
Where to start?
Make a 
resolution 
• Triage on current 
projects 
• Get advisor, lab mates, 
collaborators on board 
• Do better next time 
From Flickr by Andy Graulund
From Flickr by karindalziel 
Start working online
Open notebooks 
http://datapub.cdlib.org
Write a DMP
dmptool.org 
Step-by-step wizard for generating DMP 
create | edit | re-use | share 
Free & open to community
databib.org 
Find a repository 
Where 
should I put 
my data?
Learn new skills 
software carpentry 
www.software-carpentry.org
Other Fun Stuff 
From Flickr by Micah Taylor
Credit in academia… 
Impact Factors + Citation Counts
Altmetrics?
Altmetrics 
Article-level metrics 
Altmetrics for alt-products 
Data 
Code 
Slides 
Blogs 
Downloads 
Tweets 
Mentions 
Views 
From Flickr by Skakerman
Researcher 
Identification
BIG initiatives…
NSF funded DataNet Project 
Office of Cyberinfrastructure 
www.dataone.org
New partners…
Better methods…
Better methods…
From Flickr by dotpolka 
Manage & share 
your data!
Website 
Email 
Twitter 
Slides 
carlystrasser.net 
carlystrasser@gmail.com 
@carlystrasser 
slideshare.net/carlystrasser

Cal Poly - Data Management and the DMPTool
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for Researchers
 
UNT: Scientific Data Management and Sharing
UNT: Scientific Data Management and SharingUNT: Scientific Data Management and Sharing
UNT: Scientific Data Management and Sharing
 
PLOS ALM Talk on UC3 Services and Altmetrics
PLOS ALM Talk on UC3 Services and AltmetricsPLOS ALM Talk on UC3 Services and Altmetrics
PLOS ALM Talk on UC3 Services and Altmetrics
 
"Undergrad ecologists aren't learning data management" - ESA 2013
"Undergrad ecologists aren't learning data management" -  ESA 2013"Undergrad ecologists aren't learning data management" -  ESA 2013
"Undergrad ecologists aren't learning data management" - ESA 2013
 
Data Management Planning for ESA 2013
Data Management Planning for ESA 2013Data Management Planning for ESA 2013
Data Management Planning for ESA 2013
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Data Matters for AGU Early Career Conference

  • 1. Data Matters Tips & Tools for Better Research Carly Strasser, California Digital Library carlystrasser@gmail.com AGU Student & Early Career Scientist Conference 14 Dec 2014 From Flickr by Lachlan Donald
  • 2. Why are you here? Science: you’re (probably) doing it wrong
  • 3. From Wikimedia Commons Back in the day… From ahswhg.wikispaces.com
  • 4. Back in the day… Da Vinci Curie Newton classicalschool.blogspot.com Darwin
  • 6. From wikimedia Such Internet! So many tools! From Flickr by John Jobby So much data!
  • 8. Digital data From Flickr by Flickmor From Flickr by DW0825 From Flickr by US Army Environmental Command C. Strasser Courtesy of WHOI From Flickr by deltaMike
  • 9. Digital data + Complex workflows
  • 11. Scientists are bad at data management.
  • 12. An embarrassing example… From Flickr by lincolnblues
  • 13.
  • 14.
  • 15. ?
  • 16. From Flickr by ransomtech Didn’t share the data Didn’t document the data (metadata) Didn’t document provenance/workflow
  • 17. Why should I care? From Flickr by johntrainor
  • 18. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible.
  • 19.
  • 20. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible. Because Fox News, creationism, and the war on science.
  • 21. “Help us identify grants that are wasteful or that you don’t think are a good use of taxpayer dollars.” Rep. Adrian Smith (R-Nebraska), a member of the House Committee on Science and Technology
  • 22. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible. Because Fox News, creationism, and the war on science Because it means faster progress.
  • 23.
  • 24. Because you are a good person.
  • 25. From Flickr by Redden-McAllister From Flickr by Ken Cowell From Flickr Brandi Jordan
  • 26. Map of Scientific Collaborations flowingdata.com
  • 28. Journals Institutions Funders From Flickr by Eva Rinaldi Celebrity and Live Music Photographer
  • 29.
  • 30. Feb 2013 … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
  • 31. From Flickr by Michael Tinkler
  • 32. From Flickr by Big Swede Guy data management Best Practices
  • 33. From Flickr by Mark Sardella Plan before data collection
  • 34. Planning: Design sample naming scheme • Create a key (data dictionary) • Make sure names are unique • Define codes From Flickr by zebbie
  • 35. Planning: Design file naming scheme. Use descriptive file names • Unique • Reflect contents. Bad: Mydata.xls, 2001_data.csv, best version.txt. Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured). *Not for everyone* From R Cook, ESA Best Practices Workshop 2010
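A minimal sketch of the naming scheme on slide 35, assuming the four components shown there (study organism, site name, year, what was measured). The function name `make_filename` and its cleanup rule are illustrative assumptions, not something the talk prescribes.

```python
import re

def make_filename(organism, site, year, measured, ext="csv"):
    """Build a descriptive file name: organism_site_year_measurement.ext."""
    parts = [organism, site, str(year), measured]
    # Keep only letters and digits inside each part so "_" always separates fields.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", p) for p in parts]
    if not all(cleaned):
        raise ValueError("every component must contain at least one letter or digit")
    return "_".join(cleaned) + "." + ext

# Reproduces the "Better" example from the slide.
print(make_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
```

Building names from the same components every time keeps them unique and lets a script parse them back into fields later.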
  • 36. Planning: Design file organization. Biodiversity > Lake > Experiments: Biodiv_H20_heatExp_2005to2008.csv, Biodiv_H20_predatorExp_2001to2003.csv, … Field work: Biodiv_H20_PlanktonCount_2001toActive.csv, Biodiv_H20_ChlAprofiles_2003.csv, … Grassland. Consider… • Dependencies? • File formats? • Time of collection? • Order of analysis? From S. Hampton
  • 37. Planning: Design your spreadsheet • Constrain entries • Atomize • Break down spreadsheets From Flickr by Ulleskelf
  • 38. Planning: Consider a database. A relational database is • A set of tables • Relationships among the tables • A language to specify & query the tables. An RDB provides • Scalability: millions+ records • Features for sub-setting, querying, sorting • Reduced redundancy & entry errors From Mark Schildhauer
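The three ingredients named on slide 38 (tables, relationships, a query language) can be sketched with Python's built-in `sqlite3` module. The table names, column names, site codes, and counts below are made up for illustration and do not come from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two tables plus a relationship: each observation points at one site.
conn.executescript("""
CREATE TABLE site (
    site_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE count_obs (
    obs_id  INTEGER PRIMARY KEY,
    site_id TEXT NOT NULL REFERENCES site(site_id),
    year    INTEGER NOT NULL,
    n       INTEGER NOT NULL CHECK (n >= 0)
);
""")

# Hypothetical records: the site name is stored once, not on every row.
conn.executemany("INSERT INTO site VALUES (?, ?)",
                 [("NAN", "Nanaimo"), ("EXL", "Example Lake")])
conn.executemany("INSERT INTO count_obs (site_id, year, n) VALUES (?, ?, ?)",
                 [("NAN", 2010, 12), ("NAN", 2010, 30), ("EXL", 2010, 7)])

# SQL is the language used to subset, join, and summarize the tables.
rows = conn.execute("""
SELECT s.name, o.year, SUM(o.n)
FROM count_obs AS o JOIN site AS s ON s.site_id = o.site_id
GROUP BY s.name, o.year
ORDER BY s.name
""").fetchall()
print(rows)
```

Storing each site name in one place is what gives the reduced redundancy and fewer entry errors the slide mentions.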
  • 39. Pick a data repository Store your data in a repository Institutional archive Discipline/specialty archive From Flickr by torkildr Planning
  • 40. Pick a data repository Store your data in a repository Institutional archive Discipline/specialty archive From Flickr by torkildr Planning Ask a librarian
  • 41. Pick a data repository Store your data in a repository Institutional archive Discipline/specialty archive From Flickr by torkildr Planning Ask a librarian Repos of repos: databib.org re3data.org
  • 42. Decide on preservation/backup From Flickr by sepa synod From Flickr by taberandrew From Flickr by withassociates Planning
  • 43. Decide on preservation/backup From Flickr by sepa synod From Flickr by taberandrew From Flickr by withassociates What software? What hardware? What personnel? How often? Set up reminders! Test system Planning
  • 44. Planning: Write a data management plan! …document that describes what you will do with your data throughout the research project From Flickr by Barbies Land
  • 45. Planning: DMP components • What will be collected • Methods • Standards • Metadata • Sharing/access • Long-term storage. But they all have different requirements and express them in different ways From Flickr by Barbies Land
  • 46. dmptool.org Step-by-step wizard for generating DMP create | edit | re-use | share Free & open to community Planning
  • 47. During Data Collection & Entry From Flickr by Julia Manzerova
  • 48. During collection: Keep raw data raw. Realistically: • Archive .csv version of raw data • Make a “raw” tab in working data file • Do all work on other tabs
  • 49. During collection: Keep raw data raw. Ideally: • Use scripts to process data • Save them with data. Raw data as .csv; R script for processing & analysis
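A sketch of the "ideal" case on slide 49, written in Python rather than the R script the slide pictures. The file names, the `temp_c` column, and the rule of dropping rows with a missing reading are assumptions for illustration. The point is that the script only reads the raw file and writes its results somewhere else.

```python
import csv
import os
import stat
import tempfile

def process(raw_path, out_path):
    """Read the raw file and write cleaned rows to a different file."""
    if os.path.abspath(raw_path) == os.path.abspath(out_path):
        raise ValueError("refusing to overwrite the raw data file")
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["temp_c"].strip() == "":  # drop rows with a missing reading
                continue
            writer.writerow(row)

# Hypothetical raw file, created here only so the sketch runs on its own.
workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.csv")
with open(raw, "w", newline="") as f:
    f.write("station,temp_c\nA,11.5\nB,\nC,12.0\n")
os.chmod(raw, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # mark raw file read-only

clean = os.path.join(workdir, "clean.csv")
process(raw, clean)
with open(clean, newline="") as f:
    cleaned_rows = list(csv.DictReader(f))
print(cleaned_rows)
```

Saving a script like this next to the data means the cleaned file can always be regenerated from the untouched raw file.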
  • 50. During collection: Document your workflow. Workflow: how you get from the raw data to the final products of your research. Simple workflow (flow chart): Temperature data + Salinity data > Data import into Excel > Data in spreadsheet > Quality control & data cleaning > “Clean” T & S data > Analysis: mean, SD > Summary statistics > Graph production
  • 51. During collection: Document your workflow. Workflow: how you get from the raw data to the final products of your research. Commented script • R, SAS, MATLAB… • Well-documented code is easier to review, easier to share, and easier to use for repeat analysis
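A commented script along the lines of slides 50 and 51 might look like the sketch below. It uses Python in place of the R, SAS, or MATLAB named on the slide. The readings and the plausible ranges used for quality control are invented for illustration.

```python
import statistics

# Step 1: raw readings (hypothetical values; None marks a missing reading).
raw = {
    "temperature_c": [11.5, 12.0, None, 12.5, 99.0],
    "salinity_psu": [30.0, 31.0, 32.0, None, 31.0],
}

# Step 2: quality control. The plausible ranges are assumptions for this sketch.
valid_range = {"temperature_c": (-2.0, 40.0), "salinity_psu": (0.0, 42.0)}

def clean(values, low, high):
    """Drop missing readings and readings outside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]

cleaned = {name: clean(vals, *valid_range[name]) for name, vals in raw.items()}

# Step 3: analysis. Mean and sample standard deviation per variable.
summary = {name: (statistics.mean(v), statistics.stdev(v))
           for name, v in cleaned.items()}

for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}, n={len(cleaned[name])}")
```

Each comment marks one box of the flow chart, so the script doubles as the workflow documentation.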
  • 52. During collection: Constrain data entries • Excel lists • Data validation • Google docs forms Modified from K. Vanderbilt
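The same idea as the spreadsheet validation on slide 52 can be applied in a script before data are accepted. This is a sketch only: the species codes, the field names, and the function `validate_row` are hypothetical.

```python
ALLOWED_SPECIES = {"EAFF", "DAPH", "BOSM"}  # hypothetical codes from a data dictionary

def validate_row(row):
    """Return a list of problems found in one data-entry row (empty list = OK)."""
    problems = []
    if row.get("species") not in ALLOWED_SPECIES:
        problems.append(f"unknown species code: {row.get('species')!r}")
    try:
        count = int(row.get("count", ""))
        if count < 0:
            problems.append("count must be >= 0")
    except (TypeError, ValueError):
        problems.append(f"count is not an integer: {row.get('count')!r}")
    return problems

print(validate_row({"species": "EAFF", "count": "12"}))      # no problems
print(validate_row({"species": "DAPH", "count": "twelve"}))  # one problem
```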
  • 53. During collection: Atomize. One piece of information per cell
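As one illustration of atomizing, a cell that mixes a value and its unit can be split into two columns. The input format ("12.5 C") and the function name `atomize` are assumptions; real spreadsheets will need their own rules.

```python
import re

def atomize(cell):
    """Split a combined 'value unit' cell such as '12.5 C' into (value, unit)."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)\s*", cell)
    if match is None:
        raise ValueError(f"cannot split cell: {cell!r}")
    return float(match.group(1)), match.group(2)

print(atomize("12.5 C"))
print(atomize("30psu"))
```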
  • 54. During collection: Break down spreadsheets. Fake a relational database • Create a parameter table • Create a site table From doi:10.3334/ORNLDAAC/777 From R Cook, ESA Best Practices Workshop 2010
  • 55. During collection: Create metadata. Metadata: data reporting • WHO created the data? • WHAT is the content of the data set? • WHEN was it created? • WHERE was it collected? • HOW was it developed? • WHY was it developed? From Flickr by //ichael Patric|{
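The six questions on slide 55 can be captured in a small machine-readable record stored next to the data. The sketch below is not a formal metadata standard. The field names follow the slide, while the function `build_metadata` and the example values are placeholders.

```python
import json

REQUIRED = ("who", "what", "when", "where", "how", "why")

def build_metadata(**fields):
    """Return a JSON metadata record, refusing to build one with gaps."""
    missing = [k for k in REQUIRED if not str(fields.get(k, "")).strip()]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps({k: fields[k] for k in REQUIRED}, indent=2)

# Placeholder values, for illustration only.
record = build_metadata(
    who="A. Researcher",
    what="Zooplankton counts",
    when="2010",
    where="Nanaimo",
    how="Net tows, counted under a microscope",
    why="Example entry for this sketch",
)
print(record)
```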
  • 56. During collection: Create metadata. Digital context • Name of the data set • The name(s) of the data file(s) in the data set • Date the data set was last modified • Example data file records for each data type file • Pertinent companion files • List of related or ancillary data sets • Software (including version number) used to prepare/read the data set • Data processing that was performed. Personnel & stakeholders • Who collected • Who to contact with questions • Funders. Scientific context • Scientific reason why the data were collected • What data were collected • What instruments (including model & serial number) were used • Environmental conditions during collection • Temporal & spatial resolution • Standards or calibrations used. Information about parameters • How each was measured or produced • Units of measure • Format used in the data set • Precision & accuracy if known. Information about data • Definitions of codes used • Quality assurance & control measures • Known problems that limit data use (e.g. uncertainty, sampling problems)
  • 57. During collection: Create metadata. What is metadata? Metadata standards… • Provide structure to describe data: common terms | definitions | language | structure • Come in many flavors: EML, FGDC, ISO19115, DarwinCore,… • Can be met using software tools: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  • 58. During collection: Back up daily. Original, Near, Far From Flickr by lippo From Flickr by see phar
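The "Original, Near, Far" idea on slide 58 can be sketched as a copy step that verifies each backup against the original with a checksum. The two destination folders here are stand-ins: in practice "near" might be an external drive and "far" an off-site or cloud location, neither of which the talk specifies.

```python
import hashlib
import os
import shutil
import tempfile

def sha256(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(src, dest_dir):
    """Copy src into dest_dir and confirm the copy matches the original."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    if sha256(src) != sha256(dest):
        raise OSError("backup copy does not match the original")
    return dest

# Hypothetical original file and stand-in backup locations.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "counts.csv")
with open(original, "w") as f:
    f.write("site,count\nNAN,12\n")

near = backup(original, os.path.join(workdir, "near"))
far = backup(original, os.path.join(workdir, "far"))
print(near)
print(far)
```

Running a script like this from a scheduled job is one way to make the daily backup and the "test system" step happen without relying on memory.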
  • 59. During collection From Flickr by Barbies Land Remember that data management plan? Revisit Review Revise
  • 60. During collection Schedule a time each week or month Revisit Review Revise From Flickr by purplemattfish
  • 61. From Flickr by celikins Where to start?
  • 62. Make a resolution • Triage on current projects • Get advisor, lab mates, collaborators on board • Do better next time From Flickr by Andy Graulund
  • 63. From Flickr by karindalziel Start working online
  • 65. Write a DMP: dmptool.org Step-by-step wizard for generating DMP create | edit | re-use | share Free & open to community
  • 66. databib.org Find a repository Where should I put my data?
  • 67. Learn new skills software carpentry www.software-carpentry.org
  • 68. Other Fun Stuff From Flickr by Micah Taylor
  • 69. Credit in academia… Altmetrics? Impact Factors + Citation Counts
  • 70. Altmetrics Article-level metrics Altmetrics for alt-products Data Code Slides Blogs Downloads Tweets Mentions Views From Flickr by Skakerman
  • 71. Altmetrics Article-level metrics Altmetrics for alt-products
  • 73.
  • 75. NSF funded DataNet Project Office of Cyberinfrastructure www.dataone.org
  • 76.
  • 80. From Flickr by dotpolka Manage & share your data!
  • 81. Website Email Twitter Slides carlystrasser.net carlystrasser@gmail.com @carlystrasser slideshare.net/carlystrasser