Presentation on data management: the current landscape, barriers to management, and data types. For UC3-CDL data curation for practitioners workshop, 8 Nov 2012 in Oakland CA.
Data Management: Scientist Perspective - UC3 Data Curation Workshop
1. Data Management: A Scientist’s Perspective
Carly Strasser
@carlystrasser
From Calisphere via California State University Libraries, ark:/13030/c818356g
2. Roadmap
  1. A Brief History
  2. How Bad Is It?
  3. Barriers
  4. Solutions
  5. Landscape
C. Strasser
3. A Brief History of Data Collection
Or… how scientists came to be so bad at data management
From Calisphere via Santa Clara University, ark:/13030/kt696nc7j2
5. From Flickr by DW0825
From Flickr by Flickmor
From Flickr by deltaMike
The lab/field notebook…?
www.woodrow.org
C. Strasser
Courtesy of WHOI
From Flickr by US Army Environmental Command
16. UGLY TRUTH
Many (most?) researchers…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
5shortessays.blogspot.com
28. Barriers: Sociocultural
Not the norm
Lack of / too many standards
Disparate data
From Flickr by toucanradio
From Flickr by Chris Campbell
29. Barriers: Sociocultural
From Flickr by uniinnsbruck
Not the norm
Lack of / too many standards
Disparate data
Lack of training
30. Barriers: Sociocultural
Loss of rights or benefits
Missed opportunities
Conflict
Misuse
From Flickr by Christina Ann VanMeter
From Flickr by pnh
From Flickr by tymesynk
31. Barriers: Sociocultural
Lack of incentives
Time consuming & expensive
No requirements
Reward structure
From Flickr by bthomso
32. But what about the next generation?
From Flickr by Marquette University
33. Are Undergrads Learning About
Data Management?
Survey of Undergraduate Ecology courses
48 schools
38 Research focused
10 Education focused
Schools selected…
• One of top grad schools for Ecology
• Most recipients of NSF GRFP
34. Are Undergrads Learning About
Data Management?
Strasser and Hampton, In press at Ecosphere
Metadata generation
Software choice
File naming
QAQC
Backing up
Workflows
Data sharing
Data re-use
Meta-analysis
Reproducibility
Notebook protocols
Databases
35. Are Undergrads Learning About
Data Management?
[Bar chart. Legend: Yes / No. X-axis: Percent (0 to 100). Topics shown: Quality control & assurance; Naming computer files; Types of files & software to use; Metadata generation; Workflows; Protecting data; Databases & data archiving; Data re-use; Meta-analysis; Data sharing; Reproducibility; Notebook protocols]
36. Are Undergrads Learning About
Data Management?
[Scatter plot. X-axis: Value as a Researcher (1.0 to 4.5). Y-axis: Importance for Undergraduates (1.0 to 4.5). r = 0.306, p < 0.05]
37. Are Undergrads Learning About
Data Management?
[Bar chart. Y-axis: Percent Citing Barrier (0 to 70). Barriers shown: Time; Not appropriate at this level; Should be covered in lab; Students not prepared; Funding/resources; Class too large; Instructor not prepared; Covered in other courses]
39. Preach the Benefits
Short term
• Spend less time on DM & more on research
• Easier to use data
• Collaborators can understand & use data
Long term
• Scientists outside project can find, understand & use data
• Credit for data products & their use
• Funders protect their investment
40. Educate
DataONE Best Practices Lessons
What is the Data Life Cycle?
Data management planning
Data collection, entry & manipulation
Data quality control & quality assurance
Data protection: Data backups
Data preservation, curation & archiving
Data documentation:
What is metadata?
Value of metadata
How to write good metadata
Data management: Data sharing
Data citation practices
Collaboration/communication technologies
Data analysis: Workflows and other tools
47. Citing Data
From Flickr by Richard Moross
Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of
morphological diversification in the absence of a detailed phylogeny: a
case study from characiform fishes. Dryad Digital Repository.
doi:10.5061/dryad.20
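The citation above follows an author, year, title, repository, DOI pattern. As a minimal sketch of how such a citation could be assembled from its parts, the Python below uses a hypothetical `DataCitation` class (the class and field names are illustrative assumptions, not part of Dryad or any citation standard). The only outside fact relied on is that a DOI can be resolved by prefixing it with `https://doi.org/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the parts of a dataset citation."""
    author: str      # e.g. "Sidlauskas, B."
    year: int
    title: str       # dataset title, without a trailing period
    repository: str  # where the data are archived
    doi: str         # bare DOI, without the "doi:" prefix

    def format(self) -> str:
        # Mirrors the layout of the example on the slide:
        # Author Year. Title. Repository. doi:DOI
        return (f"{self.author} {self.year}. {self.title}. "
                f"{self.repository}. doi:{self.doi}")

    def url(self) -> str:
        # DOIs resolve through the doi.org proxy.
        return f"https://doi.org/{self.doi}"


example = DataCitation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)

print(example.format())
print(example.url())
```

Keeping the DOI as a separate field, rather than burying it in free text, is one way to make the citation easy to link and to check.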
48. Data Management Requirements
Journal publishers
From Flickr by Richard Moross
49. Data Management Requirements
Journal publishers
Funders
From Flickr by Richard Moross
50. Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
From Flickr by darkuncle
51. Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
What is a data management plan?
Will I get credit for my work?
What tools do I use?
Are there standards?
What is metadata?
How much will it cost?
Who can help me?
Where do I preserve my data?
How do I preserve my data?
From Flickr by darkuncle
53. NSF funded DataNet Project
Office of Cyberinfrastructure
Community Engagement & Outreach
Cyberinfrastructure
Courtesy of DataONE
55. My website
carlystrasser.net
Email me
carlystrasser@gmail.com
Tweet me
@carlystrasser
My slides
slideshare.net/carlystrasser
CDL Blog
datapub.cdlib.org