Data Management from a Scientist's Perspective
1. Data Management: A Scientist’s Perspective
Carly Strasser
California Digital Library, University of California Curation Center
University of Florida Libraries, August 2012
Image: From Calisphere via California State University Libraries, ark:/13030/c818356g
2. Image credits: C. Strasser; Courtesy of WHOI
3. Image credits: C. Strasser; North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652
4. Roadmap
   1. A brief history of data collection
   2. The world of data
   3. The Fallout
   4. Barriers
   5. Landscape
Image: C. Strasser
5. A Brief History of Data Collection
Or… how scientists came to be so bad at data management
Image: From Calisphere via Santa Clara University, ark:/13030/kt696nc7j2
8. The lab/field notebook…?
Image credits: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command
9. Digital data
Image credits: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command
18. The Long Tail
[Figure: long-tail curve. Axis labels: Size of dataset / grant ($); # datasets / # researchers / # grants]
19. The Long Tail
[Figure: histogram of NSF DEB awards, 2005-2010, n = 1234. x-axis: Award Amount (millions of dollars), 0.1 to >2.5; y-axis: Number of Awards, 0 to 300]
Hampton et al., In press, Frontiers in Ecology and Evolution
21. UGLY TRUTH
Many (most?) researchers…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
Image: 5shortessays.blogspot.com
28. The Fallout: Where data end up
[Figure: Data, Metadata, www. Recreated from Klump et al. 2006]
Image credits: From Flickr by diylibrarian; blog.order2disorder.com; From Flickr by csessums
29. The Fallout
• Data Reuse
• Data Sharing
• Data Management
30. 100 NSF DEB awards, 2005-2009 (one paper from each award)
Is data produced or reused?
• Produced: 57% (37)
• Reused: 8% (5)
• Both: 35% (23)
Is the data produced shared?
• Shared all: 28% (17)
• Shared some: 15% (9)
• Shared none: 57% (34)
Where are data shared?
• GenBank or TreeBase: 81% (21)
• Elsewhere: 19% (5)
Hampton et al., In press, Frontiers in Ecology and Evolution
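The percentages on slide 30 can be checked against the counts shown in parentheses. The following is a minimal sketch, assuming each percentage is the category's share of its own group total, rounded to a whole percent. The group totals are derived from the counts here and are not stated on the slide, and the function name `percentages` is only illustrative.

```python
# Counts as reported on the slide (the numbers in parentheses).
counts = {
    "Is data produced or reused?": {"Produced": 37, "Reused": 5, "Both": 23},
    "Is the data produced shared?": {"Shared all": 17, "Shared some": 9, "Shared none": 34},
    "Where are data shared?": {"GenBank or TreeBase": 21, "Elsewhere": 5},
}


def percentages(group):
    """Return each category's share of the group total, rounded to a whole percent."""
    total = sum(group.values())  # derived total, not given on the slide
    return {name: round(100 * n / total) for name, n in group.items()}


for question, group in counts.items():
    print(question, percentages(group), "n =", sum(group.values()))
```

Under that assumption the recomputed values agree with the slide, which suggests that each question was answered for a different subset of the papers.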
31. Why? Barriers to Data Stewardship
Image: From Flickr by iowa_spirit_walker
32. Barriers: Cost
Image credits: From Flickr by indigoprime; From Flickr by kobiz7; C. Strasser
35. Barriers: Sociocultural
• Not the norm
• Lack of / too many standards
• Disparate data
Image credits: From Flickr by toucanradio; From Flickr by Chris Campbell
36. Barriers: Sociocultural
• Not the norm
• Lack of / too many standards
• Disparate data
• Lack of training
Image: From Flickr by uniinnsbruck
37. Barriers: Sociocultural
• Missed opportunities
• Loss of rights or benefits
• Conflict
• Misuse
Image credits: From Flickr by Christina Ann VanMeter; From Flickr by pnh; From Flickr by tymesynk
38. Barriers: Sociocultural
• Lack of incentives
• Time consuming & expensive
• No requirements
• Reward structure
Image: From Flickr by bthomso
39. But what about the next generation?
Image: From Flickr by Marquette University
40. Are Undergrads Learning About Data Management?
• Metadata generation
• Software choice
• File naming
• QAQC
• Backing up
• Workflows
• Data sharing
• Data re-use
• Meta-analysis
• Reproducibility
• Notebook protocols
• Databases
[Figure: scatter plot of topics. x-axis: Assessed (0 to 40); y-axis: Important (0 to 40)]
If it’s important, why isn’t it taught?
41. Are Undergrads Learning About Data Management?
Barriers:
• Too advanced
• Not a priority
• Not appropriate level
• Students don’t know software
• Time
• No Lab
• No training
• Covered in Lab
• Too big
43. Who cares?
Image credits: From Flickr by Redden-McAllister; From Flickr by AJC1; www.rba.gov.au
44. Where data end up
[Figure: Data, Metadata, www. Recreated from Klump et al. 2006]
Image credits: From Flickr by diylibrarian; From Flickr by torkildr
45. Trends in Data Archiving
Journal publishers
• Joint Data Archiving Agreement
• Data Papers
• Ecological Archives, Beyond the PDF, etc.
Funders
• Data management requirements
46. What is a data management plan?
A document that describes what you will do with your data during your research and after you complete your research
47. Why should a scientist prepare a DMP?
• Saves time
• Increases efficiency
• Easier to use data
• Others can understand & use data
• Credit for data products
• Funders require it
49. NSF DMP Requirements
From Grant Proposal Guidelines: DMP supplement may include:
   1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
   2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
   3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
   4. policies and provisions for re-use, re-distribution, and the production of derivatives
   5. plans for archiving data, samples, and other research products, and for preservation of access to them
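The five elements listed on slide 49 can be treated as a checklist when drafting a plan. The following is a minimal sketch, assuming a draft plan is kept as a dictionary of section text. The key names, the `missing_elements` helper, and the sample draft are hypothetical illustrations and are not NSF terminology or tooling.

```python
# The five elements the slide lists for an NSF DMP supplement.
# Key names are hypothetical shorthand chosen for this sketch.
NSF_DMP_ELEMENTS = {
    "products": "Types of data, samples, collections, software, and other materials to be produced",
    "standards": "Standards for data and metadata format and content",
    "access_and_sharing": "Policies for access and sharing, including protection of privacy and other rights",
    "reuse": "Policies and provisions for re-use, re-distribution, and derivatives",
    "archiving": "Plans for archiving and for preservation of access",
}


def missing_elements(draft):
    """Return the element keys that are absent or blank in a draft plan (dict of key -> text)."""
    return [key for key in NSF_DMP_ELEMENTS if not draft.get(key, "").strip()]


# Hypothetical draft: one section written, one left blank, three not started.
draft = {"products": "Field counts stored as CSV files", "standards": "  "}
for key in missing_elements(draft):
    print("Still to write:", NSF_DMP_ELEMENTS[key])
```

A check like this only confirms that each section has some text. Whether the content is adequate is left to reviewers, and the deck notes that evaluation varies across directorates and divisions.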
50. NSF’s Vision*
• DMPs and their evaluation will grow & change over time (similar to broader impacts)
• Peer review will determine next steps
• Community-driven guidelines
  – Different disciplines have different definitions of acceptable data sharing
  – Flexibility at the directorate and division levels
  – Tailor implementation of DMP requirement
• Evaluation will vary with directorate, division, & program officer
*Unofficially
Help from Jennifer Schopf, NSF
52. Individual Challenges
• What is a data management plan?
• Will I get credit for my work?
• What is metadata?
• What tools do I use?
• Are there standards?
• How much will it cost?
• Who can help me?
• Where do I preserve my data?
• How do I preserve my data?
[Figure: cycle of Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze]
53. NSF funded DataNet Project
Office of Cyberinfrastructure
• Community Engagement & Outreach
• Cyberinfrastructure
Courtesy of DataONE
54. Questions
• What role can libraries play in data education?
• What barriers to sharing can we eliminate?
• Why don’t people share data?
• Is data management being taught?
• Do attitudes about sharing differ among disciplines?
• How can we promote storing data in repositories?