This document discusses challenges and opportunities around research data management. It notes that while the majority of research data is currently stored locally on hard drives, funding agencies and researchers are increasingly focused on sharing, curating and ensuring long-term access to data. However, there are open questions around how to incentivize researchers to share data, ensure sustainable funding models for repositories, and develop interoperable metadata standards. The document explores potential roles for libraries, institutions, publishers and domain-specific repositories in addressing these issues.
1. The Metadata [R]evolution: Transformative Opportunities
September 18, 2013
Some Ideas on Making Research Data
Discoverable and Usable:
“It’s the Metadata, Stupid!”
Anita de Waard, VP Research Data Collaborations,
Elsevier Research Data Services (VT)
2. Everybody’s talking about research data:
Share research outputs
Demonstrate impact to public
Data availability drives growth
Demonstrate impact
Guarantee permanence, discoverability
Avoid fraud
Generate, track outputs
Comply with mandates
Ensure availability
Archive, track, curate
Support researcher/institution
Archive
Add curation
Allow reuse
Todd Vision, DataDryad, OAI8, 6/23/13:
“We need to find a way to keep Dryad funded, and would
love to hear your ideas about doing that.”
Phil Bourne, Associate Vice Chancellor, UCSD, 4/13:
“We are thinking about the university as a digital
enterprise.”
Mike Huerta, Assoc. Director, NLM Office of Health Info at NIH, 6/13:
“Today, the major public product of science are concepts, written
down in papers. But tomorrow, data will be the main product of
science…. We will require scientists to track and share their data at
least as well, if not better, than they are sharing their ideas today.”
Mara Saule, Dean University Libraries/CIO, UVM, 5/13:
“We need to do something about data.”
Derive credit
Comply with mandates
Discover and use
Cite/acknowledge
Gov
Funding bodies
University management
Researchers
Librarians
Data Repositories
Nathan Urban, PI Urban Lab, CMU, 3/13:
“If we can share our data, we can write a paper that will
knock everybody’s socks off!”
Roles and needs wrt Research Data:
Barbara Ransom, NSF Program Director Earth Sciences, 2/13:
“We’re not going to spend any more money for you to go out
and get more data! We want you first to show us how you’re
going to use all the data we paid y’all to collect in the past!”
3. Where research data goes now:
> 50 M papers
2 M scientists
2 M papers/year
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) stored in large, generic data repositories:
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Institutional Repositories
• A small portion of data (1-2%?) stored in small, topic-focused data repositories:
– MiRB: 25 k
– PetDB: 1.5 k
– TAIR: 72.1 k
– PDB: 88.3 k
– SedDB: 0.6 k
4. Where research data goes now:
How do we get researchers to
curate, store and share their
data?
5. Where research data goes now:
How do we ensure
long-term
sustainability for
high-end repositories?
6. Where research data goes now:
What role do
libraries/institutions play?
10. Research data management in action:
Using antibodies
and squishy bits
Grad Students experiment
and enter details into their
lab notebook.
11. Research data management in action:
The PI then tries to make
sense of their slides,
12. Research data management in action:
and writes a paper.
13. Research data management in action:
End of story.
14. de Waard, A., Burton, S. et al., 2013
An attempt to get researchers to curate
(but only partially share!) their data:
16. What to do in the meantime:
[Figure labels: 49 publications, 193 publications, 76 publications, 214 publications, 210 publications]
• In 220 publications, only 40% of antibodies, 40% of cell lines and 25% of
constructs can be manually identified (Vasilevsky et al., submitted)
• Proposal (with NIH/NIF and Force11 Group):
– Adding minimal data standards
– Tool extracts likely reagents / resources
– User interface asks author to confirm or select
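The proposed extract-then-confirm flow can be pictured with a minimal sketch. This is not the NIH/NIF or Force11 tool: the cue patterns, the `Candidate` class and the function names are hypothetical, and real methods sections would need far more robust matching (for example, lookups against a resource registry).

```python
import re
from dataclasses import dataclass

# Illustrative cue patterns only (assumptions, not the rules of any real tool).
PATTERNS = {
    "antibody": re.compile(r"\banti-[A-Za-z0-9]+\b"),
    "cell line": re.compile(r"\b[A-Za-z0-9-]+ cells\b"),
    "construct": re.compile(r"\bp[A-Z][A-Za-z0-9-]+\b"),
}


@dataclass
class Candidate:
    kind: str          # "antibody", "cell line" or "construct"
    text: str          # the matched mention in the manuscript
    confirmed: bool = False


def extract_candidates(methods_text):
    """Step 1: the tool proposes likely reagents/resources."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(methods_text):
            found.append(Candidate(kind, match.group(0)))
    return found


def confirm(candidates, answers):
    """Step 2: keep only the mentions the author confirmed.

    `answers` stands in for the user interface: mention text -> True/False.
    """
    return [
        Candidate(c.kind, c.text, confirmed=True)
        for c in candidates
        if answers.get(c.text, False)
    ]


sample = "HEK293 cells were transfected with pEGFP-N1 and stained with anti-GFP."
candidates = extract_candidates(sample)
confirmed = confirm(candidates, {"anti-GFP": True, "HEK293 cells": True})
for c in confirmed:
    print(c.kind, "->", c.text)
```

Keeping the author in the loop means the tool only needs to propose candidates; the author's confirmation is what turns a guess into curated metadata.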
17. How can research databases become
sustainable in the long term?
1. With IEDA:
– Building a database for lunar
geochemistry
– Writing a joint report on repository building, curation
costs and challenges
2. With WDS/RDA WG:
– Planning survey of cost recovery models
– Input/inspiration: ICPSR Sloan-funded project
‘Sustaining Domain Repositories for Digital Data’
– Developing overarching funding model with Todd
Vision/DataDryad
23. Comparing data repository types:
Repository | Advantages | Disadvantages
Local data repository | Easy! No one steals your data. | No one sees it. Not compliant with requirements.
Generic data repository | Not very hard to do. Have complied! | Data can’t be easily reused. Credit?
Institutional Repository | Can use existing IR? Tracking and compliance checks. | Data can’t easily be reused. Credit?
Domain-specific data repository | Data can be reused. Credit! | Lot of work for curators. Long-term sustainable?
Effort, Reuse, Credit, Compliance
Habit, Ease, Privacy, Control
Higher quality metadata
24. Metadata madness…
Funding Agency:
University:
Collaborators:
Domain of study:
Domain-Specific Data Repository
Local Data Repository
Institutional Data Repository
Generic Data Repository
AND THEY ALL WANT DIFFERENT METADATA!!!!
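One way to picture the problem of every repository wanting different metadata is a crosswalk: the researcher describes a dataset once, and a mapping table renames the fields for each target. The sketch below is illustrative only. The record values, the target names and every field name are made up, and none of them corresponds to a real repository schema.

```python
# One internal description of a dataset (made-up example values).
record = {
    "title": "Example recordings",
    "creator": "Example Lab",
    "funder_award": "AWARD-0000",
    "method": "patch clamp",
}

# Hypothetical crosswalks: target field name -> internal field name.
CROSSWALKS = {
    "funding_agency": {"award_id": "funder_award", "dataset_title": "title"},
    "institutional": {"dc_title": "title", "dc_creator": "creator"},
    "domain_specific": {"name": "title", "technique": "method", "lab": "creator"},
}


def export(record, target):
    """Rename the internal fields into the form one target expects."""
    mapping = CROSSWALKS[target]
    missing = sorted(src for src in mapping.values() if src not in record)
    if missing:
        raise KeyError(f"record lacks fields needed by {target}: {missing}")
    return {dst: record[src] for dst, src in mapping.items()}


for target in CROSSWALKS:
    print(target, export(record, target))
```

A crosswalk only helps when the fields mean the same thing on both sides; where targets disagree about content rather than naming, shared minimal standards are still needed.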
25. Where do IRs/libraries fit in?
• Planning series of interviews at key institutions:
– What role do libraries/institutions play wrt research
data management?
– What tools/metadata standards are used?
– What aspects of data deposition are the Research
Office/IR/Institution interested in?
– How does this compare with what scientists want
and do in their labs?
• Goal: share knowledge; establish plan of action
26. Principles of Elsevier RDS:
• Main goal: make research data optimally available,
discoverable and reusable.
• Collaboration is tailored to partner’s unique needs:
– Working with a few domain-specific and institutional
repositories and institutions
– Aspects where collaboration is needed are discussed
– Collaboration plan is drawn up using SLA: agree on time,
conditions, etc.
• 2013: series of pilots, studies and reports to enable
feasibility study:
– What are key needs?
– Can Elsevier play a role: skillsets, partnerships?
– Is there a (transparent) business model for this?
27. In summary:
If researchers start to curate and share their data…
And research databases become long-term
sustainable…
… we enable enrichment with high-quality metadata
that makes research data truly discoverable and
reusable.
Many questions remain:
? What role would the institution/library play?
? How do we ensure interoperable metadata?
? What are sustainable models, moving forward?
? Is there a place for publishers, in all this?
28. Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming,
Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• CNI: Clifford Lynch
• Harvard: Michael Kurtz, Chris Erdmann
• MIT: Micah Altman
• UVM: Mara Saule
29. Your questions?
Anita de Waard
VP Research Data Collaborations,
Elsevier Research Data Services (VT)
a.dewaard@elsevier.com
http://researchdata.elsevier.com/