The Metadata [R]evolution: Transformative Opportunities
September 18, 2013
Some Ideas on Making Research Data
Discoverable and Usable:
“It’s the Metadata, Stupid!”
Anita de Waard, VP Research Data Collaborations,
Elsevier Research Data Services (VT)
Everybody’s talking about research data:
• Share research outputs
• Demonstrate impact to public
• Data availability drives growth
• Demonstrate impact
• Guarantee permanence, discoverability
• Avoid fraud
• Generate, track outputs
• Comply with mandates
• Ensure availability
• Archive, track, curate
• Support researcher/institution
• Archive
• Add curation
• Allow reuse
Todd Vision, DataDryad, OAI8, 6/23/13:
“We need to find a way to keep Dryad funded, and would
love to hear your ideas about doing that.”
Phil Bourne, Associate Vice Chancellor, UCSD, 4/13:
“We are thinking about the university as a digital
enterprise.”
Mike Huerta, Ass. Director NLM O of Health Info at NIH, 6/13:
“Today, the major public product of science are concepts, written
down in papers. But tomorrow, data will be the main product of
science…. We will require scientists to track and share their data at
least as well, if not better, than they are sharing their ideas today.”
Mara Saule, Dean University Libraries/CIO, UVM, 5/13:
“We need to do something about data.”
• Derive credit
• Comply with mandates
• Discover and use
• Cite/acknowledge
Gov
Funding bodies
University management
Researchers
Librarians
Data Repositories
Nathan Urban, PI Urban Lab, CMU, 3/13:
“If we can share our data, we can write a paper that will
knock everybody’s socks off!”
Roles and needs wrt Research Data:
Barbara Ransom, NSF Program Director Earth Sciences, 2/13:
“We’re not going to spend any more money for you to go out
and get more data! We want you first to show us how you’re
going to use all the data we paid y’all to collect in the past!”
Where research data goes now:
> 50 My papers
2 M scientists
2 My papers/year
Majority of data (90%?) is stored on local hard drives
Some data (8%?) stored in large, generic data repositories:
Dryad: 7,631 files
Dataverse: 0.6 My
Institutional Repositories
A small portion of data (1-2%?) stored in small, topic-focused data repositories:
MiRB: 25 k
PetDB: 1.5 k
TAIR: 72.1 k
PDB: 88.3 k
SedDB: 0.6 k
How do we get researchers to
curate, store and share their
data?
How do we ensure
long-term
sustainability for
high-end repositories?
What role do libraries/institutions play?
Research data management in action:
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook.
The PI then tries to make sense of their slides, and writes a paper.
End of story.
de Waard, A., Burton, S. et al., 2013
An attempt to get researchers to curate
(but only partially share!) their data:
What to do in the meantime:
[Figure labels: 49, 193, 76, 214 and 210 publications]
• In 220 publications only 40% of antibodies, 40% of cell lines and 25% of
constructs can be manually identified (Vasilevsky et al, submitted)
• Proposal (with NIH/NIF and Force11 Group):
– Adding minimal data standards
– Tool extracts likely reagents / resources
– User interface asks author to confirm or select
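The proposed tool could be sketched roughly as follows. This is only an illustration of the "extract likely reagents, then ask the author to confirm" idea, not the tool discussed with NIH/NIF and Force11: the keyword lists, the identifier pattern, the `Candidate` type and the `extract_candidates` function are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists; a real tool would rely on curated vocabularies.
RESOURCE_KEYWORDS = {
    "antibody": ("antibody", "antibodies", "anti-"),
    "cell line": ("cell line", "cells"),
    "construct": ("construct", "plasmid", "vector"),
}

# Assumed shape of an identifier-like token (a few letters, optional hyphen,
# then digits). Purely illustrative, not a real catalog-number standard.
IDENTIFIER_PATTERN = re.compile(r"\b[A-Za-z]{1,4}-?\d{3,6}\b")


@dataclass
class Candidate:
    kind: str          # resource type guessed from the keyword that matched
    sentence: str      # sentence to show the author for confirmation
    identifiers: list  # identifier-like tokens found in that sentence


def extract_candidates(methods_text):
    """Return likely reagent/resource mentions for the author to confirm."""
    candidates = []
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", methods_text.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        for kind, keywords in RESOURCE_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                identifiers = IDENTIFIER_PATTERN.findall(sentence)
                candidates.append(Candidate(kind, sentence, identifiers))
                break  # one candidate per sentence keeps the review list short
    return candidates
```

A user interface would then list each candidate sentence and ask the author to confirm it, correct it, or select the right resource; sentences with an empty `identifiers` list are the ones most in need of attention.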
How can research databases become
sustainable in the long term?
1. With IEDA:
– Building a database for lunar
geochemistry
– Write joint report on building repository, curation
costs and challenges
2. With WDS/RDA WG:
– Planning survey of cost recovery models
– Input/inspiration: ICPSR Sloan-funded project
‘Sustaining Domain Repositories for Digital Data’
– Developing overarching funding model with Todd
Vision/DataDryad
Making lunar sample data usable:
A research database funding model:
[Diagram; DRAFT - CC-BY-NC 2013, Todd Vision & Anita de Waard. Labels recoverable from the figure, in source order:]
Private store; Data producer or sponsor; Access; Closed; Flow of funds; Data publication; Public; Service; Collaboration; Conclave; Limited; Subscription content; Commercial overlay; Limited Academic Use/Limited; Data user; Flow of funds; &/or
Examples: ICPSR, CERN-LHC, KEGG, GeoFacets, Reaxys; many small operations, e.g. try-db.org, plhdb.org; Dryad, arXiv, PDB; commercial and institutional storage
Comparing data repository types:
Repository | Advantages | Disadvantages
Local data repository | Easy! No one steals your data. | No one sees it. Not compliant with requirements.
Generic data repository | Not very hard to do. Have complied! | Data can’t be easily reused. Credit?
Institutional Repository | Can use existing IR? Tracking and compliance checks. | Data can’t easily be reused. Credit?
Domain-specific data repository | Data can be reused. Credit! | Lot of work for curators. Long-term sustainable?
Effort, Reuse, Credit, Compliance
Habit, Ease, Privacy, Control
Higher quality metadata
Funding Agency:
University:
Collaborators:
Domain of study:
Domain-Specific Data Repository
Local Data Repository
Institutional Data Repository
Generic Data Repository
AND THEY ALL WANT DIFFERENT METADATA!!!!
Metadata madness…
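One way to picture the problem is a single core record that has to be renamed and re-cut for each audience. The sketch below is illustrative only: every field name, value and mapping in `CORE_RECORD` and `CROSSWALK` is a hypothetical placeholder, and `project` is a made-up helper, not part of any repository's actual schema or API.

```python
# One core record entered once by the researcher. All field names and values
# are hypothetical placeholders, not taken from any real metadata standard.
CORE_RECORD = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "grant_id": "GRANT-0000",
    "department": "Example Department",
    "sample_type": "example sample",
}

# Hypothetical crosswalk: for each audience named on the slide, the core
# fields it asks for and the name it uses for each of them.
CROSSWALK = {
    "funding_agency": {"grant_id": "award_number", "title": "project_title"},
    "university": {"creator": "author", "department": "unit"},
    "collaborators": {"title": "name", "creator": "owner"},
    "domain_of_study": {"sample_type": "material", "instrument": "method"},
}


def project(record, audience):
    """Rename core fields for one audience and report what is still missing."""
    mapping = CROSSWALK[audience]
    view = {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }
    missing = [source for source in mapping if source not in record]
    return view, missing
```

Calling `project(CORE_RECORD, "domain_of_study")` returns the renamed fields plus a list of what the researcher has not yet supplied (here, `instrument`), which is the gap that higher-quality, domain-specific metadata has to fill.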
Where do IRs/libraries fit in?
• Planning series of interviews at key institutions:
– What role do libraries/institutions play wrt research
data management?
– What tools/metadata standards are used?
– What aspects of data deposition are the Research
Office/IR/Institution interested in?
– How does this compare with what scientists want
and do in their labs?
• Goal: share knowledge; establish plan of action
Principles of Elsevier RDS:
• Main goal: make research data optimally available,
discoverable and reusable.
• Collaboration is tailored to partner’s unique needs:
– Working with a few domain-specific and institutional
repositories and institutions
– Aspects where collaboration is needed are discussed
– Collaboration plan is drawn up using SLA: agree on time,
conditions, etc.
• 2013: series of pilots, studies and reports to enable
feasibility study:
– What are key needs?
– Can Elsevier play a role: skillsets, partnerships?
– Is there a (transparent) business model for this?
In summary:
If researchers start to curate and share their data…
And research databases become long-term
sustainable…
… we enable enrichment with high-quality metadata
that makes research data truly discoverable and
reusable.
Many questions remain:
? What role would the institution/library play?
? How do we ensure interoperable metadata?
? What are sustainable models, moving forward?
? Is there a place for publishers, in all this?
Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming,
Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• CNI: Clifford Lynch
• Harvard: Michael Kurtz, Chris Erdmann
• MIT: Micah Altman
• UVM: Mara Saule
Your questions?
Anita de Waard
VP Research Data Collaborations,
Elsevier Research Data Services (VT)
a.dewaard@elsevier.com
http://researchdata.elsevier.com/

More Related Content

What's hot

Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data ManagementAmanda Whitmire
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...The University of Edinburgh
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseASIS&T
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesAmanda Whitmire
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data CommonsSimon Twigger
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementCunera Buys
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse CommonsMerce Crosas
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Robert Oostenveld
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemMichael Bar-Sinai
 
NGP Retreat Open Science 2015
NGP Retreat Open Science 2015NGP Retreat Open Science 2015
NGP Retreat Open Science 2015Jackie Wirz, PhD
 

What's hot (20)

Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
 
Introduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD StudentsIntroduction to RDM for Geoscience PhD Students
Introduction to RDM for Geoscience PhD Students
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universities
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data Commons
 
Preparing Your Research Data for the Future - 2015-06-08 - Medical Sciences D...
Preparing Your Research Data for the Future - 2015-06-08 - Medical Sciences D...Preparing Your Research Data for the Future - 2015-06-08 - Medical Sciences D...
Preparing Your Research Data for the Future - 2015-06-08 - Medical Sciences D...
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
Introduction to Research Data Management - 2016-02-03 - MPLS Division, Univer...
Introduction to Research Data Management - 2016-02-03 - MPLS Division, Univer...Introduction to Research Data Management - 2016-02-03 - MPLS Division, Univer...
Introduction to Research Data Management - 2016-02-03 - MPLS Division, Univer...
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
 
Data Management Planning for Researchers - 2016-02-08 - University of Oxford
Data Management Planning for Researchers - 2016-02-08 - University of OxfordData Management Planning for Researchers - 2016-02-08 - University of Oxford
Data Management Planning for Researchers - 2016-02-08 - University of Oxford
 
NGP Retreat Open Science 2015
NGP Retreat Open Science 2015NGP Retreat Open Science 2015
NGP Retreat Open Science 2015
 
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
Research Data Management: An Overview - 2014-05-12 - Humanities Division, Uni...
 
Stephenson - Data Curation for Quantitative Social Science Research
Stephenson - Data Curation for Quantitative Social Science ResearchStephenson - Data Curation for Quantitative Social Science Research
Stephenson - Data Curation for Quantitative Social Science Research
 

Viewers also liked

Ag Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataAg Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataCyndy Parr
 
Did we get the cart before the horse? (faculty researcher feedback)
Did we get the cart before the horse?  (faculty researcher feedback) Did we get the cart before the horse?  (faculty researcher feedback)
Did we get the cart before the horse? (faculty researcher feedback) Jody DeRidder
 
Big Data Meets Metadata – Analyzing Large Data Sets
Big Data Meets Metadata – Analyzing Large Data SetsBig Data Meets Metadata – Analyzing Large Data Sets
Big Data Meets Metadata – Analyzing Large Data Setslucenerevolution
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 

Viewers also liked (6)

Dataset Metadata, Tools and Approaches for Access and Preservation
Dataset Metadata, Tools and Approaches for Access and PreservationDataset Metadata, Tools and Approaches for Access and Preservation
Dataset Metadata, Tools and Approaches for Access and Preservation
 
A Fruitful Collaboration: Offering More than Faculty Profiles
A Fruitful Collaboration: Offering More than  Faculty ProfilesA Fruitful Collaboration: Offering More than  Faculty Profiles
A Fruitful Collaboration: Offering More than Faculty Profiles
 
Ag Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and dataAg Data Commons: Agricultural research metadata and data
Ag Data Commons: Agricultural research metadata and data
 
Did we get the cart before the horse? (faculty researcher feedback)
Did we get the cart before the horse?  (faculty researcher feedback) Did we get the cart before the horse?  (faculty researcher feedback)
Did we get the cart before the horse? (faculty researcher feedback)
 
Big Data Meets Metadata – Analyzing Large Data Sets
Big Data Meets Metadata – Analyzing Large Data SetsBig Data Meets Metadata – Analyzing Large Data Sets
Big Data Meets Metadata – Analyzing Large Data Sets
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 

Similar to Making Research Data Discoverable with Metadata

Open Data and Institutional Repositories
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional RepositoriesRobin Rice
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New ScienceAnita de Waard
 
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Sarah Shreeves
 
Presentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesPresentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesSEAD
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryRobin Rice
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data SharingAnita de Waard
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfreypvhead123
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...EDINA, University of Edinburgh
 
Sarah Jones RDM from a disciplinary perspective
Sarah Jones RDM from a disciplinary perspectiveSarah Jones RDM from a disciplinary perspective
Sarah Jones RDM from a disciplinary perspectiveJisc
 
Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d studentsDebs Martindale
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Robin Rice
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...ariadnenetwork
 

Similar to Making Research Data Discoverable with Metadata (20)

Open Data and Institutional Repositories
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional Repositories
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New Science
 
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
 
Presentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesPresentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research Series
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfrey
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
METRO RDM Webinar
METRO RDM WebinarMETRO RDM Webinar
METRO RDM Webinar
 
Sarah Jones RDM from a disciplinary perspective
Sarah Jones RDM from a disciplinary perspectiveSarah Jones RDM from a disciplinary perspective
Sarah Jones RDM from a disciplinary perspective
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
 
Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d students
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...
 

More from Anita de Waard

Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataAnita de Waard
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsAnita de Waard
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesAnita de Waard
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Anita de Waard
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?Anita de Waard
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data ManagementAnita de Waard
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of PublishingAnita de Waard
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingAnita de Waard
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataAnita de Waard
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupAnita de Waard
 
Publishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecyclePublishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecycleAnita de Waard
 

Making Research Data Discoverable with Metadata

  • 1. The Metadata [R]evolution: Transformative Opportunities September 18, 2013 Some Ideas on Making Research Data Discoverable and Usable: “It’s the Metadata, Stupid!” Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services (VT)
  • 2. Everybody’s talking about research data. Roles and needs wrt Research Data:
      Roles: Gov; Funding bodies; University management; Researchers; Librarians; Data Repositories.
      Needs: Share research outputs; Demonstrate impact to public; Data availability drives growth; Demonstrate impact; Guarantee permanence, discoverability; Avoid fraud; Generate, track outputs; Comply with mandates; Ensure availability; Archive, track, curate; Support researcher/institution; Archive; Add curation; Allow reuse; Derive credit; Comply with mandates; Discover and use; Cite/acknowledge.
      – Todd Vision, DataDryad, OAI8, 6/23/13: “We need to find a way to keep Dryad funded, and would love to hear your ideas about doing that.”
      – Phil Bourne, Associate Vice Chancellor, UCSD, 4/13: “We are thinking about the university as a digital enterprise.”
      – Mike Huerta, Ass. Director NLM O of Health Info at NIH, 6/13: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”
      – Mara Saule, Dean University Libraries/CIO, UVM, 5/13: “We need to do something about data.”
      – Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”
      – Barbara Ransom, NSF Program Director Earth Sciences, 2/13: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
  • 3–6. Where research data goes now: > 50 M papers; 2 M scientists; 2 M papers/year.
      – Majority of data (90%?) is stored on local hard drives.
      – Some data (8%?) is stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 M; Institutional Repositories.
      – A small portion of data (1-2%?) is stored in small, topic-focused data repositories: MiRB: 25 k; PetDB: 1.5 k; TAIR: 72.1 k; PDB: 88.3 k; SedDB: 0.6 k.
      How do we get researchers to curate, store and share their data? How do we ensure long-term sustainability for high-end repositories? What role do libraries/institutions play?
  • 7–13. Research data management in action: Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
  • 14–15. An attempt to get researchers to curate (but only partially share!) their data (de Waard, A., Burton, S. et al., 2013).
  • 16. What to do in the meantime: [Figure: bars labelled 49, 193, 76, 214 and 210 publications]
      • In 220 publications only 40% of antibodies, 40% of cell lines and 25% of constructs can be manually identified (Vasilevsky et al., submitted)
      • Proposal (with NIH/NIF and Force11 Group):
        – Adding minimal data standards
        – Tool extracts likely reagents / resources
        – User interface asks author to confirm or select
  • 17. How can research databases become sustainable in the long term?
      1. With IEDA:
        – Building a database for lunar geochemistry
        – Write joint report on building repository, curation costs and challenges
      2. With WDS/RDA WG:
        – Planning survey of cost recovery models
        – Input/inspiration: ICPSR Sloan-funded project ‘Sustaining Domain Repositories for Digital Data’
        – Developing overarching funding model with Todd Vision/DataDryad
  • 18–21. Making lunar sample data usable:
  • 22. A research database funding model (DRAFT - CC-BY-NC 2013, Todd Vision & Anita de Waard): [Diagram; layout not recoverable from the transcript]
      Labels: Private store; Data producer or sponsor; Access; Closed; Flow of funds; Data publication; Public Service; Collaboration Conclave; Limited; Subscription content; Commercial overlay; Limited Academic Use/Limited; Data user; Flow of funds; Commercial and institutional storage.
      Examples: ICPSR, CERN-LHC; KEGG; GeoFacets, Reaxys; many small operations, e.g. try-db.org, plhdb.org; Dryad, arXiv, PDB.
  • 23. Comparing data repository types:
      – Local data repository. Advantages: Easy! No one steals your data. Disadvantages: No one sees it. Not compliant with requirements.
      – Generic data repository. Advantages: Not very hard to do. Have complied! Disadvantages: Data can’t be easily reused. Credit?
      – Institutional Repository. Advantages: Can use existing IR? Tracking and compliance checks. Disadvantages: Data can’t easily be reused. Credit?
      – Domain-specific data repository. Advantages: Data can be reused. Credit! Disadvantages: Lot of work for curators. Long-term sustainable?
      Axis labels: Effort, Reuse, Credit, Compliance; Habit, Ease, Privacy, Control; Higher quality metadata.
  • 24. Metadata madness… Funding Agency; University; Collaborators; Domain of study; Domain-Specific Data Repository; Local Data Repository; Institutional Data Repository; Generic Data Repository: AND THEY ALL WANT DIFFERENT METADATA!!!!
  • 25. Where do IRs/libraries fit in?
      • Planning series of interviews at key institutions:
        – What role do libraries/institutions play wrt research data management?
        – What tools/metadata standards are used?
        – What aspects of data deposition are the Research Office/IR/Institution interested in?
        – How does this compare with what scientists want and do in their labs?
      • Goal: share knowledge; establish plan of action
  • 26. Principles of Elsevier RDS:
      • Main goal: make research data optimally available, discoverable and reusable.
      • Collaboration is tailored to partner’s unique needs:
        – Working with a few domain-specific and institutional repositories and institutions
        – Aspects where collaboration is needed are discussed
        – Collaboration plan is drawn up using SLA: agree on time, conditions, etc.
      • 2013: series of pilots, studies and reports to enable feasibility study:
        – What are key needs?
        – Can Elsevier play a role: skillsets, partnerships?
        – Is there a (transparent) business model for this?
  • 27. In summary: If researchers start to curate and share their data… And research databases become long-term sustainable… … we enable enrichment with high-quality metadata that makes research data truly discoverable and reusable.
      Many questions remain:
        ? What role would the institution/library play?
        ? How do we ensure interoperable metadata?
        ? What are sustainable models, moving forward?
        ? Is there a place for publishers, in all this?
  • 28. Thank you! Collaborations and discussions gratefully acknowledged:
      • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
      • UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
      • NIF: Maryann Martone, Anita Bandrowski
      • MSU: Brian Bothner
      • OHSU: Melissa Haendel, Nicole Vasilevsky
      • California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
      • Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
      • CNI: Clifford Lynch
      • Harvard: Michael Kurtz, Chris Erdmann
      • MIT: Micah Altman
      • UVM: Mara Saule
  • 29. Your questions? Anita de Waard VP Research Data Collaborations, Elsevier Research Data Services (VT) a.dewaard@elsevier.com http://researchdata.elsevier.com/