| 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
NSF Workshop
February 28, March 1, 2017
Data Repositories: Recommendation, Certification and Models for Cost Recovery
| 2
Four Types of Repositories:
[Diagram labels: Research Question, Object of Study, Method, Raw Data, Analysis, Processed Data, Tables/Figures, Data With Paper, Curate, Curated Record, Publication, Methods, Software]
•  Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
-  NOAA: 20 TB/
-  NASA streaming > 24 PB/day
-  NASA Reverb: 12 PB Data
-  NSSD: > 230 TB of digital data
-  NSIDC: 1 PB data, : 1 PB total
-  ALMA Telescope: 40 TB/day
•  Institutional/Local Repositories (Size: GB; Nr of files: Billions)
-  Deep Blue (Umich): 80 k
-  MIT Dspace: 75 k
-  HAL (France): 60 k
-  D-Space Cambr: 1.5 k
-  Of which data: hundreds
•  Non-Domain Repositories (Size: MB; Nr of files: Millions)
-  Figshare: 1.2 M
-  DataDryad: 3 k
-  Dataverse: 58 k
•  Domain Repositories (Size: kB; Nr of files: 100ks)
-  PetDB: 6 k
-  PDB: 100 k
-  NIST ASD: 170 k
| 3
Recommended vs Certified Data Repositories [1]
•  Studied repositories recommended by 17 organisations:
-  Compiled list of 242 recommended repositories
-  Identified criteria for recommendation
-  Identified overlap between recommendations (Fig 1)
•  Identified 5 certification schemes:
-  Compiled list of 129 certified repositories
-  Identified criteria for certification
-  Identified overlap between recommended & certified repositories (Fig 2)
Figure 1: Most repositories are recommended by < 3 parties
Figure 2: Most recommended repositories are not certified
[1] All data is openly available at doi:10.17632/zx2kcyvvwm.1
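The overlap counts behind Figures 1 and 2 could be computed along the following lines. This is only a minimal sketch, not the study's actual method: the organisation and repository names, the `recommendations` mapping and the `certified` set are all hypothetical placeholders, and the real lists would come from the dataset cited in [1].

```python
from collections import Counter

# Hypothetical input: each recommending organisation maps to the set of
# repositories it recommends. These names are placeholders, not study data.
recommendations = {
    "org_a": {"repo_1", "repo_2", "repo_3"},
    "org_b": {"repo_2", "repo_3"},
    "org_c": {"repo_3", "repo_4"},
}

# Hypothetical set of repositories holding at least one certification.
certified = {"repo_3", "repo_5"}


def recommendation_counts(recs):
    """Count how many organisations recommend each repository."""
    counts = Counter()
    for repos in recs.values():
        counts.update(repos)
    return counts


def overlap_histogram(counts):
    """Map 'number of recommenders' to 'number of repositories' (cf. Fig 1)."""
    return Counter(counts.values())


counts = recommendation_counts(recommendations)
histogram = overlap_histogram(counts)

# Repositories that are both recommended and certified (cf. Fig 2).
recommended = set(counts)
both = recommended & certified

print(dict(histogram))
print(sorted(both))
```

With real data, `histogram` would show how many repositories are recommended by one, two, three or more parties, and `len(both) / len(recommended)` would give the certified share of recommended repositories.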
| 4
Set Of Shared Criteria Between Recommendation and Certification of Repositories
Columns: Umbrella Categories; Shared Meaning; Recommended Repository Criteria; Repository Certification Scheme Criteria
•  Mission
-  Shared Meaning: Explicit mission statement in providing long-term responsibility, persistence, and management of data(sets)
•  Community/Recognition
-  Recommended Repository Criteria: Evidence of use by downloads or citations from an identifiable and active user community
-  Repository Certification Scheme Criteria: Understand and meet the needs of the designated and defined target community
•  Legal and Contractual Compliance
-  Shared Meaning: Repository operates within a legal framework/Ensures compliance with legal regulations
-  Recommended Repository Criteria: When applicable, have contractual regulations governing the protection of human subjects
-  Repository Certification Scheme Criteria: Contracts and agreements maintained with relevant parties on relevant subjects
•  Access/Accessibility
-  Shared Meaning: Public access to the scientific/repository designated community
-  Recommended Repository Criteria: Anonymous referees (including peer-reviewers) have access to the data before public release as indicated by policies
•  Technical Structure/Interface
-  Recommended Repository Criteria: The software system supports data organisation and searchability by both humans and computers. The interface is intuitive and mobile user-friendly
-  Repository Certification Scheme Criteria: The technical (infra)structure is appropriate, protective, and secure
•  Retrievability
-  Shared Meaning: Data need to have enough metadata. All data receive a persistent identifier
•  Preservation
-  Shared Meaning: Long-term and formal preservation/succession plan for the data, even if the repository ceases to exist
-  Recommended Repository Criteria: If the data are retracted, the persistent identifier needs to be maintained
-  Repository Certification Scheme Criteria: Preservation of data information properties and metadata
Final report: Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena (2017), “Recommended versus Certified Repositories: Mind the Gap”, submitted for revision, Codata Data Science Journal, Feb 20, 2017
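One way to put the umbrella categories to work is as a simple checklist. The sketch below is illustrative only: the category names come from the slide, but the `unmet_categories` helper, the true/false scoring and the example assessment are assumptions, not part of the study or of any certification scheme.

```python
# Umbrella categories taken from the slide; everything else is hypothetical.
UMBRELLA_CATEGORIES = [
    "Mission",
    "Community/Recognition",
    "Legal and Contractual Compliance",
    "Access/Accessibility",
    "Technical Structure/Interface",
    "Retrievability",
    "Preservation",
]


def unmet_categories(assessment):
    """Return the umbrella categories not marked as met, in slide order.

    Categories missing from the assessment are treated as unmet.
    """
    return [c for c in UMBRELLA_CATEGORIES if not assessment.get(c, False)]


# Hypothetical self-assessment for an imaginary repository.
example_assessment = {
    "Mission": True,
    "Community/Recognition": True,
    "Legal and Contractual Compliance": True,
    "Access/Accessibility": True,
    "Technical Structure/Interface": True,
    "Retrievability": False,
}

gaps = unmet_categories(example_assessment)
print(gaps)
```

A real assessment would need evidence per criterion rather than a single flag per category; the flat true/false form is chosen here only to keep the sketch short.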
| 5
Two Economies of Science [3]:
Debit Economy (like a pie)
•  Single pile of ‘stuff’ gets divided:
-  Thing can only be for one person at one time
-  “If you get more, I get less”
•  Examples:
-  Money
-  Jobs
-  Samples, equipment, space, etc.
•  Behaviors:
-  Hoarding, secrecy
-  (Cut-throat) competition
-  Winning by owning (and not sharing)
Credit Economy (like a song)
•  Credit comes from visibility:
-  The more you give away, the more you benefit
-  “Only if I share do I really own” (“You need me to do you!” JW)
•  Examples:
-  Papers, citations
-  Good ideas (if credited)
-  Skills
•  Behaviors:
-  Open access, citation game
-  Collaboration with top-X
-  Winning by sharing (to enable priority & visibility)
<<< DATA???
[3] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
| 6
RDA Repository Cost Recovery IG
•  Interviewed 22 repositories & reported [2]
•  Different income streams:
1.  Structurally funded
2.  Mostly data access charges
3.  Mostly data deposit fees
4.  Membership fees (for deposits and/or access)
5.  Serial project funding
6.  Supported by host institution
•  Different new models under consideration:
-  Sponsorships/services for the commercial sector
-  Contracts for specific services offered (hosting, archiving, curation)
-  Expanding the number of affiliated institutions
-  Deposit fees
-  More services for “national memory institutes”
•  Some comments:
-  Some countries structurally fund repositories (not US!)
-  Some repositories are embedded in scholarly practice
-  Hard to come up with new models: no time, no skill sets!
•  Next step: OECD/GSF WG studies more in-depth, more countries:
http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models
[2] Available at https://www.rd-alliance.org/final-report-income-streams-data-repositories.html
| 7
Thank you!
More on Elsevier’s RDM program and other interesting efforts:
•  https://www.hivebench.com
•  https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
•  http://www.journals.elsevier.com/softwarex/
•  https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
•  https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
•  https://rd-alliance.org/bof-data-search.html
•  https://datasearch.elsevier.com/
•  https://data.mendeley.com/
•  https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
•  https://www.force11.org/
•  http://www.nationaldataservice.org/
•  https://rd-alliance.org/
•  https://www.elsevier.com/about/open-science/research-data
Anita de Waard, a.dewaard@elsevier.com

Contenu connexe

Tendances

December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...DeVonne Parks, CEM
 
Implementing Archivematica, research data network
Implementing Archivematica, research data networkImplementing Archivematica, research data network
Implementing Archivematica, research data networkJisc RDM
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016Susanna-Assunta Sansone
 
Collaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareCollaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareAnita de Waard
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data SharingAnita de Waard
 
DataUp Lightning Talk for #iEvoBio
DataUp Lightning Talk for #iEvoBioDataUp Lightning Talk for #iEvoBio
DataUp Lightning Talk for #iEvoBioCarly Strasser
 
Global registries initiative frumkin omodei
Global registries initiative frumkin omodeiGlobal registries initiative frumkin omodei
Global registries initiative frumkin omodeiASIS&T
 
Executive Summary - Data Management Hub
Executive Summary - Data Management HubExecutive Summary - Data Management Hub
Executive Summary - Data Management HubDenis Parfenov
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clarkdatascienceiqss
 
Publishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecyclePublishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecycleAnita de Waard
 
Smith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case StudiesSmith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case StudiesASIS&T
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
 
Altman RDAP11 Policy-based Data Management
Altman RDAP11 Policy-based Data ManagementAltman RDAP11 Policy-based Data Management
Altman RDAP11 Policy-based Data ManagementASIS&T
 
Presentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesPresentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesSEAD
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 

Tendances (19)

December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
Implementing Archivematica, research data network
Implementing Archivematica, research data networkImplementing Archivematica, research data network
Implementing Archivematica, research data network
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
 
Collaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareCollaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and software
 
Wheeler & Benedict -- Enabling the Preservation Relay
Wheeler & Benedict -- Enabling the Preservation RelayWheeler & Benedict -- Enabling the Preservation Relay
Wheeler & Benedict -- Enabling the Preservation Relay
 
BioSharing - Update - Feb2016
BioSharing - Update - Feb2016BioSharing - Update - Feb2016
BioSharing - Update - Feb2016
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
EDI Training Module 2: EDI Project
EDI Training Module 2:  EDI ProjectEDI Training Module 2:  EDI Project
EDI Training Module 2: EDI Project
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
DataUp Lightning Talk for #iEvoBio
DataUp Lightning Talk for #iEvoBioDataUp Lightning Talk for #iEvoBio
DataUp Lightning Talk for #iEvoBio
 
Global registries initiative frumkin omodei
Global registries initiative frumkin omodeiGlobal registries initiative frumkin omodei
Global registries initiative frumkin omodei
 
Executive Summary - Data Management Hub
Executive Summary - Data Management HubExecutive Summary - Data Management Hub
Executive Summary - Data Management Hub
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
 
Publishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecyclePublishing the Full Research Data Lifecycle
Publishing the Full Research Data Lifecycle
 
Smith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case StudiesSmith RDAP11 NSF Data Management Plan Case Studies
Smith RDAP11 NSF Data Management Plan Case Studies
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
Altman RDAP11 Policy-based Data Management
Altman RDAP11 Policy-based Data ManagementAltman RDAP11 Policy-based Data Management
Altman RDAP11 Policy-based Data Management
 
Presentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesPresentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research Series
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 

En vedette

Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Jian Qin
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and dataAndrew Treloar
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Chris Rusbridge
 
Data Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesData Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesVarsha Khodiyar
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Leon Osinski
 
Open Data Repositories
Open Data RepositoriesOpen Data Repositories
Open Data RepositoriesXavier Ochoa
 
Proses Penggubalan Undang2
Proses Penggubalan Undang2Proses Penggubalan Undang2
Proses Penggubalan Undang2azam_hazel
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
 

En vedette (9)

Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and data
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...
 
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharingNdsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
 
Data Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesData Publishing and Institutional Repositories
Data Publishing and Institutional Repositories
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...
 
Open Data Repositories
Open Data RepositoriesOpen Data Repositories
Open Data Repositories
 
Proses Penggubalan Undang2
Proses Penggubalan Undang2Proses Penggubalan Undang2
Proses Penggubalan Undang2
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
 

Similaire à Data Repositories: Recommendation, Certification and Models for Cost Recovery

Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data managementopl10
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...ariadnenetwork
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDMMarieke Guy
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementJamie Bisset
 
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for LibrariansMarieke Guy
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Sarah Shreeves
 
Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020Sarah Jones
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsfBrad Houston
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsfBrad Houston
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curationMichael Day
 
Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Leon Osinski
 

Similaire à Data Repositories: Recommendation, Certification and Models for Cost Recovery (20)

Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data management
 
How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...How to overcome obstacles to data publication: Issues, requirements, and good...
How to overcome obstacles to data publication: Issues, requirements, and good...
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for Librarians
 
Researh data management
Researh data managementResearh data management
Researh data management
 
Data management
Data management Data management
Data management
 
Data management plans
Data management plansData management plans
Data management plans
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...Small Science: First Impressions of Curation Needs. Presentation at Digital L...
Small Science: First Impressions of Curation Needs. Presentation at Digital L...
 
Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
Johnston - How to Curate Research Data
Johnston - How to Curate Research DataJohnston - How to Curate Research Data
Johnston - How to Curate Research Data
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curation
 
Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...
 

Plus de Anita de Waard

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataAnita de Waard
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsAnita de Waard
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesAnita de Waard
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Anita de Waard
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?Anita de Waard
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data ManagementAnita de Waard
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of PublishingAnita de Waard
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingAnita de Waard
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataAnita de Waard
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupAnita de Waard
 
The Rocky Road to Reuse
The Rocky Road to ReuseThe Rocky Road to Reuse
The Rocky Road to ReuseAnita de Waard
 
Argumentation in biology papers
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papersAnita de Waard
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Anita de Waard
 

Plus de Anita de Waard (20)

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery

  • 1. Data Repositories: Recommendation, Certification and Models for Cost Recovery
    Anita de Waard (ORCID 0000-0002-9034-4119), VP Research Data Collaborations, Elsevier RDM Services, a.dewaard@elsevier.com
    NSF Workshop, February 28 and March 1, 2017
  • 2. Four Types of Repositories
    Diagram labels: Object of Study; Raw Data; Processed Data; Data With Paper; Curated Record; Method; Analysis; Tables/Figures; Curate; Methods; Software; Research Question; Publication
    •  Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions):
      -  NOAA: 20 TB/
      -  NASA streaming: > 24 PB/day
      -  NASA Reverb: 12 PB data
      -  NSSD: > 230 TB of digital data
      -  NSIDC: 1 PB data, : 1 PB total
      -  ALMA Telescope: 40 TB/day
    •  Institutional/Local Repositories (Size: GB; Nr of files: Billions):
      -  Deep Blue (UMich): 80 k
      -  MIT DSpace: 75 k
      -  HAL (France): 60 k
      -  D-Space Cambr: 1.5 k
      -  Of which data: hundreds
    •  Non-Domain Repositories (Size: MB; Nr of files: Millions):
      -  Figshare: 1.2 M
      -  DataDryad: 3 k
      -  Dataverse: 58 k
    •  Domain Repositories (Size: kB; Nr of files: 100ks):
      -  PetDB: 6 k
      -  PDB: 100 k
      -  NIST ASD: 170 k
  • 3. Recommended vs Certified Data Repositories [1]
    •  Studied repositories recommended by 17 organisations:
      -  Compiled list of 242 recommended repositories
      -  Identified criteria for recommendation
      -  Identified overlap between recommendations (Fig 1)
    •  Identified 5 certification schemes:
      -  Compiled list of 129 certified repositories
      -  Identified criteria for certification
      -  Identified overlap between recommended & certified repositories (Fig 2)
    Figure 1: Most repositories are recommended by < 3 parties
    Figure 2: Most recommended repositories are not certified
    [1] All data is openly available at doi:10.17632/zx2kcyvvwm.1
  • 4. Set of Shared Criteria Between Recommendation and Certification of Repositories
    Table columns: Umbrella Categories; Shared Meaning; Recommended Repository Criteria; Repository Certification Scheme Criteria
    •  Mission:
      -  Explicit mission statement in providing long-term responsibility, persistence, and management of data(sets)
    •  Community/Recognition:
      -  Evidence of use by downloads or citations from an identifiable and active user community
      -  Understand and meet the needs of the designated and defined target community
    •  Legal and Contractual Compliance:
      -  Repository operates within a legal framework/Ensures compliance with legal regulations
      -  When applicable, have contractual regulations governing the protection of human subjects
      -  Contracts and agreements maintained with relevant parties on relevant subjects
    •  Access/Accessibility:
      -  Public access to the scientific/repository designated community
      -  Anonymous referees (including peer-reviewers) have access to the data before public release as indicated by policies
    •  Technical Structure/Interface:
      -  The software system supports data organisation and searchability by both humans and computers. The interface is intuitive and mobile user-friendly
      -  The technical (infra)structure is appropriate, protective, and secure
    •  Retrievability:
      -  Data need to have enough metadata.
      -  All data receive a persistent identifier
    •  Preservation:
      -  Long-term and formal preservation/succession plan for the data, even if the repository ceases to exist
      -  If the data are retracted, the persistent identifier needs to be maintained
      -  Preservation of data information properties and metadata
    Final report: Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena (2017), "Recommended versus Certified Repositories: Mind the Gap", Submitted for Revision, Codata Data Science Journal, Feb 20, 2017
  • 5. Two Economies of Science [3]
    Debit Economy (like a pie)
    •  Single pile of ‘stuff’ gets divided:
      -  Thing can only be for one person at one time
      -  “If you get more, I get less”
    •  Examples:
      -  Money
      -  Jobs
      -  Samples, equipment, space, etc.
    •  Behaviors:
      -  Hoarding, secrecy
      -  (Cut-throat) competition
      -  Winning by owning (and not sharing)
    Credit Economy (like a song)
    •  Credit comes from visibility:
      -  The more you give away, the more you benefit
      -  “Only if I share do I really own” (“You need me to do you!” JW)
    •  Examples:
      -  Papers, citations
      -  Good ideas (if credited)
      -  Skills
    •  Behaviors:
      -  Open access, citation game
      -  Collaboration with top-X
      -  Winning by sharing (to enable priority & visibility)
    [3] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
  • 6. RDA Repository Cost Recovery IG
    •  Interviewed 22 repositories & reported [2]
    •  Different income streams:
      1.  Structurally funded
      2.  Mostly data access charges
      3.  Mostly data deposit fees
      4.  Membership fees (for deposits and/or access)
      5.  Serial project funding
      6.  Supported by host institution
    •  Different new models under consideration:
      -  Sponsorships/services for the commercial sector
      -  Contracts for specific services offered (hosting, archiving, curation)
      -  Expanding the number of affiliated institutions
      -  Deposit fees
      -  More services for “national memory institutes”
    •  Some comments:
      -  Some countries structurally fund repositories (not US!)
      -  Some repositories embedded in scholarly practice
      -  Hard to come up with new models: no time, no skill sets!
    •  Next step: OECD/GSF WG studies more in-depth, more countries: http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models
    [2] Available at https://www.rd-alliance.org/final-report-income-streams-data-repositories.html
  • 7. Thank you!
    More on Elsevier’s RDM program and other interesting efforts:
    •  https://www.hivebench.com
    •  https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
    •  http://www.journals.elsevier.com/softwarex/
    •  https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
    •  https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
    •  https://rd-alliance.org/bof-data-search.html
    •  https://datasearch.elsevier.com/
    •  https://data.mendeley.com/
    •  https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
    •  https://www.force11.org/
    •  http://www.nationaldataservice.org/
    •  https://rd-alliance.org/
    •  https://www.elsevier.com/about/open-science/research-data
    Anita de Waard, a.dewaard@elsevier.com