Talk at NITRD Workshop "Measuring the Impact of Digital Repositories" February 28 – March 1, 2017 https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
Data Repositories: Recommendation, Certification and Models for Cost Recovery
1. | 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
NSF Workshop
February 28, March 1, 2017
Data Repositories: Recommendation, Certification and Models for Cost Recovery
2. | 2
Research workflow (diagram labels): Research Question, Object of Study, Raw Data, Processed Data, Data With Paper, Curated Record, Method, Analysis, Tables/Figures, Curate, Methods, Software, Publication
Four Types of Repositories:
• Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
- NOAA: 20 TB/
- NASA streaming > 24 PB/day
- NASA Reverb: 12 PB Data
- NSSD: > 230 TB of digital data
- NSIDC: 1 PB data, : 1 PB total
- ALMA Telescope: 40 TB/day
• Institutional/Local Repositories (Size: GB; Nr of files: Billions)
- Deep Blue (Umich): 80k
- MIT Dspace: 75 k
- HAL (France): 60 k
- D-Space Cambr: 1.5 k
- Of which data: hundreds
• Non-Domain Repositories (Size: MB; Nr of files: Millions)
- Figshare: 1.2 M
- DataDryad: 3 k
- Dataverse: 58 k
• Domain Repositories (Size: kB; Nr of files: 100ks)
- PetDB: 6 k
- PDB: 100 k
- NIST ASD: 170 k
3. | 3
Recommended vs Certified Data Repositories [1]
• Studied repositories recommended by 17 organisations:
- Compiled list of 242 recommended repositories
- Identified criteria for recommendation
- Identified overlap between recommendations (Fig 1)
• Identified 5 certification schemes:
- Compiled list of 129 certified repositories
- Identified criteria for certification
- Identified overlap between recommended & certified repositories (Fig 2)
Figure 1: Most repositories are recommended by < 3 parties
Figure 2: Most recommended repositories are not certified
[1] All data is openly available at doi:10.17632/zx2kcyvvwm.1
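The two overlap counts on this slide (how many parties recommend each repository, and which recommended repositories are also certified) can be sketched with plain set operations. This is only an illustrative sketch, not the study's actual method: the organisation names, repository names, and both function names below are hypothetical placeholders, and the real lists are in the dataset cited in [1].

```python
from collections import Counter

# Hypothetical sample data (NOT the study's lists):
# organisation -> set of repositories it recommends.
recommendations = {
    "Org A": {"Repo 1", "Repo 2", "Repo 3"},
    "Org B": {"Repo 2", "Repo 3"},
    "Org C": {"Repo 3", "Repo 4"},
}
# Hypothetical set of repositories holding some certification.
certified = {"Repo 3", "Repo 5"}


def recommendation_counts(recs):
    """Count how many organisations recommend each repository."""
    counts = Counter()
    for repos in recs.values():
        counts.update(repos)
    return counts


def split_by_certification(recs, certified_repos):
    """Return (recommended and certified, recommended but not certified)."""
    recommended = set().union(*recs.values())
    return recommended & certified_repos, recommended - certified_repos


counts = recommendation_counts(recommendations)
both, recommended_only = split_by_certification(recommendations, certified)
```

With real lists, `counts` would feed a histogram like Figure 1, and the sizes of `both` and `recommended_only` would give the comparison in Figure 2.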
4. | 4
Set Of Shared Criteria Between Recommendation and Certification of Repositories
Table columns: Umbrella Categories | Shared Meaning | Recommended Repository Criteria | Repository Certification Scheme Criteria
• Mission
- Explicit mission statement in providing long-term responsibility, persistence, and management of data(sets)
• Community/Recognition
- Evidence of use by downloads or citations from an identifiable and active user community
- Understand and meet the needs of the designated and defined target community
• Legal and Contractual Compliance
- Repository operates within a legal framework/Ensures compliance with legal regulations
- When applicable, have contractual regulations governing the protection of human subjects
- Contracts and agreements maintained with relevant parties on relevant subjects
• Access/Accessibility
- Public access to the scientific/repository designated community
- Anonymous referees (including peer-reviewers) have access to the data before public release as indicated by policies
• Technical Structure/Interface
- The software system supports data organisation and searchability by both humans and computers.
- The interface is intuitive and mobile user-friendly
- The technical (infra)structure is appropriate, protective, and secure
• Retrievability
- Data need to have enough metadata.
- All data receive a persistent identifier
• Preservation
- Long-term and formal preservation/succession plan for the data, even if the repository ceases to exist
- If the data are retracted, the persistent identifier needs to be maintained
- Preservation of data information properties and metadata
Final report: Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena (2017), “Recommended versus Certified Repositories: Mind the Gap”, Submitted for Revision Codata Data Science Journal, Feb 20, 2017
5. | 5
Two Economies of Science [3]:
Debit Economy (like a pie)
• Single pile of ‘stuff’ gets divided:
- Thing can only be for one person at one time
- “If you get more, I get less”
• Examples:
- Money
- Jobs
- Samples, equipment, space, etc.
• Behaviors:
- Hoarding, secrecy
- (Cut-throat) competition
- Winning by owning (and not sharing)
Credit Economy (like a song)
• Credit comes from visibility:
- The more you give away, the more you benefit
- “Only if I share do I really own” (“You need me to do you!” JW)
• Examples:
- Papers, citations
- Good ideas (if credited)
- Skills
• Behaviors:
- Open access, citation game
- Collaboration with top-X
- Winning by sharing (to enable priority & visibility)
[3] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
<<<DATA???
6. | 6
RDA Repository Cost Recovery IG
• Interviewed 22 repositories & reported [2]
• Different income streams:
1. Structurally funded
2. Mostly data access charges
3. Mostly data deposit fees
4. Membership fees (for deposits and/or access)
5. Serial project funding
6. Supported by host institution
• Different new models under consideration:
• Sponsorships/services for the commercial sector
• Contracts for specific services offered (hosting, archiving, curation)
• Expanding the number of affiliated institutions
• Deposit fees
• More services for “national memory institutes”
• Some comments:
• Some countries structurally fund repositories (not US!)
• Some repositories embedded in scholarly practice
• Hard to come up with new models: no time, no skill sets!
• Next step: OECD/GSF WG studies more in-depth, more countries:
http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models
[2] Available at https://www.rd-alliance.org/final-report-income-streams-data-repositories.html
7. | 7
Thank you!
More on Elsevier’s RDM program and other interesting efforts:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://datasearch.elsevier.com/
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
Anita de Waard, a.dewaard@elsevier.com