3. LIBER & the European Research Infrastructure
LIBER (Association of European Research Libraries)
-Projects:
Content
Europeana Libraries
Europeana Newspapers
Policy
MEDOANET
Infrastructure
APARSEN
AAA Study
ODE
6. Rule #11: Don’t Publicize!
Unless the break is a well known spot, like for e.g. Lahinch,
Bundoran, or Strandhill, taking photo’s and posting them on
the Internet is regarded as unacceptable in the surfing
community. If you publicize a break in this manner you draw
attention to it, which in turns draws more people to it, which
means a place gets more crowded and there is more aggro
in the water. The more you talk about a break to those who
haven’t surfed it the more damage you do to it, and yourself
in the long run because the more people there are in the
water the less waves there are for you. Think about it.
http://www.boards.ie/vbulletin/showthread.php?s=fc082712ef1354ecf7cb0e53dc71d519&t=2055828999
7. Reasons not to share surf info
• Other people will steal my wave
• Unethical to share, e.g. inexperienced surfers on dangerous
breaks get hurt
• We won’t get recognition, e.g. local surfers lose out to
visiting pros
• .............
8.
9. 15 petabytes (15 million gigabytes) of data annually –
enough to fill more than 1.7 million dual-layer DVDs a year!
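The DVD comparison on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 petabyte = 1 million gigabytes, as the slide states) and a nominal dual-layer DVD capacity of 8.5 GB; the capacity figure and the function name are assumptions for illustration, not taken from the slide.

```python
# Rough check of the "1.7 million DVDs" comparison.
# Assumptions (not from the slide): decimal units and a nominal
# dual-layer DVD capacity of 8.5 GB.
GB_PER_PETABYTE = 1_000_000
DVD_DUAL_LAYER_GB = 8.5


def dvds_needed(petabytes: float, dvd_gb: float = DVD_DUAL_LAYER_GB) -> float:
    """Return how many DVDs of size dvd_gb are needed to hold the given volume."""
    return petabytes * GB_PER_PETABYTE / dvd_gb


annual_dvds = dvds_needed(15)
print(f"{annual_dvds:,.0f} dual-layer DVDs per year")  # about 1.76 million
```

Under these assumptions the result is a little over 1.7 million discs, which agrees with the figure quoted on the slide.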
10. The Vision
“With a proper scientific e-infrastructure, researchers in
different domains can collaborate on the same data set,
finding new insights. They can share a data set easily across
the globe, but also protect its integrity and ownership. They
can use, re-use and combine data, increasing productivity.
They can more easily solve today’s Grand Challenges, such
as climate change and energy supply. Indeed, they can
engage in whole new forms of scientific inquiry, made
possible by the unimaginable power of the e-infrastructure to
find correlations, draw inferences and trade ideas and
information at a scale we are only beginning to see.”
11. Now and Next
• Authentication & authorisation
• New skills
12. The Opportunities for Data Exchange Project
• identify, collate, interpret and deliver evidence of emerging
best practices in sharing, re-using, preserving and citing
data, the drivers for these changes and barriers impeding
progress, in forms suited to each audience
• policy makers, funders, infrastructure operators, data
centres, data providers and users, libraries and publishers
13. Steps to creating the conditions for data sharing
• Understand data sharing today
• Collection of “success stories”, “near misses” and “honourable
failures” in data sharing, re-use and preservation
• Data & scholarly communications
• Integrating data and publications
• Best practice in data citation
• New roles
• Identify drivers and barriers
• Interviews with stakeholders to seek consensus
14. Tales of Data sharing
• 21 stories
• scientific communities
• infrastructure initiatives
• management
• other relevant stakeholders
15.
16. The Astronomical Importance of Discoverability
• Galaxy Zoo (Carolin Liefke)
• Pre-processed data shared with the public to carry out
specific tasks (e.g. classifying galaxies)
• Discoverability is a major challenge in data sharing: it calls
for easier, more sophisticated data mining and more complex
automated processing
17. Hypotheses
“Without the infrastructure
that helps scientists manage
their data in a convenient
and efficient way, no
culture of data sharing will
evolve.”
Stefan Winkler-Nees
(German Research Foundation, DFG)
18. Hypotheses Expected
Category: Infrastructure
“An international research community needs
an international data infrastructure and
international support.”
“After decades of reports with data in their
titles, the community found inadequate
services, almost no international support and
few solutions.”
19. Tension between hypotheses
Cat: Legislation, Education, Behaviour
“Premature data releases should not be
enforced, but the mere possibility of data
misinterpretation is no reason for not sharing
data.”
“To avoid misuse and lack of
acknowledgement of very special data, access
should be restricted to skilled persons trained
by the data creator.”
24. The Data Publication Pyramid
(1) Data contained and explained within the article
(2) Further data explanations in any kind of supplementary
files to articles
(3) Data referenced from the article and held in data centers
and repositories
(4) Data publications, describing available datasets
(5) Data in drawers and on disks at the institute
25. Where do you currently store your research data?
(multiple answers possible)
Source: PARSE.Insight survey 2009, N = 1202
26. The Pyramid’s likely short term reality:
(1) Top of the pyramid is stable but small
(2) Risk that supplements to articles turn into Data Dumping
places
(3) Too many disciplines lack a community endorsed data archive
(4) Estimates are that at least 75 % of research data is never
made openly available
27. The Ideal Pyramid
(1) More integration of text and data, viewers and seamless
links to interactive datasets
(2) Only if data cannot be integrated in article, and only
relevant extra explanations
(3) Seamless links (bi-directional) between publications and
data, interactive viewers within the articles
(4) More Data Journals that describe datasets, data mgt plans
and data methods
28. A famous paper in Nature:
DNA structure - 1953
• 1 page
• 2 authors
• 1 figure
• no data
Source: V. Kiermer, Nature Publishing Group, 2011
29. Nature in 2001:
The human genome issue
• 62 pages, 49 figures, 27 tables
Source: V. Kiermer, Nature Publishing Group, 2011
30. A thousand genomes – 2010
http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html
Raw data: 12,145 SRA run ids submitted to Short Read Archive
Source: V. Kiermer, Nature Publishing Group, 2011
31. Elsevier offers gene and protein viewers
from within the article, to data stored elsewhere:
33. Issues for researchers
• Researchers need somewhere to put data and make it safe
for reuse
• Researchers need to control its sharing and access
• Researchers need the ability to integrate data and
publication
• Researchers need to get credit
for data as a first class research
object
• Researchers need someone to
pay for the costs of data availability
and re-use
34. Library support for the researcher
Libraries and data centres must support…
• Availability: data as first class research object (publishing,
persistent identification/citation of datasets)
• Findability: data description, metadata, standards
documentation and retrieval
• Interpretability: proper documentation of data
• Re-usability: long-term data archiving including data
curation and preservation
35. 7 Areas of Opportunity
• Availability
• Findability
• Interpretability
• Reusability
• Citability
• Curation
• Preservation
36. Researcher Opportunities
Data Issue: Researchers’ opportunities:
Availability
• Researchers demand their data be treated as first class research objects
• Researchers loosen control over data
• Define roles of responsibility and control
Findability
• Agree convention to propose to publishers regarding data citation
• Use of persistent identifiers such as DOIs
• Ensure common citation practices
Interpretability
• Recognize that data require metadata and work towards community best practice in metadata development
Re-usability
• Be concerned about the long term ability for secondary use and consider or seek out responsible preservation actions
Citability
• Agree a convention for data citation
• Follow metadata standards for datasets
• Use of persistent identifiers such as DOIs
Curation
• Develop sustainable and realistic data management plans
• Collaboration with public data archives
Preservation
• Develop sustainable realistic preservation plans
• Active engagement with public data archives
37. Publishers’ Opportunities
Data Issue: Publishers’ opportunities (Chapter 3):
Availability
• Articles with data provide richer content and higher usage
• Impose stricter editorial policies about availability of underlying data, in line with general trends among funders
• Ensure data is stored in a safe place, preferably a public repository
• Be transparent about curation and preservation of submitted data
Findability
• Ensure bi-directional links between data and publications
• Ensure common citation practices
Interpretability
• Provide services around data such as viewer apps for underlying data from within the article or interactive graphs, tables and images
• Data Publications
Re-usability
• Interactive data from within articles
• Links to the relevant datasets, not just to the database
• Data Publications
Citability
• Establish uniform data citation standards
• Follow metadata standards for datasets
• Use of persistent identifiers such as DOIs
• Data Publications
Curation
• Transparency about curation of submitted data
• Collaboration with public data archives
Preservation
• Transparency about preservation of submitted data
• Collaboration with public data archives
38. Libraries’ Opportunities
Data Issue: Libraries’ and data centres’ opportunities (Chapter 4):
Availability
• Lower barriers to researchers to make their data available.
• Integrate data sets into retrieval services.
Findability
• Support of persistent identifiers.
• Engage in developing common metadescription schemas and common citation practices.
• Promote use of common standards and tools among researchers.
Interpretability
• Support crosslinks between publications and datasets.
• Provide and help researchers understand metadescriptions of datasets.
• Establish and maintain knowledge base about data and their context.
Re-usability
• Curate and preserve datasets.
• Archive software needed for re-analysis of data.
• Be transparent about conditions under which data sets can be re-used (expert knowledge needed, software needed).
Citability
• Engage in establishing uniform data citation standards.
• Support and promote persistent identifiers.
Curation/Preservation
• Transparency about curation of submitted data.
• Promote good data management practice.
• Collaborate with data creators.
• Instruct researchers on discipline specific best practices in data creation (preservation formats, documentation of experiment,…)
39. Q. What exactly should the role of the library be and what
are the skills we need?
40. Data Citation: Getting Credit!
• Challenges:
• granularity: which parts of the dataset are being referred to
• versioning: in case of dynamic or regularly updated data, which
version is cited
• retrievability: indicate via DOIs or accession numbers where the
data are retrievable
Overview of best practices reported in literature and through
interviews with experts
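To make the three challenges concrete, here is a minimal sketch of a data citation record that carries a subset label (granularity), a version string (versioning) and a DOI (retrievability). The class, its field names, the citation layout and the example values (including the DOI) are all hypothetical and chosen only for illustration; they are not a standard endorsed by the ODE project or by any publisher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record holding the elements a data citation may need."""
    creators: str
    year: int
    title: str
    publisher: str
    doi: str                       # retrievability: where the data can be found
    version: Optional[str] = None  # versioning: which release is cited
    subset: Optional[str] = None   # granularity: which part of the dataset

    def format(self) -> str:
        """Assemble one possible citation string from the available fields."""
        parts = [f"{self.creators} ({self.year}): {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.subset:
            parts.append(f"Subset: {self.subset}.")
        parts.append(f"{self.publisher}.")
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


# Example values are invented; the DOI below does not point to real data.
example = DataCitation(
    creators="Doe, J.; Roe, R.",
    year=2011,
    title="Example ocean temperature series",
    publisher="Example Data Centre",
    doi="10.1234/example.5678",
    version="2.1",
    subset="stations 10-20",
)
print(example.format())
```

Keeping version and subset as separate optional fields means a citation to a static, whole dataset stays short, while a citation to one slice of a regularly updated dataset still says exactly what was used.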
41. Some Findings
• Citations with persistent identifiers should be listed in the
references/bibliography to enable tracking of citation metrics.
• Publishers need to provide guidance for authors and
referees on citation of data.
• Researchers need to nurture awareness in their community
of the benefits of data citation, and follow citation guidelines
given by publishers and data centres.
• Many researchers do not appear to see the value and benefits of
data citation. How different communities can work together to
promote this activity and the status of datasets as primary
research outputs and publishable works in their own right, is an
issue that still needs to be addressed.
42. Our Relationship
Many researchers do not appear to see the value and benefits of data
citation. There is a gap, which could be filled by libraries, in advocacy
for data sharing, the use of subject specific repositories, and best
practice in data citation. These, if filled, would increase the number of
researchers sharing and reusing data.
The issue still to be
addressed is how different
communities can work together
to promote this activity and
the status of datasets as
primary research outputs and
publishable works
in their own right.
43. Now & Next
• For ODE:
• Verify hypotheses as drivers and barriers
• Translate findings for various target groups
• For LIBER:
• Continue to find ways of supporting data sharing
• Return to the framework for the collaborative data infrastructure
44. Now and Next
• Authentication & authorisation
• New skills
45. Addressing Trust and Data Curation
• AAA Study
• Authentication and authorisation infrastructure for European
researchers
• On the Riding the Wave wish list: “Distributed and collaborative
authentication, authorisation and accounting”
• Safe depositing of data
• Authenticity and provenance
• Ensure recognition
• Safe environments for collaboration
46. Addressing Trust and Data Curation
• Alliance for Permanent Access to the Record of Science in
Europe Network (APARSEN)
• look across the excellent work in digital preservation which is
carried out in Europe and to try to bring it together under a
common vision
• Trust, Sustainability, Usability, Access
51. Has enabled surfers to do things they only dreamed
about
• Big wave hunters….
http://theweek.com/article/index/227955/the-biggest-wave-ever-surfed-the-mind-blowing-video
52. Further Reading
Riding the Wave (2011)
http://www.cordis.europa.eu/fp7/ict/e.../hlg-sdi-report.pdf
ODE/APARSEN Publications
http://www.alliancepermanentaccess.org/index.php/community/current-projects
AAA Study
https://confluence.terena.org/display/aaastudy/AAA+Study+Home+Page
53. Credits
Slides reused from presentations by:
Salvatore Mele (CERN)
Eefke Smit (STM)
Hans Pfeiffenberger (Helmholtz)
Most images sourced through The European Library
I thought I would start where most people normally end, by first saying thank you. I work on a lot of projects which focus on research data sharing and curation. I talk with libraries and publishers, funders, and research institutes about the type of infrastructure we need to promote and realise the full potential of data sharing. Regardless of the context, whether it be data preservation, curation, access, reuse or citation, the key to the success of this infrastructure is buy-in from researchers. Putting the carrot and the stick of incentivisation and mandating aside, researchers need to be convinced that data sharing is something they want to do. So, I’m very happy that LERU has invited me to be here today, to discuss the drivers and barriers for data sharing with actual researchers. So thank you to LERU for this opportunity, and thank you for having enough interest in this subject to be here today and, hopefully, to take up the data sharing baton.
Before we get into the drivers and barriers for data sharing I would like to ‘share’ 2 things about me with you. First of all, I am a librarian. I work as project officer for LIBER, which is the Association of European Research Libraries. We have 380 member libraries from all over Europe. Our projects focus on developing the role of the library as part of the European Research Infrastructure, and they fall into 3 main categories.
So, waves and surfing are analogies that are often used when referring to the data deluge and research data sharing. The report ‘Riding the Wave’, written by the High Level Expert Group on Scientific Data in October 2010, talks about how Europe can gain from the rising tide of scientific data.
Doing this since 2008. Involves 160 computer centres around the world
Called for a framework for a collaborative data infrastructure to outline how different stakeholders interact with the data sharing system
Researcher as end user and researcher as data creator
Libraries and data centres must support data publishing as a prerequisite for data availability, including persistent identification/citation of datasets, and solutions for data description and retrieval, which together facilitate findability. They must also ensure that data is properly documented as a condition for data interpretability and re-usability and prepare for long-term data archiving including data curation and preservation.