OpenAIRE workshop @ OR2016 - From Repositories, for repositories
1. FROM REPOSITORIES,
FOR REPOSITORIES
OpenAIRE services and tools
WORKSHOP @ OR2016 BY:
Jochen Schirrwagen, Natalia Manola, Paolo Manghi, Pedro Principe
info@openaire.eu
13/Jun/2016 – 1:30pm-3:30pm
Dublin, Trinity College
Maxwell Theatre
2. WORKSHOP TOPICS
OPENAIRE INFRASTRUCTURE
• An Open Access / Open Science infrastructure
RESEARCH CONNECTED. VISIBLE. MONITORED.
• Content Acquisition, Workflows, Info Enrichment
A KEY TO CONNECT
• Interoperability Guidelines, How to register & How to be compatible
ONE-STOP SHOP FOR REPOSITORY MANAGERS SERVICES
• Broker service and Dashboard
Q&A
Discussion groups
3. AGENDA
1. OpenAIRE infrastructure
• An Open Access/Open Science infrastructure (Natalia Manola – 10 min.)
2. Research connected. Research Visible. Research Monitored.
• Content Acquisition, Workflows, Information Enrichment (Jochen Schirrwagen – 20 min.)
3. A key to connect
• Interoperability Guidelines, How to register & How to be compatible (Jochen Schirrwagen,
Pedro Principe – 20 min.)
Questions and answers (10 min.)
4. OpenAIRE one-stop shop for repository managers services
• Broker service and Dashboard (Paolo Manghi & Natalia Manola – 25 min.)
5. Discussion groups (4 groups – 25 min.)
6. Closing (notes from the groups) (10 min.)
Each half of the workshop (items 1–3 with Q&A, then items 4–6) runs for 1h.
4. Discussion groups
• What do you consider the most important tools/services for your work?
• What is missing? How about priorities?
• How can OpenAIRE help with your visibility inside your university/research institution, or outside?
• How does your institution plan to extend repository services for research
outputs other than publications?
• How would you feel about providing your repository content for TDM?
…
4 groups – questions…
6. @openaire_eu
An Open Knowledge &
Research Information Infrastructure
Science. Set Free
Natalia Manola
University of Athens
Athena Research & Innovation Centre
7. Who we are
• An EU project
• In 24x7 operation since Dec 2010
• OpenAIRE
• OpenAIREplus
• OpenAIRE2020
• Consortium of 50 partners
• One of 5 key EU e-Infrastructures
• A legal entity in 2016
• Institutional, national and
international perspectives on OA
policies & e-Infrastructures
Open Access experts
• Building efficient e-Infra technologies
• State of the art technologies (big
data, linked data)
Information & Computer
Science experts
• Legal & policy recommendations
Legal experts
• Best practices for data
• Linking to data infrastructures
Data communities
OR2016 @ Dublin, Ireland June 13, 2016 7
9. A mini EU-CRIS system
[Figure: OpenAIRE overview in three columns: Data Providers, OpenAIRE Platform, Services]
• Data Providers: literature repositories, data repositories, OA journals, CRIS systems, aggregators, funding info, …
• Content collected: metadata, full text, usage data
• OpenAIRE Platform processing: validation, cleaning, de-duplicating, inferring, linking, enriching, classification, clustering, analysis
• Entities: organizations, projects, authors, datasets, publications, data providers
• Services: discovery, monitoring, reporting, evaluation, impact, trends, crowdsourcing, Zenodo, APIs
10. Integrated Scientific Information System
14.6 million unique publications
720 validated data providers
370K publications linked to
projects from 6 funders
18.5K datasets linked to
publications
3.5K links to software repositories
33K organizations
11. Pan-European Network
• Europe’s diverse landscape requires local support
• Different practices, different mentalities
National Open Access
Desks (NOADs)
Human support network
33 OA expert nodes across
Europe
• (OA) Policy aligning
• Technical assistance
• Training
13. OA is here to stay.
What are the policies
and practices for a
sustainable OA?
How, who?
What’s ahead?
14. Repository infrastructure is the key
Modular, adaptable, flexible, extendible
Preservation & content provision for TDM
Value added services by many and different players
Part of a wider e-Infra ecosystem
Participatory, community driven
16. Infrastructures provide off-the-shelf monitoring
• Continuous, real time
• Economies of scale
• Overlaps, gaps, trends in Europe and beyond
• Economies of scope (APCs)
Beyond OA…
Open Science Monitor
17. Measuring openness
A proposal for an Openness score (OSI2016 – Open impacts working group)
• Products: articles, monographs, data, software, etc.
• License measures: Creative Commons, free to read, free to mine, embargoed and embargo length, pay-walled
• Availability measures: lots of different ways to measure availability. Examples: metadata quality, discoverability, crawling, machine readability, links to other resources, public access to usage data
• Permanence measures: official certification (Yes / No / No but committed to long-term preservation)
• Format measures: per file format (e.g., PDF, PDF-A, HTML, embedded figures, tables, csv, xls, json, xml)
19. From OA to Open Science
• Free to read? Not enough any more… TDM is around the corner!
• Publications only? How about research data, software, methods…
• Publishers vs. researchers? More stakeholders have entered the
game, from funders to the public…
• Reproducibility with OA to research results? Openness is about all
phases and processes of the research lifecycle…
• Use and share data via downloads? Think of data driven science
and big data analytics…
Need to think in broader terms
23. OpenAIRE Content Acquisition
Authoritative Information
• Registries of Data Providers: OpenDOAR, re3data, DOAJ journal list, …
• Funding Information
• Author/Contributor Information
Publications
• What types of publications (other than textual)?
• Access to fulltext?
• Tracking of Usage Events
Research Data
• Other types of Research Outputs?
• What level of detail?
• How do they relate to other information entities?
25. OpenAIRE Subsystems and Workflows
• Aggregation
• Collection (and validation) of metadata and corresponding full text; multi-protocol
• transformation into uniform structure and semantics
• De-duplication
• Identification of duplicates of objects from same entity type (e.g. publication,
author)
• Generation of a disambiguated information space based on sets of similarity
relationships of pairs of duplicated objects
• Information Inference
• Applying mining algorithms on information space graph and fulltexts
• Inferred information to be used to enrich information graph
• Data Publishing
• Population and Enrichment of the information space graph
• Publishing over different back-ends: fulltext-index, OAI-PMH, Statistics-DB,
LOD
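The de-duplication step above turns pairwise similarity relationships into groups of records that describe the same object. A minimal sketch of that grouping, assuming the pairs have already been identified: it uses a plain union-find structure, which is an illustrative choice and not necessarily what the OpenAIRE platform uses. The record identifiers are hypothetical.

```python
from collections import defaultdict


def group_duplicates(record_ids, duplicate_pairs):
    """Group record ids into clusters, given pairwise 'is duplicate of' links."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers: three repositories hold the same publication.
records = ["repoA:1", "repoB:7", "repoC:3", "repoD:9"]
pairs = [("repoA:1", "repoB:7"), ("repoB:7", "repoC:3")]
clusters = group_duplicates(records, pairs)
print(clusters)
```

Each resulting cluster would then be represented by a single disambiguated object in the information space.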
26. OpenAIRE Data Flow
[Figure: data flow diagram. Labels: Data source import; Metadata Docs; Native Information Space; De-duplication; Inference (inferred objects, relationships and properties); Pre-public Information Space; Quality Test Service; Test portal; Public Information Space; Portal; End-user feedback; End-user claims]
27. OpenAIRE Information Enrichment
• Completion of missing metadata values using the same record from other
data providers
• List of author names
• Persistent identifier
• Links between publication and research data
• Links between publication / research data and funded project
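As a rough illustration of how missing values can be completed from the same record held by other data providers, the sketch below merges duplicate records into one representative record. The dictionary layout (Dublin Core field name mapped to a list of values) and the sample values are assumptions made for the example, not the OpenAIRE data model.

```python
def merge_records(records):
    """Build one representative record from duplicates of the same publication."""
    merged = {}
    for record in records:
        for field, values in record.items():
            bucket = merged.setdefault(field, [])
            for value in values:
                # Keep each distinct value once, in first-seen order.
                if value not in bucket:
                    bucket.append(value)
    return merged


# Hypothetical copies of one publication from two data providers.
from_repo_a = {"title": ["An example article"], "creator": ["Doe, Jane"]}
from_repo_b = {
    "title": ["An example article"],
    "identifier": ["doi:10.1234/789.1"],
    "relation": ["info:eu-repo/grantAgreement/EC/FP7/123456"],
}
representative = merge_records([from_repo_a, from_repo_b])
print(representative)
```

The merged record carries the author list from one copy and the persistent identifier and project link from the other.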
28. Funded projects info in OpenAIRE
• Collect metadata including the project grantID from OpenAIRE-compliant repositories
• Enrich publication metadata records through OpenAIRE de-duplication
• Link publications to projects by inference (text mining procedures)
• Link publications to projects using the end-user service: claim publications
29. Enriched Document Example
[Figure: annotated record page]
• Plain bibliographic information from metadata
• Multiple instances host the same document (which is de-duplicated)
• Funding info either from metadata or by text mining
• Findings from text mining / inference
30. A key to connect (agenda item 3):
Interoperability Guidelines, How to register & How to be compatible.
31. Guidelines for Data Providers
1. Literature Repositories (and journal platforms): Dublin Core (DRIVER)
2. Data Repositories (and archives/data centres): DataCite
3. CRIS systems: CERIF-XML
32. Why Guidelines?
Format and Protocol to Collect
FUNDING INFORMATION
ACCESS RIGHTS AND LICENSE INFO
+
REFERENCED DATASETS & RELATED PUBLICATIONS,
EMBARGO DATE INFORMATION
33. How do they work?
• Identification of Open Access and funded research results by OAI-Sets:
• ‘openaire’ for publications
• ‘openaire_data’ for research datasets
• Latest schema guarantees backward-compatibility with previous versions.
• Complemented by metadata enrichment thanks to OpenAIRE’s text-mining
services.
34. OpenAIRE OAI-Set
• To group metadata relevant for OpenAIRE
• See https://www.openaire.eu/content-acquisition-policy/content-acquisition-policy/content-acquisition-policy
• Metadata about Open Access Publications
• Metadata about Publications funded in EC-FP7 / H2020
• Metadata about Publications funded by other funders
• OpenAIRE provides information about supported funding information
The OpenAIRE set: setName = OpenAIRE, setSpec* = openaire
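To make the role of the set concrete, here is a small sketch of how a harvester could address it through a selective OAI-PMH ListRecords request. The base URL is a placeholder and the `oai_dc` metadata prefix is an assumption for the example; only the `openaire` setSpec comes from the slide.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec="openaire", metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request restricted to one set."""
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    )
    return f"{base_url}?{query}"


# Placeholder endpoint; substitute the baseURL of your own repository.
url = list_records_url("https://repository.example.org/oai/request")
print(url)
```

Opening such a URL against your own repository is a quick way to check which records the set actually exposes.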
35. projectID
Element name: projectID
DCMI definition: dc:relation
Usage: Mandatory (if applicable)
Usage instruction: A vocabulary of projects is exposed by the OpenAIRE API
(http://api.openaire.eu/#cha_projects_http) and is available to all repository managers.
Values include funder, project name and projectID.
The projectID equals the Grant Agreement number and is defined by the namespace:
info:eu-repo/grantAgreement/Funder/FundingProgram/ProjectNumber/Jurisdiction/ProjectName/ProjectAcronym/
Examples:
<dc:relation>info:eu-repo/grantAgreement/EC/FP7/123456</dc:relation>
<dc:relation>info:eu-repo/grantAgreement/EC/FP7/12345/EU//Acronym</dc:relation>
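The namespace pattern above can be assembled and taken apart mechanically. The following is a minimal sketch with hypothetical helper names; it assumes that none of the segments contains a "/" and that the three trailing segments are either all omitted or all present (possibly empty), as in the two examples on the slide.

```python
PREFIX = "info:eu-repo/grantAgreement/"
KEYS = ["funder", "programme", "number", "jurisdiction", "name", "acronym"]


def build_project_id(funder, programme, number, jurisdiction="", name="", acronym=""):
    """Assemble a projectID value for dc:relation."""
    parts = [funder, programme, number]
    optional = [jurisdiction, name, acronym]
    if any(optional):
        parts.extend(optional)
    return PREFIX + "/".join(parts)


def parse_project_id(value):
    """Split a projectID value into its named segments (missing ones are empty)."""
    value = value.strip()
    if not value.startswith(PREFIX):
        raise ValueError(f"not a grantAgreement identifier: {value!r}")
    fields = value[len(PREFIX):].split("/")
    fields += [""] * (len(KEYS) - len(fields))
    return dict(zip(KEYS, fields))


short_form = build_project_id("EC", "FP7", "123456")
long_form = build_project_id("EC", "FP7", "12345", "EU", "", "Acronym")
print(short_form)
print(long_form)
print(parse_project_id(long_form))
```

The helper reproduces both slide examples, including the empty ProjectName segment in the second one.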
36. accessRights
Element name: accessRights
DCMI definition: dc:rights
Usage: Mandatory
Usage instruction: Use values from the Access Rights vocabulary at
http://purl.org/eu-repo/semantics/#info-eu-repo-AccessRights
• info:eu-repo/semantics/closedAccess
• info:eu-repo/semantics/embargoedAccess
• info:eu-repo/semantics/restrictedAccess
• info:eu-repo/semantics/openAccess
Example:
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
37. embargoEndDate
Element name: embargoEndDate
DCMI definition: dc:date
Usage: Mandatory (if applicable)
Usage instruction: Recommended when accessRights = info:eu-repo/semantics/embargoedAccess.
The date type is controlled by the namespace info:eu-repo/date/embargoEnd/, see
http://wiki.surffoundation.nl/display/standards/info-eu-repo/#info-eu-repo-DateTypesandvalue.
The date should be encoded in the form YYYY-MM-DD (conforming to ISO 8601).
Example:
<dc:date>info:eu-repo/date/embargoEnd/2011-05-12</dc:date>
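The accessRights and embargoEndDate rules lend themselves to a simple local check before exposing records. The sketch below is not the OpenAIRE validator: the function name and the wording of the messages are made up for illustration, and treating a missing embargo date as a problem is an assumption based on the usage instruction above.

```python
import re
from datetime import date

ACCESS_RIGHTS = {
    "info:eu-repo/semantics/closedAccess",
    "info:eu-repo/semantics/embargoedAccess",
    "info:eu-repo/semantics/restrictedAccess",
    "info:eu-repo/semantics/openAccess",
}
EMBARGO_PREFIX = "info:eu-repo/date/embargoEnd/"


def check_access_metadata(rights_values, date_values):
    """Return a list of problems found in dc:rights / dc:date values."""
    problems = []
    access = [v.strip() for v in rights_values if v.strip() in ACCESS_RIGHTS]
    if not access:
        problems.append("no dc:rights value from the access-rights vocabulary")

    embargo_dates = [
        v.strip()[len(EMBARGO_PREFIX):]
        for v in date_values
        if v.strip().startswith(EMBARGO_PREFIX)
    ]
    for value in embargo_dates:
        # Require the exact YYYY-MM-DD shape, then check it is a real date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            problems.append(f"embargo end date is not YYYY-MM-DD: {value}")
            continue
        try:
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"embargo end date is not a valid date: {value}")

    if "info:eu-repo/semantics/embargoedAccess" in access and not embargo_dates:
        problems.append("embargoedAccess without an embargoEnd date")
    return problems


print(check_access_metadata(
    ["info:eu-repo/semantics/embargoedAccess"],
    ["info:eu-repo/date/embargoEnd/2011-05-12"],
))
```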
38. Alternative Identifier
Element name: Alternative Identifier
DCMI definition: dc:relation
Usage: Recommended
Usage instruction: List alternative identifiers for this publication that are
not the primary identifier (repository splash page),
e.g., the DOI of the publisher’s version, the PubMed/arXiv ID.
The term is defined by info:eu-repo/semantics/altIdentifier, with the syntax
info:eu-repo/semantics/altIdentifier/<scheme>/<identifier>
where <scheme> must be one of the following: ark, arxiv, doi, hdl, isbn, purl…
Example:
<dc:relation>info:eu-repo/semantics/altIdentifier/doi/10.1234/789.1</dc:relation>
39. Referenced Dataset
Element name: Referenced Dataset
DCMI definition: dc:relation
Usage: Recommended
Usage instruction: Encodes links to research datasets connected
with this publication. The syntax of info:eu-repo/semantics/dataset is:
info:eu-repo/semantics/dataset/<scheme>/<identifier>
where <scheme> must be one of the following: ark, arxiv, doi, hdl, isbn, purl…
Example:
<dc:relation>info:eu-repo/semantics/dataset/doi/10.1234/789.1</dc:relation>
40. Referenced Publication
Element name: Referenced Publication
DCMI definition: dc:relation
Usage: Recommended
Usage instruction: Encodes links to publications referenced by this
publication. The syntax of info:eu-repo/semantics/reference is:
info:eu-repo/semantics/reference/<scheme>/<identifier>
where <scheme> must be one of the following: ark, arxiv, doi, hdl, isbn…
Example:
<dc:relation>info:eu-repo/semantics/reference/doi/10.1234/789.1</dc:relation>
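Alternative identifiers, referenced datasets and referenced publications all share the same `<kind>/<scheme>/<identifier>` shape inside dc:relation, so one pair of helpers can cover the three. This is a sketch with hypothetical function names; it does not restrict the scheme, because the slides leave the list of schemes open-ended.

```python
SEMANTICS_PREFIX = "info:eu-repo/semantics/"
RELATION_KINDS = ("altIdentifier", "dataset", "reference")


def relation_uri(kind, scheme, identifier):
    """Build a dc:relation value such as .../dataset/doi/10.1234/789.1."""
    if kind not in RELATION_KINDS:
        raise ValueError(f"unknown relation kind: {kind}")
    return f"{SEMANTICS_PREFIX}{kind}/{scheme}/{identifier}"


def parse_relation(value):
    """Return kind, scheme and identifier, or None for other dc:relation values."""
    value = value.strip()
    if not value.startswith(SEMANTICS_PREFIX):
        return None
    # Split only twice: the identifier itself may contain slashes (e.g. a DOI).
    parts = value[len(SEMANTICS_PREFIX):].split("/", 2)
    if len(parts) != 3 or parts[0] not in RELATION_KINDS:
        return None
    kind, scheme, identifier = parts
    return {"kind": kind, "scheme": scheme, "identifier": identifier}


dataset_link = relation_uri("dataset", "doi", "10.1234/789.1")
print(dataset_link)
print(parse_relation(dataset_link))
```

Values that use the grantAgreement namespace are deliberately ignored here and return None.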
41. OpenAIRE Compatibility Status: Levels and OAI Sets
• OpenAIRE basic: only Open Access content, via the driver OAI set
• OpenAIRE 2.0: EC funded content, via the ec_fundedresources OAI set
• OpenAIRE 2.0+: Open Access and EC funded content, via the driver and ec_fundedresources OAI sets
• OpenAIRE 3.0: Open Access and/or EC funded and/or national/other funded content, via the openaire OAI set
42. Meet H2020 OA Guidelines
Property (DC field): value
• EU funding acknowledgment (dc:contributor): “controlled” terms: ["European Union (EU)" and "Horizon 2020"] or ["Euratom" and "Euratom research and training programme 2014-2018"]
• Peer reviewed (dc:type): info:eu-repo/semantics/publishedVersion
• Embargo period (dc:date, dc:rights): info:eu-repo/date/embargoEnd/<YYYY-MM-DD>; <YYYY-MM-DD> (as publication date); info:eu-repo/semantics/embargoedAccess
• Project information (dc:relation): info:eu-repo/grantAgreement/EC/H2020/[ProjectID]/[Jurisdiction]/[ProjectName]/[ProjectAcronym]/
• Persistent identifier (dc:identifier or dc:relation)
• License (dc:rights): URL of license condition
• Persistent IDs for authors and contributors (dc:creator, dc:contributor): <Lastname, Firstname; id_orcid 0000-0000-0000-0000>
• Reference to related research outcome (dc:relation): info:eu-repo/semantics/dataset/<scheme>/<id>
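Put together, the properties above could appear in a single oai_dc record roughly as follows. This is only a sketch built with Python's standard `xml.etree.ElementTree`; the title, author, dates, license URL, project number and DOI are placeholder values, and the short grantAgreement form is used for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(fields):
    """Create an oai_dc:dc element from (dc element name, text) pairs, in order."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, text in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = text
    return root


# All values below are placeholders for illustration.
record = build_record([
    ("title", "An example article"),
    ("creator", "Doe, Jane; id_orcid 0000-0000-0000-0000"),
    ("contributor", "European Union (EU)"),
    ("contributor", "Horizon 2020"),
    ("type", "info:eu-repo/semantics/publishedVersion"),
    ("date", "info:eu-repo/date/embargoEnd/2017-01-01"),
    ("rights", "info:eu-repo/semantics/embargoedAccess"),
    ("rights", "https://creativecommons.org/licenses/by/4.0/"),
    ("relation", "info:eu-repo/grantAgreement/EC/H2020/123456"),
    ("relation", "info:eu-repo/semantics/dataset/doi/10.1234/789.1"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```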
43. OpenAIRE guidelines CONTINUE TO BE DEVELOPED
to establish an open and sustainable
scholarly communication infrastructure
44. New paradigm – global harmonization
1. Extending Dublin Core with qualifiers but without losing DCMI
compatibility
2. Replacing Dublin Core by DataCite metadata schema only
3. Replacing Dublin Core and adopting RIOXX only
4. Defining an application profile based on DataCite plus dedicated
elements from MODS (to express the bibliographic citation)
5. Defining an application profile that is
a. backward compatible (when possible)
b. re-using existing standards (when possible)
c. extensible for specific needs
Exploration of best strategy / method / metadata schemes
45. Proposal: metadata layers
Application Profile that defines
● basic layer (still Dublin Core)
● enhanced layer (re-use DataCite)
● extended layer (custom fields specific for regional
networks / infrastructures)
To be discussed with the communities: LA Referencia, Share, Jisc/RIOXX, COAR…
46. Becoming an OpenAIRE data provider
1. Register your repository in OpenDOAR / re3data
* institutional/thematic repository -> OpenDOAR
* data repository -> re3data
2. Test compliance with the OpenAIRE Guidelines
Make your repository OpenAIRE compliant
with the help of the OpenAIRE validator service
3. Add your repository in OpenAIRE
Register your repository in OpenAIRE; pre-filled information is
imported from OpenDOAR or re3data
47. 1. Registration in Repository Directory
• For literature repositories use:
OpenDOAR (http://opendoar.org/ )
• For research data repositories use:
re3data (http://re3data.org )
• If you are already registered in OpenDOAR:
• Check if the information is up to date
• Check the admin email contact and
OAI configuration:
baseURL, OAI-Set, Guidelines Compatibility
48. 2. Test the OpenAIRE Compliance
Choose from the menu
Finally check results
54. http://api.openaire.eu/
Need to integrate project and funding information into your institutional
repository based on DSpace or ePrints?
• Go for the DSpace/ePrints endpoints.
Do you prefer a TSV with the list of projects by funding?
• TSV endpoint is meant for
55. DSpace Add-ons for project IDs
Using the projects list provided by the OpenAIRE API
• OpenAIRE Authority Control
• DSpace 3.2
• http://goo.gl/cEPTZN (updated March 2014)
• DSpace 1.8.2
• http://projeto.rcaap.pt/index.php/lang-en/consultar-recursos-de-apoio/remository?func=fileinfo&id=354
• OpenAIRE funders projects list addon (NEW)
• In use on the RCAAP Project (PT repositories)
• https://gitlab.fccn.pt/dev-rcaap/addon-openaire/tree/OpenAIRE5.X
• https://gitlab.fccn.pt/dev-rcaap/addon-openaire/tree/OpenAIRE3.X
Allows users to search for EC (+ WT + FCT) project IDs and include them in
the metadata of records deposited in accordance with the OpenAIRE Guidelines
56. Submission Workflow
Searching by the name or the project id number
Select the project and accept… the necessary namespace will be filled
57. REDUCE WORKLOAD OF AUTHORS
Repository managers
to fulfill the EC Open Access requirements
or other funders OA mandates
64. REPOSITORIES DEPOSIT WORKFLOW: example in UMinho repository
Searching by the name, acronym or the project id number… Select the project and accept
OpenAIRE Funders Projects List
API
65. OpenAIRE STREAMLINES PROJECT REPORTING
We maintain a page for every EC (H2020, FP7) project,
featuring project information, related project
publications and datasets, and a statistics section.
67. CAN'T FIND OR SEE ALL OF YOUR
PROJECT'S PUBLICATIONS IN
OPENAIRE AT REPORTING TIME?
68. LINK RESEARCH RESULTS TOOL
https://www.openaire.eu/participate/claim
Link publications or datasets
to projects.
Identify the project, select
publications or datasets and
set the access rights.
69. Once you deposit in a fully
OpenAIRE compliant repository,
YOUR PUBLICATIONS WILL BE REPORTED
AUTOMATICALLY TO THE EC'S PARTICIPANT
PORTAL AT REPORTING TIME.
75. Scenario
• OpenAIRE aggregates metadata about publications from hundreds of
repositories, aggregators, OA journals, and publishers
• OpenAIRE guidelines: DC fields + access rights + funding projects + links to
datasets or publications
• Infers information about publications
• Relationships to projects and datasets, citations, similarities
• Finds duplicates of metadata records for the same publication and merges
them to build a (possibly richer) representative record
76. Idea
• Institutional repositories may be interested in acquiring metadata records of
publications that are “related to” the repository, i.e. they should/could
be part of their collection
• Enrichment: enrich the records they already have with extra metadata
information
• Addition: add to their collection records they were unaware of
77. OpenAIRE Literature Broker sketch
[Figure: broker architecture]
• OpenAIRE Data Sources feed the OpenAIRE Information Space Graph (deduplication, inference, aggregation)
• The graph is used for identifying “events” relevant to repositories (enrichments & additions) and sending events to the OpenAIRE Notification Broker
• The broker holds Subscriptions, Potential Notifications and Delivered Notifications
• Repository admins subscribe and are notified
Event (potential notification):
• Message
• Topic
• TargetRepository
• Trust
78. The Challenge
• Enrichment is straightforward
• Harvest from the repository and return to it only the records that
have been “enriched” by deduplication and/or inference
• Addition is less obvious
• Based on relationships, in turn identified by inference algorithms
• Must be augmented with a notion of “trust” to enable “tuning”
options in order to reduce false positive notifications
79. Examples of enrichment topics
ENRICHMENT
• dc:rights: dc:rights is present and original record was missing it
• dc:identifier-if-DOI: DOI is present and original record was missing it
• dc:type: dc:type is present and original record was missing it
• dc:subject: dc:subject is present and original record was missing it
• rel-to-project: relationship to project is present and original record
was missing it
• rel-to-dataset/software/similar-publication: relationship is present
and original record was missing it
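The enrichment topics above all follow one rule: the enriched record has something the original record lacked. A minimal sketch of that comparison, assuming records are dictionaries of Dublin Core field name to list of values; the `enrichment.` topic prefix follows the "enrichment.X" naming mentioned for subscriptions, and the way a DOI is recognised is an assumption made for the example.

```python
GRANT_PREFIX = "info:eu-repo/grantAgreement/"


def has_doi(record):
    # Assumption: DOIs show up in dc:identifier as "doi:10.x/..." or bare "10.x/...".
    return any(v.lower().startswith(("doi:", "10.")) for v in record.get("identifier", []))


def has_project(record):
    return any(v.startswith(GRANT_PREFIX) for v in record.get("relation", []))


def enrichment_topics(original, enriched):
    """List the enrichment topics for which `enriched` adds what `original` lacked."""
    topics = []
    for field in ("rights", "type", "subject"):
        if enriched.get(field) and not original.get(field):
            topics.append(f"enrichment.dc:{field}")
    if has_doi(enriched) and not has_doi(original):
        topics.append("enrichment.dc:identifier-if-DOI")
    if has_project(enriched) and not has_project(original):
        topics.append("enrichment.rel-to-project")
    return topics


# Hypothetical record as harvested, and the same record after enrichment.
original = {"title": ["An example article"], "type": ["info:eu-repo/semantics/article"]}
enriched = {
    "title": ["An example article"],
    "type": ["info:eu-repo/semantics/article"],
    "rights": ["info:eu-repo/semantics/openAccess"],
    "identifier": ["doi:10.1234/789.1"],
    "relation": ["info:eu-repo/grantAgreement/EC/FP7/123456"],
}
topics = enrichment_topics(original, enriched)
print(topics)
```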
80. Examples of addition topics
ADDITIONS
• authorAffiliation: The publication has an author whose organization has a given
institutional repository of reference
• sharedProject: The publication has been funded by a project whose participants
(orgs that are beneficiaries of the grant) have a given institutional repository of
reference
• authorRepositoryOfReference: The publication has an author with a given
institutional repository of reference
82. Author’s repository of reference
Exploits the relationships
publication → author → repository
(where the author’s repository of reference is determined by
“frequency of deposition”)
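One way to read “frequency of deposition” is to pick the repository where an author has deposited most often. The sketch below does exactly that and also returns the share of deposits in that repository; using this share as an input to a "trust" value is an assumption made for illustration, not something the slides specify.

```python
from collections import Counter


def repository_of_reference(deposits):
    """Return (repository, share of deposits) for one author, or (None, 0.0).

    `deposits` holds one repository id per deposited publication of the author.
    """
    if not deposits:
        return None, 0.0
    repository, count = Counter(deposits).most_common(1)[0]
    return repository, count / len(deposits)


# Hypothetical deposit history of a single author.
repo, share = repository_of_reference(["repoA", "repoA", "repoB", "repoA"])
print(repo, share)
```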
83. Relevance by project funding
Exploits the relationships
publication → project → organization → repository
High chance of yielding false positive notifications
84. Subscriptions
• Repository managers can subscribe to the service to receive notifications
about records “assigned to them” and specify
• Topics: enrichment.X or addition.Y
• How to be notified: RSS feed, email, APIs, etc.
• When to be notified: instantly, every K days
• Criteria on record fields (predicate)
• Repository managers can test their subscription by searching the
collection of potential notifications
85. Notifications
• The service can notify the repositories in different ways
• OpenAIRE recommended repository APIs for metadata ingestion (e.g. SWORD
project); software modules for known platforms will be considered (e.g. DSpace,
Eprints)
• email to the repository managers
• RSS feeds
• The service avoids redundant notifications by keeping a history of
delivered notifications
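The subscription and notification behaviour described on these slides can be sketched in a few lines: an event carries a message, topic, target repository and trust; a subscription filters by repository, topics and a minimum trust, and keeps a history so that nothing is delivered twice. Class and field names, the trust scale and the sample events are hypothetical; delivery channels (email, RSS, APIs) are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    message: str
    topic: str
    target_repository: str
    trust: float


@dataclass
class Subscription:
    repository: str
    topics: frozenset
    min_trust: float = 0.0
    delivered: set = field(default_factory=set)  # history of delivered events

    def pending(self, events):
        """Return matching events not delivered before, and record them."""
        fresh = []
        for event in events:
            matches = (
                event.target_repository == self.repository
                and event.topic in self.topics
                and event.trust >= self.min_trust
            )
            if matches and event not in self.delivered:
                self.delivered.add(event)
                fresh.append(event)
        return fresh


events = [
    Event("DOI found for record 42", "enrichment.dc:identifier-if-DOI", "repoA", 1.0),
    Event("Publication by an affiliated author", "addition.authorAffiliation", "repoA", 0.4),
    Event("Publication from a shared project", "addition.sharedProject", "repoB", 0.9),
]
subscription = Subscription(
    repository="repoA",
    topics=frozenset({"enrichment.dc:identifier-if-DOI", "addition.authorAffiliation"}),
    min_trust=0.5,
)
first = subscription.pending(events)
second = subscription.pending(events)
print([e.message for e in first])
print(second)
```

Raising or lowering `min_trust` is the kind of "tuning" option that lets a repository manager trade fewer false positives against fewer additions.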
87. Standards for brokers
Working with similar initiatives
(Jisc, SHARE-US) on the definition
of recommendations to enable
information exchange between a
network of Scholarly
Communication Broker Services
[Figure: three broker services, each holding subscriptions, sit between producers of events and consumers of events (subscribe / notify); the brokers exchange subscriptions and channel notifications among themselves]
99. Questions?
For off-line questions: paolo.manghi@isti.cnr.it
Joint work with: Michele Artini, Claudio Atzori, Alessia Bardi, Nikon Gasparis, Antonis Lempesis,
Natalia Manola, Stefania Martziou, Pedro Principe, Eloy Rodriguez, Jochen Schirrwagen
100. Discussion groups
• What do you consider the most important tools/services for your work?
• What is missing? How about priorities?
• How can OpenAIRE help?
• With your visibility inside your university/research institution, or outside?
• How does your institution plan to extend repository services for data?
• How would you feel about providing your repository content for TDM?
…
4 groups – questions…