Presentation of the 2nd Content Providers Community Call, targeting the following topics: 1) OpenAIRE Content provider dashboard updates;
2) OpenAIRE aggregation and enrichment processes: specifications and good practices;
3) Community questions & comments.
1. @openaire_eu
4th Community Call
OpenAIRE content providers managers
aggregation and enrichment processes
Alessia Bardi (CNR-ISTI), Andreas Czerniak (UNIBI), Pedro Príncipe (UMINHO)
04/03/2020
2. AGENDA:
1) OpenAIRE Provide updates
2) OpenAIRE aggregation and enrichment processes
- The OpenAIRE Aggregator
- How to monitor the aggregation of your data source
- OpenAIRE enrichment processes
3) Questions & comments (please share your use cases, issues)
Notes & Agenda ⇨ https://bit.ly/2rTgJwy
www.openaire.eu/provide-community-calls
3. OpenAIRE Provide – recent news
Dashboard UI/UX redesign
https://beta.provide.openaire.eu
(Coming soon - March)
Your participation is needed
(take part in the user board)
Collection monitor feature
(aggregation history more complete)
Subscribe to the newsletter
www.openaire.eu/past-cp-newsletters/listing
Provide Public Roadmap
https://trello.com/b/JHbHKLZ4/openaire-provide-roadmap
4. @openaire_eu
OpenAIRE aggregation and enrichment
processes: specifications and good practices
CNR & Bielefeld University
Community Call | 04 MAR 2020
6. An open metadata research
graph of interlinked scientific
products, with access rights
information, linked to funding
information and research
communities
Graph: model for the representation of information
OpenAIRE uses it to represent objects in the scholarly communication domain and the relationships
that exist among them.
Edges of the graph are annotated with a label that specifies the semantics of the relationships
between two objects, each represented as a node in the graph.
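The graph model described above can be illustrated with a minimal sketch. The class names, node identifiers, and edge labels below are hypothetical and chosen only for illustration; they are not taken from the OpenAIRE data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # e.g. "publication", "dataset", "project" (illustrative values)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, label, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, label, target_id):
        # The label carries the semantics of the relationship.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source_id, label, target_id))

    def neighbours(self, node_id, label):
        # Targets reachable from node_id through edges with the given label.
        return [t for s, l, t in self.edges if s == node_id and l == label]


graph = Graph()
graph.add_node(Node("pub:1", "publication"))
graph.add_node(Node("data:1", "dataset"))
graph.add_node(Node("proj:1", "project"))
graph.add_edge("pub:1", "isSupplementedBy", "data:1")  # hypothetical label
graph.add_edge("pub:1", "isProducedBy", "proj:1")      # hypothetical label
```

Querying `graph.neighbours("pub:1", "isSupplementedBy")` then returns only the dataset node, because the edge label distinguishes the two relationships.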
7. The OpenAIRE Research Graph in numbers
Production:
Data sources 17K
Publications 37M (deduplicated, 8M with full-texts)
Datasets 975K
Software 52K
Projects 3M
Funders 22
Beta:
Data sources 10K
Publications 110M (deduplicated, 10M with full-texts)
Datasets 2M
Software 43K
Projects 3M
Funders 29
8. Academic Graph, European and international funders, … and more
Collecting metadata, links, and full-texts
from more than 10K sources worldwide to
materialize a graph where entities of the
research life cycle are linked to each other
10. Enrichment
Different records representing the same entity (results or organization) are merged into one
Mining full-texts and abstracts to identify links (e.g. to projects, to datasets, to software), affiliations, subject classification, citations
Enrich records with information about relevant research communities and infrastructures based on the provenance of the records and their keywords.
Propagation
Enrich records based on information available in the records that are linked to them with a relationship with “strong” semantics (e.g. supplements/isSupplementTo)
RAW
OpenAIRE Research Graph
the supply chain
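The propagation step can be sketched as follows. This is an illustration under stated assumptions, not the OpenAIRE implementation: the record layout, the `propagate` helper, and the choice to copy values in both directions over a single hop are all hypothetical. Only the relation names supplements/isSupplementTo come from the slide.

```python
# Relations treated as having "strong" semantics (names from the slide).
STRONG_RELATIONS = {"supplements", "isSupplementTo"}


def propagate(records, relations, field_name):
    """Copy the values of field_name across strong relations (one hop, both ways)."""
    enriched = {rid: set(rec.get(field_name, ())) for rid, rec in records.items()}
    for source, label, target in relations:
        if label not in STRONG_RELATIONS:
            continue  # weaker relations (e.g. citations) do not propagate
        enriched[target] |= set(records[source].get(field_name, ()))
        enriched[source] |= set(records[target].get(field_name, ()))
    return enriched


records = {
    "pub:1": {"projects": ["proj:1"]},
    "data:1": {"projects": []},
    "pub:2": {"projects": ["proj:2"]},
}
relations = [
    ("data:1", "isSupplementTo", "pub:1"),
    ("pub:2", "cites", "pub:1"),  # ignored: not a strong relation
]
result = propagate(records, relations, "projects")
```

Here the dataset inherits the project link of the publication it supplements, while the citing publication leaves the cited record unchanged.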
11. Integration Scenarios
● Directly harvested (from repositories, journals)
● Indirectly harvested (via aggregators, publishers)
○ see “collected from a compatible aggregator” in the Explore portal
○ records are marked as collected from the aggregator and hosted by the specific repository/journal (if resolvable via OpenDOAR/re3data/ISSN).
○ when the hosting source cannot be resolved, the record appears as hosted by the “Unknown repository”
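The indirect-harvesting rule above can be sketched as a small lookup. The field names (`opendoar_id`, `re3data_id`, `issn`), the `registry` dictionary, and all identifiers are hypothetical; the sketch only mirrors the rule that an unresolvable hosting source falls back to the “Unknown repository”.

```python
UNKNOWN_REPOSITORY = "Unknown repository"


def resolve_hosted_by(record, registry):
    """Return the hosting data source name, or the fallback if unresolvable."""
    # Try each identifier type in turn (hypothetical field names).
    for key in ("opendoar_id", "re3data_id", "issn"):
        identifier = record.get(key)
        if identifier and identifier in registry:
            return registry[identifier]
    return UNKNOWN_REPOSITORY


def provenance(record, aggregator_name, registry):
    """Mark a record as collected from the aggregator and hosted by its source."""
    return {
        "collected_from": aggregator_name,
        "hosted_by": resolve_hosted_by(record, registry),
    }


# Hypothetical registry mapping identifiers to data source names.
registry = {"opendoar:123": "Example Institutional Repository"}

resolved = provenance({"opendoar_id": "opendoar:123"}, "Example Aggregator", registry)
unresolved = provenance({"issn": "0000-0000"}, "Example Aggregator", registry)
```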
15. Aggregation of metadata
OpenAIRE aggregation team: UNIBI
Activities:
• Activate the aggregation workflow
• Check supplied data
• Configure transformation step to
  - assign the proper typologies to records (literature, dataset, software, or other)
  - address metadata quality imperfections
• Contact repository managers
  - suggest improvements
  - ask for permission to download Open Access full-texts
16. Aggregated record and Data Source Types
Record types:
• Publications: Article, Preprint, Report, Patent, …
• Datasets: Dataset, Collection, Clinical Trial, …
• Software: Research Software, …
• Other Research Products: Service, Workflow, Interactive Resource, …
Data source types:
• Institutional/publication repositories
• Journals/publishers
• Data repositories
• Other Products repositories
• Software repositories
• CRIS
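Assigning a typology to a record, as done in the transformation step, can be sketched as a lookup over the record types listed on this slide. The mapping table, the function name, and the choice to treat unrecognised types as "other" are assumptions made for illustration; the real transformation rules are configured per data source by the aggregation team.

```python
# Record types from the slide, mapped to the four typologies.
TYPOLOGY_BY_TYPE = {
    "article": "publication",
    "preprint": "publication",
    "report": "publication",
    "patent": "publication",
    "dataset": "dataset",
    "collection": "dataset",
    "clinical trial": "dataset",
    "research software": "software",
    "service": "other",
    "workflow": "other",
    "interactive resource": "other",
}


def assign_typology(record_type):
    """Return the typology for a record type; unknown types become 'other' (assumption)."""
    return TYPOLOGY_BY_TYPE.get(record_type.strip().lower(), "other")


typologies = [assign_typology(t) for t in ("Preprint", "Clinical Trial", " Workflow ")]
```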
17. Open Access full-texts
• OpenAIRE collects them but does not redistribute them
• The OpenAIRE Explore portal will send the user to your URL
• OpenAIRE as a way to get new users and more accesses to your platform
• OpenAIRE runs mining algorithms to enrich the available metadata
• You can get this information back via the Broker
• Let other repositories know that you have an Open Access version of the paper
• OpenAIRE is directly connected to the EC participant portal:
  - If an Open Access version of the paper is known to exist, the life of the project coordinator will be easier...
18. How to monitor your aggregation workflow?
21. Collection monitor
Collection mode:
- REFRESH: OpenAIRE collected everything
- INCREMENTAL: OpenAIRE collected only records that have been updated since the previous collection
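The difference between the two collection modes can be sketched as a selection over a data source's records. The record layout (an `updated` timestamp per record) and the `select_records` function are hypothetical; only the mode names REFRESH and INCREMENTAL come from the collection monitor.

```python
from datetime import datetime


def select_records(records, mode, previous_collection=None):
    """Return the records a collection run would fetch in the given mode."""
    if mode == "REFRESH":
        # Everything is collected again, regardless of modification date.
        return list(records)
    if mode == "INCREMENTAL":
        if previous_collection is None:
            raise ValueError("INCREMENTAL needs the date of the previous collection")
        # Only records updated since the previous collection are fetched.
        return [r for r in records if r["updated"] > previous_collection]
    raise ValueError(f"unknown collection mode: {mode}")


records = [
    {"id": "rec:1", "updated": datetime(2020, 1, 10)},
    {"id": "rec:2", "updated": datetime(2020, 2, 20)},
]
refreshed = select_records(records, "REFRESH")
incremental = select_records(records, "INCREMENTAL", datetime(2020, 2, 1))
```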
22. Collection monitor
The number of records is different: some had to be discarded by the aggregation team.
Suggestion: validate your repository and check the report.
23. Collection monitor
The portal shows the metadata collected on this date...
...but only this number of records gets into the pipeline
Look for the OpenAIRE logo
This version of the metadata is not yet visible in the portal
25. Enrichment of the graph
• Infer information with full-text and data mining
• Fostering PIDs
  - Propagation of ORCID IDs
• Improve discovery
  - Propagation of abstracts between articles, datasets, and software
• Improve monitoring
  - Propagation of organizations from institutional data sources to related products, and from products to linked products with the same authors
OpenAIRE-Advance Kick off | Athens | 17-19 Jan 2018
26. Enrichment of the graph
• Under implementation: introduce relevant links between scientific products, for example:
  - Rapid view of science: identify links between articles and related presentations
  - Hidden research software: identify links to URLs targeting rar and zip archives
27. Enrichment
Metadata records corresponding to equivalent objects are merged.
Pre-print, post-print, and published versions are considered equivalent for stats & monitoring purposes.
Harvested publications: 160M
Unique publications: 110M
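Merging equivalent records can be sketched as grouping by a matching key. This is a deliberately simplified illustration: the key (DOI if present, otherwise a normalised title), the record fields, and the helper names are assumptions, and the actual OpenAIRE deduplication is not described in these slides.

```python
import re
from collections import defaultdict


def dedup_key(record):
    """Build a matching key: the DOI if present, else a normalised title (assumption)."""
    doi = record.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return "title:" + title


def merge_equivalent(records):
    """Merge records sharing a key, keeping track of versions and sources."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    merged = []
    for group in groups.values():
        merged.append({
            "title": group[0]["title"],
            "versions": sorted({r["version"] for r in group}),
            "sources": sorted({r["source"] for r in group}),
        })
    return merged


harvested = [
    {"doi": "10.1234/ABC", "title": "A Study", "version": "preprint", "source": "repo-a"},
    {"doi": "10.1234/abc", "title": "A study.", "version": "published", "source": "journal-b"},
    {"doi": None, "title": "Another Study", "version": "published", "source": "repo-a"},
]
unique = merge_equivalent(harvested)
```

The pre-print and the published version share a DOI and collapse into one record, so three harvested records yield two unique publications.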
28. Enriching metadata
Inference
Assign research products to communities/infrastructures based on their provenance, subjects
Info deduction
Abstracts, links to projects, countries, communities/infrastructures, ORCID ids from a research product to other products
Info propagation
10M OA full-texts
Mining output
• Text-mined links: 130M
  - Links to projects, software, datasets, research infra/communities, similarities
• Text-mined values: 178M
  - Citations, abstract, subject classification terms
Coming soon: links to patents (EPO/PATSTAT)
Example: subject: s, part_of, r, csubjects: [s, s1, … , sn]
Under consideration: propagation of organization from one product to another linked to it with “supplementedBy/supplementTo”
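Assigning research products to communities based on provenance and subjects can be sketched as a match against a per-community configuration. The configuration layout, community names, and field names below are hypothetical; the sketch only illustrates the idea that a product with subject s becomes part of a community whose subject list contains s.

```python
def assign_communities(result, communities):
    """Return the communities a result belongs to, by subject or by provenance."""
    result_subjects = {s.lower() for s in result.get("subjects", [])}
    matched = []
    for name, config in communities.items():
        # Subject match: the result shares at least one subject with the community.
        by_subject = result_subjects & {s.lower() for s in config.get("subjects", [])}
        # Provenance match: the result was collected from a community data source.
        by_provenance = result.get("collected_from") in config.get("datasources", [])
        if by_subject or by_provenance:
            matched.append(name)
    return sorted(matched)


# Hypothetical community configurations.
communities = {
    "community-a": {"subjects": ["Marine biology", "Oceanography"], "datasources": []},
    "community-b": {"subjects": [], "datasources": ["repo-b"]},
}

by_subject = assign_communities({"subjects": ["oceanography"], "collected_from": "repo-x"}, communities)
by_source = assign_communities({"subjects": [], "collected_from": "repo-b"}, communities)
```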
29. AGENDA:
1) OpenAIRE Provide updates
2) OpenAIRE aggregation and enrichment processes
- The OpenAIRE Aggregator
- How to monitor the aggregation of your data source
- OpenAIRE enrichment processes
3) Questions & comments (please share your use cases, issues)
Notes & Agenda ⇨ https://bit.ly/2rTgJwy
www.openaire.eu/provide-community-calls
30. Upcoming calls
April 1st - main topic: DSpace-CRIS for OpenAIRE: implementation of the CRIS guidelines and beyond
www.openaire.eu/provide-community-calls
31. Subscribe to our newsletter!
www.openaire.eu/past-cp-newsletters