Presented by Paolo Manghi (OpenAIRE)
during the OpenAIRE workshop "Research policy monitoring in the era of Open Science and Big Data" taking place in Ghent, Belgium on May 27th and 28th 2019
Day 1: Monitoring and Infrastructure for Open Science
https://www.openaire.eu/research-policy-monitoring-in-the-era-of-open-science-and-big-data-the-what-indicators-and-the-how-infrastructures
1. Unlocking Open Science Monitoring
The OpenAIRE Research Graph and Monitoring Dashboards
Paolo Manghi
Institute of Information Science and Technologies - CNR
2. OpenAIRE e-infrastructure
Materializing the Open Science graph
Mining, Harvesting, Deduplication
• Harvested data sources: 10K+
• Harvested records: 500Mi+
• Publication full-texts: 7.5Mi (soon 10.5Mi+)
• Harvested/mined links: 60Mi+
4. The OpenAIRE research graph
Providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities
Open
Complete
De-duplicated
Transparent
Participatory
Decentralized
Trusted
5. Strategic for Open Science
Making the research graph an EOSC resource
• Added value services: e.g. discovery services, monitoring services
• Research graph: Open, Trusted, Complete, De-duplicated, Participatory, Transparent, Decentralized
• Actors: institutions, research organizations, funders, content providers, researchers, SMEs, etc.
8. De-duplicated
Entity type | # Collected records | # Records after cleaning and de-duplication | # Identified duplicates
Publications | ~343M | ~94M | ~249M
Data | ~5.2M | ~4.6M | ~600K
Software | ~150K | ~134K | ~20K
Other | ~5M | ~4.5M | ~500K
Organisations | ~380K | ~220K | ~160K
More information about the de-duplication framework used by OpenAIRE can be found by searching Zenodo for:
• “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster)
• “GDup: De-Duplication of Scholarly Communication Big Graphs”
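The share of records removed per entity type follows directly from the collected and de-duplicated counts on this slide. The sketch below is illustrative only: it uses the rounded slide figures, and the names `COUNTS` and `duplicate_share` are hypothetical.

```python
# Rounded figures from the de-duplication slide:
# (records collected, records after cleaning and de-duplication).
# The slide values are approximate, so the derived shares are approximate too.
COUNTS = {
    "Publications": (343_000_000, 94_000_000),
    "Data": (5_200_000, 4_600_000),
    "Software": (150_000, 134_000),
    "Other": (5_000_000, 4_500_000),
    "Organisations": (380_000, 220_000),
}


def duplicate_share(collected: int, remaining: int) -> float:
    """Fraction of collected records removed by cleaning and de-duplication."""
    return (collected - remaining) / collected


if __name__ == "__main__":
    for entity, (collected, remaining) in COUNTS.items():
        share = duplicate_share(collected, remaining)
        print(f"{entity:<14} {share:6.1%} of collected records removed")
```

With these inputs, roughly 73% of collected publication records are removed, against about 42% for organisations. For software the rounded counts give about 16K removed records, slightly below the ~20K reported on the slide, since all figures there are approximations.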
9. Participatory
• Rely on quality scholarly communication sources of different kinds
• Include solutions and content from any interested and known content provider in scholarly communication
Institutional repositories
Aggregators
Data archives
Software repositories
Research infrastructure sources
Funder grant databases
Authors & Orgs entity registries
Publishers & journals
10. Transparent
• Metadata in the graph includes provenance when harvested and reliability indicators when obtained from mining
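The provenance and reliability metadata described on this slide can be pictured as a record that carries its own origin. The sketch below is an assumption made for illustration, not the OpenAIRE data model: the classes `Provenance` and `Link`, the field names `method`, `source` and `trust`, and the 0 to 1 trust scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """How a piece of metadata entered the graph (hypothetical structure)."""
    method: str                    # "harvested" or "mined"
    source: str                    # data source name or mining algorithm name
    trust: Optional[float] = None  # reliability indicator, expected for mined metadata

    def __post_init__(self) -> None:
        if self.method not in ("harvested", "mined"):
            raise ValueError(f"unknown method: {self.method!r}")
        if self.method == "mined" and self.trust is None:
            raise ValueError("mined metadata needs a reliability indicator")
        if self.trust is not None and not 0.0 <= self.trust <= 1.0:
            raise ValueError("trust must be in [0, 1]")


@dataclass(frozen=True)
class Link:
    """A relationship between two graph entities, with its provenance."""
    source_id: str
    target_id: str
    relation: str
    provenance: Provenance


def describe(link: Link) -> str:
    """Render a one-line, human-readable account of where a link came from."""
    p = link.provenance
    if p.method == "harvested":
        return f"{link.relation}: harvested from {p.source}"
    return f"{link.relation}: mined by {p.source} (trust {p.trust:.2f})"
```

Keeping provenance on every record means a consumer can filter on it, for example by keeping harvested links and only those mined links whose trust exceeds a chosen threshold.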
11. Decentralized
• Preservation and ownership beyond OpenAIRE
  • Exchanged with other graph initiatives
  • Redistributed via subscription and notification to contributing data sources (provide.openaire.eu)
• Openly accessible via APIs (develop.openaire.eu)
12. Trusted (in progress)
• Authors in the loop to enrich their ORCID record
• Validation of end-user “claims”
13. Transition from OA content acquisition policies to OS content acquisition policies
Numbers from: explore.openaire.eu and beta.explore.openaire.eu
• Literature-research data links: 120Mi
• Open Access PDFs for mining: 7.5Mi (10Mi+)
[Figure: four bar charts comparing record counts under the old CAP and the new CAP (content acquisition policy), one panel each for literature, research data, software, and other. Data labels in extraction order: 26Mi, 94Mi, 1M, 8Mi, 95K, 192K, 3.6Mi, 7.5Mi.]
18. Aims for Research Infrastructures
• Funding impact: publications, research data, and software published thanks to the existence of the RI
• Open Access/Science impact: monitoring of Open Science impact (data/software FAIRness, reproducibility trends)
19. Current stats: BETA public
Name | Type | Subscribers | Visits (since April 2018) | Unique page views in visit | Publications | Research data | Research software | Other
Sustainable Development Solutions Network - Greece | RCD | 16 | 278 | 454 | 6,110 | 686 | 7 | 411
Agricultural and Food Sciences | RCD | 3 | 217 | 288 | 311,613 | 10,309 | 3,766 | 85,168
Common Language Resources and Technology Infrastructure (*) | RID | 11 | 208 | 350 | 104 | 40 | 8 | 46
Fisheries and Aquaculture Management | RCD | 20 | 247 | 453 | 11,170 | 1,616 | 3,709 | 1,510
European Marine Science | RCD | 27 | 563 | 1,150 | 12,538 | 107,644 | 3,746 | 2,343
Neuroinformatics | RCD | 21 | 225 | 378 | 24,029 | 5,083 | 150 | 2,375
Digital Humanities and Cultural Heritage | RCD | 23 | 178 | 361 | 410,604 | 65,143 | 4,020 | 94,019
(*) Mining algorithms still to be fine-tuned to improve recall
20. Current stats: BETA private
Name | Type | Subscribers | Visits (since April 2018) | Unique page views in visit | Publications | Research data | Research software | Other
Instruct-ERIC | RID | 7 | 61 | 89 | 787 | 1 | 3 | 5
EGI: advanced computing for research | RID | 6 | 95 | 123 | 16,316 | 887 | 4,756 | 1,424
DARIAH EU | RID | 3 | 26 | 40 | 67 | 1 | 0 | 2
Research Data Alliance | RID | 19 | 117 | 195 | 31 | 0 | 1 | 0
ELIXIR-GR | RID | 5 | 90 | 134 | 58 | 0 | 38 | 0
All configurations are still provisional, and the mining algorithms are still to be fine-tuned to improve precision and recall
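Precision and recall, the two quantities the note above refers to, have standard definitions that can be sketched in a few lines. The function names and the example counts below are invented for illustration and are not OpenAIRE measurements.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the links a mining algorithm produced that are correct."""
    found = true_positives + false_positives
    return true_positives / found if found else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the links that actually exist which the algorithm found."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


if __name__ == "__main__":
    # Made-up counts: 80 correct links found, 20 wrong links found, 120 links missed.
    tp, fp, fn = 80, 20, 120
    print(f"precision = {precision(tp, fp):.2f}")
    print(f"recall    = {recall(tp, fn):.2f}")
```

In this invented case the algorithm is mostly right about what it finds (precision 0.80) but misses most of what exists (recall 0.40), which would show up in a dashboard as low product counts.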
24. Aims
• Funding impact: publications, research data, and software published thanks to grants awarded by the funder
• Open Access/Science impact: monitoring of Open Science impact (data/software FAIRness, reproducibility trends)
25. Added value functionalities
• Funders
  • Trends in research fields: new (multidisciplinary) disciplines
• Institutions
  • OA/OS behavior, ability to attract cross-funder grants
• Projects
  • Success, interconnections, possible liaisons
27. Aims
• Research impact: ability of researchers affiliated with the institution to produce innovative and quality scientific products
• Service capacity impact: ability of services maintained and operated by the institution to support researchers in producing or storing scientific products
• Funding impact: ability of the institution to attract funding from different funders and disciplines
28. Added value functionalities
• Funders
  • Self-checking compliance with funder mandates
  • Recent and past EC and other funders’ activities (representing various funding levels)
  • Checking compliance with funder mandates
• Institutions
  • Collaboration network (by institution) via projects and products
• Projects
  • Compare project portfolio against that of other similar institutions (anonymized)