2. Research
communities
Researchers (All)
Content providers
Innovators
Research
managers
Funders
Building the graph and Dashboards
OpenAIRE Dashboards
Validation
Cleaning De-duplication
Inference
Research Graph Services
Project Community
Funder Funding
Product
Publication
Data Software
Organization
TERMS OF USE
Harvesting Uploading
Brokering
Source
ORP
Publications
repositories
Data
repositories
Hybrid
repositories
Registries
OA
Journals
Software
repositories
Content Providers
Research Infrastructures
GUIDELINES
8. Materializing the Open Science Graph
Project Community
Funder Funding
Product
Publication
Research Data
Software
Organization
Source
Other research products
Mining
Deduplication
End-user feedback
Harvesting
GUIDELINES
Research Infrastructures Publishing
IT
OpenAIRE-Advance 1st Review | Luxembourg | 10 Oct 2019
9. Providing an open metadata
research graph of interlinked
scientific products, with Open
Access information, linked to
funding information and research
communities
The OpenAIRE research graph
Open
Complete
De-duplicated
Transparent
Participatory
Decentralized
Trusted
11. Harvesting/transformation workflows
[Diagram: Data Collection Workflow. Each source (Source A, Source B) has its own sub-workflow: Collect produces native XML, Transform produces cleaned XML.]
Monitoring Data Quality/Expectations
across sources, within sources, etc.
• Workflow templates and workflow
executions (scheduled)
• Provenance
• Types of products
• Etc.
Transformation
• Moving from XML to JSON
frameworks: XSLT to JSON, XML to
JSON
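The XML-to-JSON step can be sketched as follows. This is a minimal illustration, not the OpenAIRE transformation framework: the record layout, the element names and the `record_to_dict` helper are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical "cleaned XML" record; element names are illustrative only.
CLEANED_XML = """
<record>
  <title>An example article</title>
  <creator>Doe, Jane</creator>
  <creator>Roe, Richard</creator>
  <type>Article</type>
</record>
"""

def record_to_dict(xml_text):
    """Map each child element to a field; repeated elements become lists."""
    root = ET.fromstring(xml_text.strip())
    out = {}
    for child in root:
        value = (child.text or "").strip()
        if child.tag in out:
            existing = out[child.tag]
            if not isinstance(existing, list):
                out[child.tag] = [existing]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

record = record_to_dict(CLEANED_XML)
print(json.dumps(record, indent=2))
```

Repeated elements such as `creator` are collected into a JSON array, which is the main design decision any XML-to-JSON mapping has to make explicit.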
GUIDELINES
12. Fine-grained classification of Research Products
Publications
• Article
• Preprint
• Report
• …
Datasets
• Dataset
• Collection
• Clinical Trials
• …
Software
• Research
Software
• …
Other Research
Products
• Service
• Workflow
• Interactive
Resource
• …
Institutional/
publication
repositories
Journals/
publishers
Data
repositories
Other
Products
repositories
Software
repositories
OpenAIRE-Advance Review, January 2019
14. Mining
• MapReduce on HDFS/Spark
• 13 million full-texts
• Java/Python framework
Find new metadata and links
• Identification of links to entities (URLs, PIDs)
• Semantics for documents, datasets, software
• Semantics of links
• Links to web docs
• Etc.
Collect Open Access PDFs
• Pro-actively collect pre-prints
• Identify Open Access versions
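Identifying links to entities (URLs, PIDs) in full text can be approximated with pattern matching. The sketch below is only an illustration of the idea, not the OpenAIRE mining framework; the regular expressions, the `find_links` helper and the sample sentence (with a made-up DOI and URL) are assumptions.

```python
import re

# Simplified patterns (assumptions): a DOI-like string and an http(s) URL.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://[^\s\"<>]+")

def find_links(text):
    """Return DOI-like strings and URLs found in text, without trailing punctuation."""
    dois = [m.rstrip(".,;)") for m in DOI_RE.findall(text)]
    urls = [m.rstrip(".,;)") for m in URL_RE.findall(text)]
    return {"dois": dois, "urls": urls}

# Made-up sentence; the DOI and URL are placeholders, not real identifiers.
sample = "The dataset is at https://example.org/dataset/42 (doi: 10.1234/example.5678)."
links = find_links(sample)
print(links)
```

A production miner would also need to resolve each candidate identifier and classify the semantics of the link, which pattern matching alone cannot do.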
16. De-duplication (BETA Content)
More information about the de-duplication framework used by
OpenAIRE can be found by searching Zenodo for:
• “De-duplicating the OpenAIRE Scholarly Communication Big Graph”
(poster)
• “GDup: De-Duplication of Scholarly Communication Big Graphs”
Deduplication techniques
(MapReduce based, Java)
• Improving results by adding
context
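A common shape for graph de-duplication is blocking (grouping candidate records by a cheap key) followed by pairwise similarity within each block. The sketch below illustrates that general technique only; it is not GDup, and the blocking key, the similarity measure (`difflib.SequenceMatcher`) and the threshold are assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def normalize(title):
    """Lower-case the title and replace punctuation with spaces."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())

def blocking_key(record):
    # Assumed key: publication year plus the first normalized title word.
    words = normalize(record["title"]).split()
    return (record["year"], words[0] if words else "")

def find_duplicates(records, threshold=0.9):
    """Compare records only within the same block; return pairs of ids."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, normalize(a["title"]), normalize(b["title"])
            ).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

# Made-up records for illustration.
records = [
    {"id": "r1", "title": "De-duplication of Scholarly Graphs", "year": 2019},
    {"id": "r2", "title": "De-Duplication of scholarly graphs.", "year": 2019},
    {"id": "r3", "title": "Deep learning for images", "year": 2019},
]
duplicates = find_duplicates(records)
print(duplicates)
```

Blocking keeps the number of comparisons manageable on large graphs; adding context (for example authors or identifiers) to the similarity function is one way to improve the results.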
17. Production: Open Access CAPs
BETA: Open Science CAPs
[Figure: four bar charts comparing Old CAP and New CAP record counts for literature, research data, software, and other research products. Data labels shown on the charts: 110Mi, 30Mi, 1Mi, 10Mi, 100K, 180K, 3Mi, 7Mi.]
Harvested content
• Data sources: 12K+
• Records: 450Mi
• Publication full-texts: 11,6Mi (Springer N. coming)
• Links (also text-mined): 680Mi
19. API and access
Bulk
OAI-PMH
Dumps in
Zenodo for large
datasets
HTTP
Search
Search REST
APIs
Linked Open
Data
SPARQL
LOD dumps
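Bulk access over OAI-PMH follows the standard request pattern: a `ListRecords` request with a `metadataPrefix`, then follow-up requests carrying only the `resumptionToken` returned by the previous page. The sketch below builds such requests and reads the token from a sample response. The base URL and the sample response are placeholders, not the actual OpenAIRE endpoint or output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def next_token(response_xml):
    """Return the resumption token of a response page, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return token.text if token is not None and token.text else None

BASE_URL = "https://example.org/oai"  # hypothetical endpoint
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL)
token = next_token(SAMPLE_RESPONSE)
second_url = list_records_url(BASE_URL, resumption_token=token)
print(first_url)
print(second_url)
```

A harvester would repeat the second request until the response carries an empty or missing `resumptionToken`.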
Workshop Técnico OpenAIRE / LA Referencia | 29-30 October, 2019 | Costa Rica
http://develop.openaire.eu
Average unique visitors per month 25,000
Average hits per month 2,2Mi
22. BETA Graph Open Consultation
• October-November 2019:
OpenAIRE Research Graph open for consultation
Collecting feedback via Trello (operational end of September)
• December 2019:
OpenAIRE Research Graph in production
http://beta.explore.openaire.eu
• Identify errors/inconsistencies (semi-)automatically
• Crowd-sourcing
24. Access use-cases: APIs and web portal
Harvesting of article-dataset and dataset-dataset scholarly links
API
WebUI: link
discovery/navigation
API: link
search/resolution
Other
sources
17,5Mi literature
objects, 50,7Mi
datasets, 481,3Mi
Scholix links;
40Mi hits/month
(~1Bi hits since Jan
2018)
25. Scholexplorer
• Numbers
17,5Mi literature objects, 50,7Mi datasets, 481,3Mi Scholix links
• API Adoption
40Mi hits per month
26. Access use-cases: APIs and web portal
Other
sources
Harvesting of links
API
API: link
search/resolution
WebUI: link
discovery/navigation
40Mi hits/month
(~1Bi hits since Jan
2018)
28. ArgOS
Machine-actionable data management planning
Powered by OpenDMP
Is it a questionnaire management system? Definitely not!
• Articulated handling of a DMP
Publishing, discovery, reuse, statistics on DMPs
• Actionable DMPs
Validation of statements via external services
• Collaborative DMP composition
Researchers in the loop
29. Amnesia
• Amnesia is a data anonymization tool available at
https://amnesia.openaire.eu
Amnesia can be used locally or on-line
The on-line version is meant for demos and training and is not safe
• Offers true anonymity and not pseudo-anonymity
k-anonymity and k^m-anonymity
• Numbers in 2019 so far:
33K hits
7K uses of the on-line service
470 installations
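The k-anonymity guarantee mentioned for Amnesia can be stated as a simple check: every combination of quasi-identifier values must be shared by at least k records. The sketch below only tests that property on made-up, already generalized rows. It is not Amnesia's code, and the `is_k_anonymous` helper and the column names are assumptions.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if each quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Made-up, already generalized records (age ranges, truncated zip codes).
rows = [
    {"age": "30-39", "zip": "106**", "diagnosis": "A"},
    {"age": "30-39", "zip": "106**", "diagnosis": "B"},
    {"age": "40-49", "zip": "115**", "diagnosis": "A"},
    {"age": "40-49", "zip": "115**", "diagnosis": "C"},
]

print(is_k_anonymous(rows, ["age", "zip"], k=2))
print(is_k_anonymous(rows, ["age", "zip"], k=3))
```

An anonymization tool does the harder part, which is choosing generalizations that make this check pass while losing as little information as possible.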
32. Services for Content Providers
• Repository registration and validation
• Repository Usage Statistics
• Repository Broker Service
http://provide.openaire.eu
33. Broker Service
• 24 repositories defined at least one subscription
• Integrate with repositories (Zenodo) and aggregators (LA Referencia)
• Towards Plan S implementation (PDF brokering)
35. Topics
• Topics have data sources as targets
• Events concern an object in a given data source
• Data sources:
Publication repositories from OpenDOAR
Data Archives from re3data.org
Event (potential notification):
• Message
• Topic
• TargetRepository
• Trust
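The event structure listed above (message, topic, target repository, trust) can be modelled as a small record type, with a subscription acting as a filter over events. This is a sketch under assumptions: the field types, the 0 to 1 trust scale, the topic names and the `matching_events` helper are hypothetical and not taken from the Broker Service.

```python
from dataclasses import dataclass

@dataclass
class Event:
    message: str            # human-readable description of the suggestion
    topic: str              # hypothetical hierarchical topic name
    target_repository: str  # data source the event is addressed to
    trust: float            # assumed to be a score between 0 and 1

def matching_events(events, repository, topic_prefix, min_trust):
    """Select the events a repository subscription would be notified about."""
    return [
        e for e in events
        if e.target_repository == repository
        and e.topic.startswith(topic_prefix)
        and e.trust >= min_trust
    ]

# Made-up events for illustration.
events = [
    Event("Missing project link found", "ENRICHMENT/PROJECT", "repo-a", 0.9),
    Event("Missing abstract found", "ENRICHMENT/ABSTRACT", "repo-a", 0.6),
    Event("Record missing from repository", "ADDITION/RECORD", "repo-b", 0.95),
]
selected = matching_events(events, "repo-a", "ENRICHMENT", min_trust=0.8)
print([e.message for e in selected])
```

Filtering on trust lets a repository manager receive only the suggestions considered reliable enough to act on.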
36. Events
Enrichments
Properties or links that are not available in the records
• Merge
• Inference
• Claims
Additions
Records that should be in the repository but are NOT in the repository
• Deduction from authors
• Deduction from affiliation
Alerts
• Wrong links
• End-user feedback
39. ● Join OpenAIRE Usage Statistics
○ enable “usage metrics” for your data source
○ download & configure tracking plugin in your data source
○ confirmation by OpenAIRE once usage events are tracked in PIWIK
● or enter SUSHI endpoint to let OpenAIRE collect COUNTER
reports
Metrics
Download
tracker
Configure Deploy & Test
Validation & Confirmation
42. Research Community Dashboard and
Gateways
Research Community
Dashboard
Researcher
Search-Navigate-Monitor
Research Products
Community
Gateway
Community
Gateway
Community Manager
Configure criteria of
inclusion into Gateway
as-a-Service
IT
43. Criteria for inclusion
• Subjects of pertinence
• Provenance (data source) + criteria
• Zenodo communities
• Projects
• Propagation via relationships
Publication «supplementedBy» Data/Software
Project «funds» Publication/Data/Software
New criteria
• Via ORCID
• Others?
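The "propagation via relationships" criterion can be read as a fixed-point computation: community membership flows from a project to the products it funds, and from a publication to the data or software that supplements it. The sketch below follows that reading. The identifiers, the data layout and the `propagate_communities` helper are assumptions; only the two relationship names come from the slide.

```python
# Relationship names from the slide; everything else is illustrative.
PROPAGATING = {"supplementedBy", "funds"}

def propagate_communities(memberships, relations):
    """Repeat propagation along the relations until nothing changes."""
    result = {node: set(comms) for node, comms in memberships.items()}
    changed = True
    while changed:
        changed = False
        for source, rel, target in relations:
            if rel not in PROPAGATING:
                continue
            inherited = result.get(source, set())
            current = result.setdefault(target, set())
            if not inherited <= current:
                current |= inherited
                changed = True
    return result

# Made-up graph: a project in a community funds a publication,
# which is supplemented by a dataset.
memberships = {"project:P1": {"community-x"}}
relations = [
    ("pub:1", "supplementedBy", "data:1"),
    ("project:P1", "funds", "pub:1"),
    ("pub:1", "cites", "pub:2"),
]
result = propagate_communities(memberships, relations)
print(result)
```

Iterating to a fixed point makes the outcome independent of the order in which the relationships are listed.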
44. Monitoring trends and impact
MONITOR
Funding
impact
Funding
attraction
Open
Science
impact
Open
Access
impact
Research
Impact
28 Funders in BETA
45. Monitoring trends and impact
Funders
• Trends in research fields: new (multidisciplinary)
disciplines
Institutions
• OA/OS behavior, ability to attract cross-funder
grants
Projects
• Success, interconnections, possible liaisons
Funders
• Recent and past EC and other funders’ activities
(representing various funding levels)
• Checking compliance to funder mandates
Institutions
• Collaboration network (by institution) via projects and
products
• Ability to attract funds from different funders
Projects
• Check if projects are eligible for Post-Grant APC
funding
• Compare project portfolio against that of other similar
institutions (anonymized)
46. Search and discovery portal
http://explore.openaire.eu
http://beta.explore.openaire.eu