Comparing the Performance of OAI-PMH with ResourceSync
@petrknoth @mart1nkle1n
OR 2019, 06/12/2019, Hamburg, Germany
Comparing the Performance of
OAI-PMH with ResourceSync
Petr Knoth, Matteo Cancellieri
Knowledge Media Institute
The Open University
UK
Martin Klein
Research Library
Los Alamos National Laboratory
USA
“A single scientific repository is of limited value, real benefits
come from the ability to exchange data within a network …
… interoperability allows us to exploit today's computational
power so that we can aggregate, data mine, create new tools
and services, and generate new knowledge from repository
content.” - COAR
ResourceSync and repositories
Protocols for data exchange are the lifeblood of the scholarly communication system
ResourceSync and repositories
Aggregators and ResourceSync
[Diagram labels: ResourceSync (CORE FastSync); 3rd parties: data analysis, TDM]
Repository aggregators have large full text collections
core.ac.uk stats:
• 13,117,488 Hosted full texts
• 135,539,113 Metadata records
• ~78m Links to full text
• 15TB of raw plain text
• 4,123 Data providers
Many challenges with OAI-PMH implementations …
• Locating full text URLs in metadata
• Restrictions on full text downloading
• Sequential nature of OAI-PMH
• Failing resumption tokens
• Incremental updates
• Scalability
• Metadata interoperability
• Reliability
• No content harvesting support
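The sequential nature of OAI-PMH and the cost of a failing resumption token can be sketched as follows. This is a minimal illustration, not a real harvester: `fetch_page`, `fake_fetch`, and the in-memory `PAGES` table are hypothetical stand-ins for HTTP requests with `verb=ListRecords`, and the end of the list is modeled here as a token of `None`.

```python
from typing import Callable, Iterator, Optional, Tuple

# A page is (records, resumption_token); the token is None on the last page.
Page = Tuple[list, Optional[str]]


def harvest(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield records page by page, following resumption tokens in order."""
    token: Optional[str] = None
    while True:
        # The next request cannot be built until this response has arrived,
        # so pages cannot be fetched in parallel.
        records, token = fetch_page(token)
        yield from records
        if token is None:
            break


# Hypothetical stand-in for a repository: maps a token to the page it returns.
PAGES = {
    None: (["rec1", "rec2"], "t1"),
    "t1": (["rec3", "rec4"], "t2"),
    "t2": (["rec5"], None),
}


def fake_fetch(token: Optional[str]) -> Page:
    return PAGES[token]


harvested = list(harvest(fake_fetch))
print(harvested)
```

If the lookup for any token fails, everything after that page is out of reach until the harvest is restarted, which is why failing resumption tokens appear in the list above.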
Speed of OAI-PMH implementations
7
Aggregators have a lot of usage
• January 2019 – CORE reached over 10M monthly active users for the first time
• 571% increase from January 2018
• By usage, core.ac.uk is in the top 0.0009% of global websites
Aggregator’s challenge
• Stay up to date despite thousands of data providers
• Efficiently expose large amounts of data to many users:
  • Human users
  • Machines (scalability!)
• OAI-PMH implementations struggle to do the job:
  • Scalability
  • Metadata inconsistency
  • Support for metadata harvesting only
Research question
Is ResourceSync better suited for the job than OAI-PMH?
OAI-PMH - Background
http://openarchives.org/pmh/
• Recurrent metadata exchange from a Data Provider to Service Providers
• XML metadata only
• Repository centric
• Devised 1999-2002, prior to REST, prior to dominance of web search engines
ResourceSync - Background
http://www.openarchives.org/rs/1.1/resourcesync
• Synchronization of resources from a Source to Destinations
• Web resources, anything with an HTTP URI & representation
• Resource centric
• Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications, existing Search Engine Optimization practice
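Synchronization from a Source to a Destination can be pictured as a comparison of two snapshots. The sketch below assumes the Destination keeps a local map from URI to last-seen hash; the function name `plan_sync` and the sample data are illustrative and do not come from the ResourceSync specification or any client library.

```python
def plan_sync(source: dict, local: dict) -> dict:
    """Compare a Source snapshot with a Destination's copy (URI -> hash)."""
    created = sorted(u for u in source if u not in local)
    updated = sorted(u for u in source if u in local and source[u] != local[u])
    deleted = sorted(u for u in local if u not in source)
    return {"created": created, "updated": updated, "deleted": deleted}


# Hypothetical snapshots: what the Source advertises vs. what we already hold.
source_state = {
    "http://example.com/res1": "md5:aaa",
    "http://example.com/res2": "md5:bbb",
    "http://example.com/res3": "md5:ccc",
}
local_state = {
    "http://example.com/res1": "md5:aaa",
    "http://example.com/res2": "md5:old",
    "http://example.com/res9": "md5:zzz",
}

plan = plan_sync(source_state, local_state)
print(plan)
```

Only the resources listed under `created` and `updated` need to be fetched, which is the basis for incremental updates.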
ResourceSync in a Nutshell
ResourceSync Capabilities
Many to One - Aggregator
ResourceSync is based on Sitemaps
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
  …
</urlset>
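Because the format is a plain Sitemap, a client needs nothing more than an XML parser to enumerate resources. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `list_resources` is an assumption made for illustration, and the sample omits the elided entries from the slide.

```python
import xml.etree.ElementTree as ET

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
</urlset>"""

# Prefix -> namespace URI, used to qualify element names in lookups.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_resources(xml_text: str) -> list:
    """Return (URI, last-modified) pairs for every <url> entry."""
    root = ET.fromstring(xml_text)
    return [
        (
            url.findtext("sm:loc", namespaces=NS),
            url.findtext("sm:lastmod", namespaces=NS),
        )
        for url in root.findall("sm:url", NS)
    ]


resources = list_resources(SITEMAP)
for loc, lastmod in resources:
    print(loc, lastmod)
```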
ResourceSync Resource List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2019-06-11T09:00:00Z"
         completed="2019-06-11T09:00:44Z" />
  <url>
    <loc>http://example.com/res1_metadata.xml</loc>
    <lastmod>2019-06-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="823"
           type="text/xml" />
  </url>
</urlset>
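The `hash` and `length` attributes let a Destination check that what it downloaded is what the Source advertised. The following is a minimal sketch under stated assumptions: the `verify` helper is hypothetical, the payload is made-up sample data (so its hash differs from the one on the slide), and only a single `md5:` value in the `hash` attribute is handled.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def verify(url_element: ET.Element, payload: bytes) -> bool:
    """Check downloaded bytes against the advertised length and md5 hash."""
    md = url_element.find("rs:md", NS)
    algorithm, _, expected = md.get("hash").partition(":")
    if algorithm != "md5":
        raise ValueError("this sketch only handles md5 hashes")
    same_length = int(md.get("length")) == len(payload)
    same_hash = hashlib.md5(payload).hexdigest() == expected
    return same_length and same_hash


# Made-up payload; the entry is built so that it advertises this payload.
payload = b"<record>example metadata</record>"
entry_xml = f"""<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://example.com/res1_metadata.xml</loc>
  <rs:md hash="md5:{hashlib.md5(payload).hexdigest()}"
         length="{len(payload)}"
         type="text/xml" />
</url>"""
entry = ET.fromstring(entry_xml)

intact = verify(entry, payload)
tampered = verify(entry, payload + b" ")
print(intact, tampered)
```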
Resource List with Link
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2019-06-11T09:00:00Z"
         completed="2019-06-11T09:00:44Z" />
  <url>
    <loc>http://example.com/res1_metadata.xml</loc>
    <lastmod>2019-06-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="823"
           type="text/xml" />
    <rs:ln href="http://example.com/res1_content.pdf"
           rel="describes"
           length="8876"
           type="application/pdf" />
  </url>
</urlset>
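With the `rs:ln` element in place, a harvester can find the full text without guessing URLs from the metadata. The sketch below parses the Resource List from this slide with the standard library; the helper name `full_text_links` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2019-06-11T09:00:00Z"
         completed="2019-06-11T09:00:44Z" />
  <url>
    <loc>http://example.com/res1_metadata.xml</loc>
    <lastmod>2019-06-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="823"
           type="text/xml" />
    <rs:ln href="http://example.com/res1_content.pdf"
           rel="describes"
           length="8876"
           type="application/pdf" />
  </url>
</urlset>"""

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def full_text_links(xml_text: str) -> dict:
    """Map each metadata URI to the URI of the resource it describes."""
    root = ET.fromstring(xml_text)
    links = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        for ln in url.findall("rs:ln", NS):
            if ln.get("rel") == "describes":
                links[loc] = ln.get("href")
    return links


links = full_text_links(RESOURCE_LIST)
print(links)
```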
ResourceSync Characteristics
• Designed to allow synchronization of resources, not just metadata
• Explicit link between metadata and the described resource
• Not prescriptive about the metadata format
• Web-centric
• Push-based Change Notifications (WebSub)
Comparative Analysis
1. Assess the speed of OAI-PMH implementations across repositories
See the results on slide 7, "Speed of OAI-PMH implementations"
2. Understand the recall in full-text harvesting
Recall of full-text harvesting – the power of the explicit full text link
3. Evaluate simulated metadata harvesting with ResourceSync implementations for:
  a) Standard Mode
    • Resources sync’ed via Resource Lists, one resource at a time (per HTTP transaction)
  b) Resource Dump Mode
    • Resources packaged into a Resource Dump, transferred via one HTTP transaction
  c) Batch Mode
    • Resources are packaged into partial and on-demand Resource Dumps, transferred via multiple HTTP transactions
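The difference between the three modes can be pictured by counting HTTP transactions for the resources themselves. This is a rough back-of-the-envelope sketch, not the benchmark used in the talk: the function name, the `batch_size` parameter, and the sample numbers are assumptions, and requests for Resource Lists or dump manifests are ignored.

```python
import math


def http_transactions(n_resources: int, batch_size: int) -> dict:
    """Count transfers per mode, ignoring list and manifest requests."""
    return {
        # Standard Mode: one HTTP transaction per resource.
        "standard": n_resources,
        # Resource Dump Mode: everything packaged into a single transfer.
        "resource_dump": 1,
        # Batch Mode: one transfer per partial, on-demand dump.
        "batch": math.ceil(n_resources / batch_size),
    }


# Hypothetical repository with 10,000 records and dumps of 500 records each.
counts = http_transactions(10_000, 500)
print(counts)
```

The batch count depends entirely on the chosen `batch_size`, which is the knob that trades per-request overhead against the cost of materializing one very large dump.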
Speed of simulated ResourceSync implementations
Why On-Demand Resource Dumps
• Many repositories have hundreds of OAI sets:
  • Cannot materialize Resource Dumps for all of them (too much data and processing)
  • Cannot rely on Resource Lists (too slow)
• HATEOAS approach: https://blog.core.ac.uk/2018/03/17/increasing-the-speed-of-harvesting-with-on-demand-resource-dumps/
Recommendations for data providers
• Adopt ResourceSync at a platform level (EPrints, DSpace, Fedora, etc.)
• Many considerations:
  • Support Change Lists? Resource Dumps? Naming of Capability Lists? On-demand Resource Dumps? How to link resources? WebSub?
  • Guidelines needed!
• Resource List adoption is only viable for small providers
• Support for on-demand Resource Dumps needed!
• ResourceSync client-server implementation available: https://github.com/resync/resync
• CORE is happy to benchmark repository platforms
• LANL is working on a validator
Take-aways
• OAI-PMH implementations vary substantially in the number of records downloaded per second
• ResourceSync provides up to 10 times faster harvesting with Resource Dumps
• On-demand Resource Dumps for optimization
  • Not yet part of the standard
• Thanks to resource linking, low recall is less of an issue!

VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
SEO Growth Program-Digital optimization Specialist
SEO Growth Program-Digital optimization SpecialistSEO Growth Program-Digital optimization Specialist
SEO Growth Program-Digital optimization Specialist
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 

Comparing OAI-PMH and ResourceSync Performance

  • 1. Comparing the Performance of OAI-PMH with ResourceSync. Petr Knoth, Matteo Cancellieri (Knowledge Media Institute, The Open University, UK); Martin Klein (Research Library, Los Alamos National Laboratory, USA). @petrknoth @mart1nkle1n. OR 2019, 06/12/2019, Hamburg, Germany.
  • 2. ResourceSync and repositories: “A single scientific repository is of limited value, real benefits come from the ability to exchange data within a network … interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” - COAR
  • 3. ResourceSync and repositories: Protocols for data exchange are the blood of the scholarly communication system.
  • 4. Aggregators and ResourceSync (diagram labels: ResourceSync (CORE FastSync); 3rd parties: data analysis, TDM)
  • 5. Repository aggregators have large full text collections. core.ac.uk stats:
      • 13,117,488 hosted full texts
      • 135,539,113 metadata records
      • ~78m links to full text
      • 15TB of raw plain text
      • 4,123 data providers
  • 6. Many OAI-PMH implementation challenges …
      • Locating full text URLs in metadata
      • Restrictions on full text downloading
      • Sequential nature of OAI-PMH
      • Failing resumption tokens
      • Incremental updates
      • Scalability
      • Metadata interoperability
      • Reliability
      • No content harvesting support
  • 7. Speed of OAI-PMH implementations
  • 8. Aggregators and ResourceSync (diagram labels: ResourceSync (CORE FastSync); 3rd parties: data analysis, TDM)
  • 9. Aggregators and ResourceSync (diagram labels: ResourceSync (CORE FastSync); 3rd parties: data analysis, TDM)
  • 10. Aggregators have a lot of usage
      • January 2019: CORE reached over 10M monthly active users for the first time
      • 571% increase from January 2018
      • core.ac.uk ranks, by usage, in the top 0.0009% of global websites
  • 11. Aggregator’s challenge
      • Stay up to date despite thousands of data providers
      • Efficiently expose large amounts of data to many users:
          • Human users
          • Machines (scalability!)
      • OAI-PMH implementations can hardly deal with the job:
          • Scalability
          • Metadata inconsistency
          • Support for metadata harvesting only
  • 12. Research question: Is ResourceSync better suited for the job than OAI-PMH?
  • 13. OAI-PMH - Background (http://openarchives.org/pmh/)
      • Recurrent metadata exchange from a Data Provider to Service Providers
      • XML metadata only
      • Repository centric
      • Devised 1999-2002, prior to REST, prior to dominance of web search engines
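The sequential nature of OAI-PMH harvesting and its dependence on resumption tokens, both listed among the challenges on slide 6, can be illustrated with a minimal Python sketch. It is not taken from the slides: the `harvest` function, the `fake_fetch` stand-in, and the two canned responses are hypothetical, and a real harvester would fetch pages over HTTP and handle errors and expired tokens. The point is only that each request must wait for the token returned by the previous one.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers page by page (hypothetical helper).

    `fetch` is any callable that maps a URL to the response body as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=OAI_NS))
        # The next page can only be requested once this token is known,
        # so pages cannot be downloaded in parallel.
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
        if not token or not token.strip():
            break  # a missing or empty token ends the list
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
    return identifiers

# Canned two-page response used instead of a live repository (assumption).
PAGE1 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record><header><identifier>oai:example.org:1</identifier></header></record>"
    "<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>"
)
PAGE2 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record><header><identifier>oai:example.org:2</identifier></header></record>"
    "<resumptionToken/></ListRecords></OAI-PMH>"
)

def fake_fetch(url):
    return PAGE2 if "resumptionToken=page2" in url else PAGE1

identifiers = harvest("http://example.org/oai", fake_fetch)
```

If any single page fails or its token is rejected, the loop cannot skip ahead, which is one way to read the "failing resumption tokens" item on slide 6.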
  • 14. ResourceSync - Background (http://www.openarchives.org/rs/1.1/resourcesync)
      • Synchronization of resources from a Source to Destinations
      • Web resources, anything with an HTTP URI & representation
      • Resource centric
      • Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications, existing Search Engine Optimization practice
  • 15. ResourceSync in a Nutshell
  • 16. ResourceSync Capabilities
  • 17. ResourceSync Capabilities
  • 18. ResourceSync Capabilities
  • 19. ResourceSync Capabilities
  • 20. ResourceSync Capabilities
  • 21. Many to One - Aggregator
  • 22. ResourceSync is based on Sitemaps
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2013-01-02T13:00:00Z</lastmod>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-02T14:00:00Z</lastmod>
        </url>
        …
      </urlset>
  • 23. ResourceSync Resource List
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" at="2019-06-11T09:00:00Z" completed="2019-06-11T09:00:44Z" />
        <url>
          <loc>http://example.com/res1_metadata.xml</loc>
          <lastmod>2019-06-02T13:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="823" type="text/xml" />
        </url>
      </urlset>
  • 24. Resource List with Link
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" at="2019-06-11T09:00:00Z" completed="2019-06-11T09:00:44Z" />
        <url>
          <loc>http://example.com/res1_metadata.xml</loc>
          <lastmod>2019-06-02T13:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="823" type="text/xml" />
          <rs:ln href="http://example.com/res1_content.pdf" rel="describes" length="8876" type="application/pdf" />
        </url>
      </urlset>
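As a rough illustration of how a harvester might read the Resource List from slide 24, the Python sketch below parses that same XML with the standard library and pulls out each metadata URL together with the full text it links to. The function name `parse_resource_list` and the dictionary layout are assumptions made for this example, not part of ResourceSync or of any CORE tooling.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The Resource List from slide 24, reproduced as a string.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2019-06-11T09:00:00Z" completed="2019-06-11T09:00:44Z" />
  <url>
    <loc>http://example.com/res1_metadata.xml</loc>
    <lastmod>2019-06-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="823" type="text/xml" />
    <rs:ln href="http://example.com/res1_content.pdf" rel="describes" length="8876" type="application/pdf" />
  </url>
</urlset>"""

def parse_resource_list(xml_text):
    """Return one dict per <url> entry (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "hash": md.get("hash") if md is not None else None,
            # Each rs:ln carries an explicit, typed link to a related resource.
            "links": [(ln.get("rel"), ln.get("href")) for ln in url.findall("rs:ln", NS)],
        })
    return entries

entries = parse_resource_list(RESOURCE_LIST)
```

Because the link to the PDF is explicit in the `rs:ln` element, the harvester does not have to guess the full-text URL from the metadata record.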
  • 25. ResourceSync Characteristics
      • Designed to allow synchronization of resources, not just metadata
      • Explicit link between metadata and the described resource
      • Not prescriptive about the metadata format
      • Web-centric
      • Push-based Change Notifications (WebSub)
  • 26. Comparative Analysis
      1. Assess the speed of OAI-PMH implementations across repositories (see results on slide #7)
  • 27. Comparative Analysis
      1. Assess the speed of OAI-PMH implementations across repositories
      2. Understand the recall in full-text harvesting
  • 28. Recall of full-text harvesting: the power of the explicit full text link
  • 29. Comparative Analysis
      1. Assess the speed of OAI-PMH implementations across repositories
      2. Understand the recall in full-text harvesting
      3. Evaluate simulated metadata harvesting with ResourceSync implementations for:
          a) Standard Mode: resources synchronized via Resource Lists, one resource at a time (per HTTP transaction)
          b) Resource Dump Mode: resources packaged into a Resource Dump, transferred via one HTTP transaction
          c) Batch Mode: resources packaged into partial and on-demand Resource Dumps, transferred via multiple HTTP transactions
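The three modes on slide 29 differ mainly in how many HTTP transactions are needed to move the same set of resources. The Python sketch below counts them under simplifying assumptions that are not from the slides: it counts only the transfers of the resources themselves (not the fetching of Resource Lists or dump manifests), and the `batch_size` parameter is a hypothetical number of resources per partial dump.

```python
import math

def transfer_requests(n_resources, mode, batch_size=1000):
    """Estimate HTTP transactions needed to transfer n_resources (illustrative only)."""
    if n_resources < 0 or batch_size < 1:
        raise ValueError("n_resources must be >= 0 and batch_size >= 1")
    if mode == "standard":
        # One resource per HTTP transaction.
        return n_resources
    if mode == "dump":
        # Everything packaged into a single Resource Dump.
        return 1 if n_resources else 0
    if mode == "batch":
        # Partial, on-demand dumps of at most batch_size resources each.
        return math.ceil(n_resources / batch_size)
    raise ValueError(f"unknown mode: {mode}")

counts = {mode: transfer_requests(10_000, mode) for mode in ("standard", "dump", "batch")}
```

Transaction counts are only a proxy for speed; the measured results are the ones reported on the deck's speed slides.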
  • 30. Speed of simulated ResourceSync implementations
  • 31. Speed of simulated ResourceSync implementations
  • 32. Why On-Demand Resource Dumps
      • Many repositories have hundreds of OAI sets:
          • Cannot materialize (too much data and processing requirements)
          • Cannot rely on Resource List (too slow)
      • HATEOAS approach: https://blog.core.ac.uk/2018/03/17/increasing-the-speed-of-harvesting-with-on-demand-resource-dumps/
  • 33. Recommendations for data providers
      • Adopt ResourceSync at a platform level (EPrints, DSpace, Fedora, etc.)
      • Many considerations:
          • Support Change Lists? Dump? Naming of Capability Lists? On-Demand Dumps? How to link resources? WebSub?
          • Guidelines needed!
      • Resource List adoption only viable for small providers
      • Support for on-demand Resource Dumps needed!
      • ResourceSync Client-Server implementation available: https://github.com/resync/resync
      • CORE happy to benchmark repository platforms
      • LANL working on validator
  • 34. Take-aways
      • OAI-PMH implementations vary substantially in the number of records downloaded per second
      • ResourceSync provides up to 10 times faster harvesting speeds with Resource Dumps
      • On-demand Resource Dumps for optimization
          • Not yet part of the standard
      • Thanks to resource linking, low recall is less of an issue!
  • 35. Comparing the Performance of OAI-PMH with ResourceSync. Petr Knoth, Matteo Cancellieri (Knowledge Media Institute, The Open University, UK); Martin Klein (Research Library, Los Alamos National Laboratory, USA)