SlideShare une entreprise Scribd logo
1  sur  28
ResourceSync




ResourceSyncis funded by The Sloan Foundation
               ResourceSync – March 6 2012
ResourceSync – WebEx Agenda


Problem Perspective – Herbert Van de Sompel (LANL)

Technical Directions – Robert Sanderson (LANL)

Joining the Effort – Nettie Lagace (NISO)

Q&A




                  ResourceSync – March 6 2012
ResourceSync – Problem Perspective




 Presenter – Herbert Van de Sompel
          ResourceSync – March 6 2012
ResourceSync Core Team


Cornell University & OAI:
   BerhardHaslhofer, Carl Lagoze, Simeon Warner

Old Dominion University & OAI:
   Michael L. Nelson

Los Alamos National Laboratory & OAI:
   Martin Klein, Robert Sanderson, Herbert Van de Sompel

NISO:
   Todd Carpenter, Nettie Lagace, Peter Murray


                 ResourceSync – March 6 2012
ResourceSync Problem

• Consideration:
   • Source (server) A has resources that change over time: they
     get created, modified, deleted, moved, …
   • Destination (servers) X, Y, and Z leverage (some) resources
     of Source A.
• Problem:
   • Destinations want to keep in step with the resource changes
     at Source A: resource synchronization.
• Task of ResourceSync effort:
   • Design an approach for resource synchronization aligned
     with the Web Architecture that has a fair chance of adoption
     by different communities.
       • The approach must scale better than recurrent HTTP
          HEAD/GET on resources.


                    ResourceSync – March 6 2012
ResourceSync Use Cases

• arXiv mirroring.
• Metadata and content transfer from Europeana partners to
  central Europeana hub.
• Synchronization of Linked Data content.
• Recurrently collecting Memento metadata from IIPC web
  archives to central aggregator.
• Keeping up-to-data with resources that reside on a Web server.

• Use cases have different requirements regarding synchronization
  accuracy:
   • Synchronization coverage: perfect …… good enough
   • Synchronization speed:      fast …….…. fast enough




                    ResourceSync – March 6 2012
Web Context

• The context in which resource synchronization will take place is
  the Web.
• Resources considered for synchronization:
   • Are identified by dereference-able URIs.
   • Are cache-able.

=> Caveat: It is understood that resources that require client-side
   processing (e.g. hashbangURIs) cause problems.




                     ResourceSync – March 6 2012
3 Synchronization Needs

• Overall, there are 3 distinct needs regarding resource
  synchronization. All 3 are considered in scope for the effort:

    1. Baseline matching: An approach to allow a Destination that
       wants to start synchronizing with a Source to perform an
       initial catch up – Dump.
    2. Incremental resource synchronization: An approach to allow
       a Destination to remain up-to-date regarding changes at the
       Source.
    3. Audit: An approach to allow checking whether a Destination
       is in sync with a Source – Inventory.




                     ResourceSync – March 6 2012
Change Notification – Content Transfer

• Distinguish between two aspects related to incremental resource
  synchronization:

   2.a. Change notification: An approach to allow a Destination to
      understand that a Source’s resource has changed; and what
      the nature of the change is.
   2.b. Content transfer: An approach to allow a Destination to
      update its holdings to reflect the change the resource
      underwent at the Source.

=> For both, PUSH or PULL approaches are conceivable.




                    ResourceSync – March 6 2012
Selective Synchronization

• There is a need to be able to synchronize with a limited set of a
  Source’s resources: selective synchronization.

    • This leads to the notion of channels for resource
      synchronization.
    • The need for channels is most apparent for resource
      synchronization but may have to be carried through in
      baseline matching and audit.

=> This raises an issue:
    • The definition of channels by the Source is straightforward,
       yet may not meet the Destination’s needs.
    • The definition of channels by a Destination is less
       straightforward.


                     ResourceSync – March 6 2012
Memory

• For robustness, both change notification and content transfer
  may need a memory function to allow for synchronization after a
  Destination has missed updates:

   • Catching up on change notifications - Digests.
   • Catching up on resource versions - Mementos.

  However, it would be beneficial if resource synchronization could
  operate in a memory-less mode in cases where less than perfect
  sync is acceptable.
  An important architectural consideration is where that memory
  should reside: Source, intermediary.




                    ResourceSync – March 6 2012
Changes Only – Entire Resource

• Two approaches for content transfer:

   2.b.1. Synchronization is achieved by transfer of the entire
       resource.
   2.b.2. Synchronization is achieved by transfer of resource
       changes only.

   Focus is on (2.b.1) because of complexities re content-type-
   specific resource segment description and re-assembly
   instructions involved in (2.b.2).
   Focus re (2.b.1) is on content transfer using a PULL, not PUSH,
   approach.




                    ResourceSync – March 6 2012
Event Types

• Event types to be considered for change notifications:

    •   Tier 1:
        • Regarding resources: create ; delete ; update ; new
            dump ; new inventory
        • Regarding channels: removed from channel ; added to
            channel
    •   Tier 2:
        • Regarding resources: move ; copy

    Focus is on Tier 1 and on events pertaining to single resources.
    Tier 2 and consideration of groups of resources (e.g. URI
    templates, URI regex) is postponed.



                     ResourceSync – March 6 2012
Resource Representations

• Resource representations are subject to synchronization:
   • But limited to those that can be requested using protocol
     parameters for the resource’s URI.

=> Caveats:
    • It is understood this causes problems in light of resource
      representation personalization, e.g. geo-dependent
      representations.
    • If the change notification function is provided by 3rd party
      instead of Source, change notifications may be subjective to
      the 3rd parties perspective.




                    ResourceSync – March 6 2012
Discovery

• Several Discovery needs arise:

   • Regarding change notification channels:
      • How to find channels that provide change notifications
        for a given Source Server?
      • How to find information about the nature of the changes
        that are communicated on such channels?
   • Regarding dumps and inventories:
      • Where to obtain the most recent dump?
      • Where to obtain the most recent inventory?
   • Regarding memory:
      • Where to obtain digests?
      • How to access Mementos?


                   ResourceSync – March 6 2012
And Also …

• Authorization: For baseline matching, resource synchronization
  (change notification, content transfer), and audit a distinction
  may be required between Destinations that are authorized or
  unauthorized to perform these functions.

• Embargo: How soon after a resource change are parties allowed
  to know about it, to act upon it?

• Trust in a change notification channel:
   • Can a change notification channel that provides information
      about a given Source Server be trusted?




                     ResourceSync – March 6 2012
ResourceSync – Technical Directions




   Presenter – Robert Sanderson
          ResourceSync – March 6 2012
ResourceSync Architectural Challenges


Major Research Areas:
   1. Initial Baseline Synchronization
       • Retrieve dump of data
   2. Incremental Synchronization
       • Main focus of research
   3. Audit to ensure Consistency
       • Not yet significantly investigated

Incremental Synchronization split into two separate aspects:
    1. Change Notification (CN)
       • Alert that something happened (create,update,delete)
    2. Content Transfer (CT)
       • Transfer of just the change or the full resource


                         ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Trivial Approach:
     • Retrieve every resource and compare to current copy




   • Not scalable: too many wasted, large transactions
   • No way to discover newly created resources




                        ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Theoretically Optimal Solution:
   • Whenever a resource changes, push only the change to only the
      Destinations monitoring the resource




   •   No wasted transactions, only as much data transferred as needed
   •   Newly created resources are discovered
   •   But overly burdens the source! Not economically viable
   •   Q: Sweet point between Trivial and Optimal?


                         ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Trivial Approach plus Conditional GET:
     • Retrieve every resource if it has changed
     • Essentially this is a Change Notification Pull




    • Still not scalable: too many wasted transactions
    • Still no way to discover newly created resources




                           ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Simplest Workable Model:
   • Introduce a Feed of change notifications for all resources
   • Atom, RSS, OAI-PMH, SiteMaps, etc.




    • Significantly reduces wasted transactions
    • Newly created resources are discovered
    • But still not very efficient as don’t know when to pull

                          ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Feed Extension Solution:
   • Continue the Feed paradigm, but introduce aggregating service
      and ping notification to re-pull (simulated push)




   • No wasted transactions, but pull will get already-seen notifications
   • Only advantageous if Source already supports a Feed


                         ResourceSync – March 6 2012
Incremental Synchronization - Architectures


Push Solution:
   • Instead of ping+pull, Source can push the notification




   • No wasted transactions, no wasted data transfer
   • Service maintains subscriber list, not Source
   • Change Notification from Source is easier than a Ping+Feed if no
     feed already, and minimally harder than just a Ping

                         ResourceSync – March 6 2012
Change Notification - Protocols


• Atom PubSubHubbub (PuSH)
• XMPP
   • PubSub extension
   • BoSH (XMPP over HTTP)
• Comet / HTTP Streaming
   • Open an HTTP connection and keep reading from it
   • Bayeux Protocol
• Long Polling
   • Keep HTTP connection open until a message, then reopen
   • BoSH, Bayeux option
• WebSockets
   • NullMQ / ZeroMQ
   • XMPP over WebSockets?


                   ResourceSync – March 6 2012
Change Notification - Messages


• Experiments with “bleep” change notifications, inspired by tweets
   • Text that is writeable & readable by machine and human
   • Simple “resync” language:
      • URIcreated at=“time-of-create”#hashtag$resync
      • URIupdated at=“time-of-update”#hashtag$resync
      • URIdeleted at=“time-of-delete”#hashtag$resync
   • Approach is extensible via creation of new languages for bleeps

   http://megalodon.lanl.gov/dbpedia/data/Paris updated at=“2012-03-
      05T19:54:39Z”
      #dbpedia $resync




                        ResourceSync – March 6 2012
Ongoing Investigations


• Change Notification - XMPP & XMPP PubSub& bleeps
   • LANL
   • Ongoing Experiment with Live DBPedia
• Change Notification - Comet / HTTP Streaming & bleeps
   • ODU
   • Bayeux Protocol via Faye Implementation
• Change Notification - Change Simulator
   • Cornell U
   • Generate configurable change notifications
   • Use as standardized input to different systems for testing
• Baseline Matching & Audit
   • Cornell U



                     ResourceSync – March 6 2012
ResourceSync




ResourceSyncis funded by The Sloan Foundation
               ResourceSync – March 6 2012

Contenu connexe

Similaire à ResourceSync

ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013Simeon Warner
 
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...dwestbrook
 
mace15retro-presentation
mace15retro-presentationmace15retro-presentation
mace15retro-presentationJonathan Mace
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingHerbert Van de Sompel
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in MicroservicesRandy Shoup
 
LavaCon 2017 - Connecting Silos With Content Pipelines
LavaCon 2017 - Connecting Silos With Content PipelinesLavaCon 2017 - Connecting Silos With Content Pipelines
LavaCon 2017 - Connecting Silos With Content PipelinesJack Molisani
 
Spca2014 practical large scale migration guidance v1.0 andries den haan
Spca2014 practical large scale migration guidance v1.0 andries den haanSpca2014 practical large scale migration guidance v1.0 andries den haan
Spca2014 practical large scale migration guidance v1.0 andries den haanNCCOMMS
 
Practical large scale migration guidance
Practical large scale migration guidancePractical large scale migration guidance
Practical large scale migration guidanceAndries den Haan
 
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG: connecting the knowledge community
 
PALNI Experience Implementing OCLC's WMS
PALNI Experience Implementing OCLC's WMS PALNI Experience Implementing OCLC's WMS
PALNI Experience Implementing OCLC's WMS Kirsten Leonard
 
Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Nancy Pontika
 
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG: connecting the knowledge community
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Sascha Wenninger
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...redsys
 

Similaire à ResourceSync (20)

ResourceSync - NISO Update Jan 2014
ResourceSync - NISO Update Jan 2014ResourceSync - NISO Update Jan 2014
ResourceSync - NISO Update Jan 2014
 
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
 
ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013
 
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...
ALA MidWinter 2013 - Electronic Resources Management Interest Group meeting -...
 
mace15retro-presentation
mace15retro-presentationmace15retro-presentation
mace15retro-presentation
 
ResourceSync Tutorial
ResourceSync TutorialResourceSync Tutorial
ResourceSync Tutorial
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
 
LavaCon 2017 - Connecting Silos With Content Pipelines
LavaCon 2017 - Connecting Silos With Content PipelinesLavaCon 2017 - Connecting Silos With Content Pipelines
LavaCon 2017 - Connecting Silos With Content Pipelines
 
Spca2014 practical large scale migration guidance v1.0 andries den haan
Spca2014 practical large scale migration guidance v1.0 andries den haanSpca2014 practical large scale migration guidance v1.0 andries den haan
Spca2014 practical large scale migration guidance v1.0 andries den haan
 
Practical large scale migration guidance
Practical large scale migration guidancePractical large scale migration guidance
Practical large scale migration guidance
 
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
 
PALNI Experience Implementing OCLC's WMS
PALNI Experience Implementing OCLC's WMS PALNI Experience Implementing OCLC's WMS
PALNI Experience Implementing OCLC's WMS
 
ResourceSync tutorial OAI8
ResourceSync tutorial OAI8ResourceSync tutorial OAI8
ResourceSync tutorial OAI8
 
Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...Developing Infrastructure to Support Closer Collaboration of Aggregators with...
Developing Infrastructure to Support Closer Collaboration of Aggregators with...
 
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...
 

Plus de National Information Standards Organization (NISO)

Plus de National Information Standards Organization (NISO) (20)

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 
Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"
 

Dernier

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Dernier (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

ResourceSync

  • 1. ResourceSync ResourceSyncis funded by The Sloan Foundation ResourceSync – March 6 2012
  • 2. ResourceSync – WebEx Agenda Problem Perspective – Herbert Van de Sompel (LANL) Technical Directions – Robert Sanderson (LANL) Joining the Effort – Nettie Lagace (NISO) Q&A ResourceSync – March 6 2012
  • 3. ResourceSync – Problem Perspective Presenter – Herbert Van de Sompel ResourceSync – March 6 2012
  • 4. ResourceSync Core Team Cornell University & OAI: BerhardHaslhofer, Carl Lagoze, Simeon Warner Old Dominion University & OAI: Michael L. Nelson Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel NISO: Todd Carpenter, Nettie Lagace, Peter Murray ResourceSync – March 6 2012
  • 5. ResourceSync Problem • Consideration: • Source (server) A has resources that change over time: they get created, modified, deleted, moved, … • Destination (servers) X, Y, and Z leverage (some) resources of Source A. • Problem: • Destinations want to keep in step with the resource changes at Source A: resource synchronization. • Task of ResourceSync effort: • Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. • The approach must scale better than recurrent HTTP HEAD/GET on resources. ResourceSync – March 6 2012
  • 6. ResourceSync Use Cases • arXiv mirroring. • Metadata and content transfer from Europeana partners to central Europeana hub. • Synchronization of Linked Data content. • Recurrently collecting Memento metadata from IIPC web archives to central aggregator. • Keeping up-to-data with resources that reside on a Web server. • Use cases have different requirements regarding synchronization accuracy: • Synchronization coverage: perfect …… good enough • Synchronization speed: fast …….…. fast enough ResourceSync – March 6 2012
  • 7. Web Context • The context in which resource synchronization will take place is the Web. • Resources considered for synchronization: • Are identified by dereference-able URIs. • Are cache-able. => Caveat: It is understood that resources that require client-side processing (e.g. hashbangURIs) cause problems. ResourceSync – March 6 2012
  • 8. 3 Synchronization Needs • Overall, there are 3 distinct needs regarding resource synchronization. All 3 are considered in scope for the effort: 1. Baseline matching: An approach to allow a Destination that wants to start synchronizing with a Source to perform an initial catch up – Dump. 2. Incremental resource synchronization: An approach to allow a Destination to remain up-to-date regarding changes at the Source. 3. Audit: An approach to allow checking whether a Destination is in sync with a Source – Inventory. ResourceSync – March 6 2012
  • 9. Change Notification – Content Transfer • Distinguish between two aspects related to incremental resource synchronization: 2.a. Change notification: An approach to allow a Destination to understand that a Source’s resource has changed; and what the nature of the change is. 2.b. Content transfer: An approach to allow a Destination to update its holdings to reflect the change the resource underwent at the Source. => For both, PUSH or PULL approaches are conceivable. ResourceSync – March 6 2012
  • 10. Selective Synchronization • There is a need to be able to synchronize with a limited set of a Source’s resources: selective synchronization. • This leads to the notion of channels for resource synchronization. • The need for channels is most apparent for resource synchronization but may have to be carried through in baseline matching and audit. => This raises an issue: • The definition of channels by the Source is straightforward, yet may not meet the Destination’s needs. • The definition of channels by a Destination is less straightforward. ResourceSync – March 6 2012
  • 11. Memory • For robustness, both change notification and content transfer may need a memory function to allow for synchronization after a Destination has missed updates: • Catching up on change notifications - Digests. • Catching up on resource versions - Mementos. However, it would be beneficial if resource synchronization could operate in a memory-less mode in cases where less than perfect sync is acceptable. An important architectural consideration is where that memory should reside: Source, intermediary. ResourceSync – March 6 2012
  • 12. Changes Only – Entire Resource • Two approaches for content transfer: 2.b.1. Synchronization is achieved by transfer of the entire resource. 2.b.2. Synchronization is achieved by transfer of resource changes only. Focus is on (2.b.1) because of complexities re content-type- specific resource segment description and re-assembly instructions involved in (2.b.2). Focus re (2.b.1) is on content transfer using a PULL, not PUSH, approach. ResourceSync – March 6 2012
  • 13. Event Types • Event types to be considered for change notifications: • Tier 1: • Regarding resources: create ; delete ; update ; new dump ; new inventory • Regarding channels: removed from channel ; added to channel • Tier 2: • Regarding resources: move ; copy Focus is on Tier 1 and on events pertaining to single resources. Tier 2 and consideration of groups of resources (e.g. URI templates, URI regex) is postponed. ResourceSync – March 6 2012
  • 14. Resource Representations • Resource representations are subject to synchronization: • But limited to those that can be requested using protocol parameters for the resource’s URI. => Caveats: • It is understood this causes problems in light of resource representation personalization, e.g. geo-dependent representations. • If the change notification function is provided by 3rd party instead of Source, change notifications may be subjective to the 3rd parties perspective. ResourceSync – March 6 2012
  • 15. Discovery • Several Discovery needs arise: • Regarding change notification channels: • How to find channels that provide change notifications for a given Source Server? • How to find information about the nature of the changes that are communicated on such channels? • Regarding dumps and inventories: • Where to obtain the most recent dump? • Where to obtain the most recent inventory? • Regarding memory: • Where to obtain digests? • How to access Mementos? ResourceSync – March 6 2012
  • 16. And Also … • Authorization: For baseline matching, resource synchronization (change notification, content transfer), and audit a distinction may be required between Destinations that are authorized or unauthorized to perform these functions. • Embargo: How soon after a resource change are parties allowed to know about it, to act upon it? • Trust in a change notification channel: • Can a change notification channel that provides information about a given Source Server be trusted? ResourceSync – March 6 2012
  • 17. ResourceSync – Technical Directions Presenter – Robert Sanderson ResourceSync – March 6 2012
  • 18. ResourceSync Architectural Challenges Major Research Areas: 1. Initial Baseline Synchronization • Retrieve dump of data 2. Incremental Synchronization • Main focus of research 3. Audit to ensure Consistency • Not yet significantly investigated Incremental Synchronization split into two separate aspects: 1. Change Notification (CN) • Alert that something happened (create,update,delete) 2. Content Transfer (CT) • Transfer of just the change or the full resource ResourceSync – March 6 2012
  • 19. Incremental Synchronization - Architectures Trivial Approach: • Retrieve every resource and compare to current copy • Not scalable: too many wasted, large transactions • No way to discover newly created resources ResourceSync – March 6 2012
  • 20. Incremental Synchronization - Architectures Theoretically Optimal Solution: • Whenever a resource changes, push only the change to only the Destinations monitoring the resource • No wasted transactions, only as much data transferred as needed • Newly created resources are discovered • But overly burdens the source! Not economically viable • Q: Sweet point between Trivial and Optimal? ResourceSync – March 6 2012
  • 21. Incremental Synchronization - Architectures Trivial Approach plus Conditional GET: • Retrieve every resource if it has changed • Essentially this is a Change Notification Pull • Still not scalable: too many wasted transactions • Still no way to discover newly created resources ResourceSync – March 6 2012
  • 22. Incremental Synchronization - Architectures Simplest Workable Model: • Introduce a Feed of change notifications for all resources • Atom, RSS, OAI-PMH, SiteMaps, etc. • Significantly reduces wasted transactions • Newly created resources are discovered • But still not very efficient as don’t know when to pull ResourceSync – March 6 2012
  • 23. Incremental Synchronization - Architectures Feed Extension Solution: • Continue the Feed paradigm, but introduce aggregating service and ping notification to re-pull (simulated push) • No wasted transactions, but pull will get already-seen notifications • Only advantageous if Source already supports a Feed ResourceSync – March 6 2012
  • 24. Incremental Synchronization - Architectures Push Solution: • Instead of ping+pull, Source can push the notification • No wasted transactions, no wasted data transfer • Service maintains subscriber list, not Source • Change Notification from Source is easier than a Ping+Feed if no feed already, and minimally harder than just a Ping ResourceSync – March 6 2012
  • 25. Change Notification - Protocols • Atom PubSubHubbub (PuSH) • XMPP • PubSub extension • BoSH (XMPP over HTTP) • Comet / HTTP Streaming • Open an HTTP connection and keep reading from it • Bayeux Protocol • Long Polling • Keep HTTP connection open until a message, then reopen • BoSH, Bayeux option • WebSockets • NullMQ / ZeroMQ • XMPP over WebSockets? ResourceSync – March 6 2012
  • 26. Change Notification - Messages • Experiments with “bleep” change notifications, inspired by tweets • Text that is writeable & readable by machine and human • Simple “resync” language: • URIcreated at=“time-of-create”#hashtag$resync • URIupdated at=“time-of-update”#hashtag$resync • URIdeleted at=“time-of-delete”#hashtag$resync • Approach is extensible via creation of new languages for bleeps http://megalodon.lanl.gov/dbpedia/data/Paris updated at=“2012-03- 05T19:54:39Z” #dbpedia $resync ResourceSync – March 6 2012
  • 27. Ongoing Investigations • Change Notification - XMPP & XMPP PubSub& bleeps • LANL • Ongoing Experiment with Live DBPedia • Change Notification - Comet / HTTP Streaming & bleeps • ODU • Bayeux Protocol via Faye Implementation • Change Notification - Change Simulator • Cornell U • Generate configurable change notifications • Use as standardized input to different systems for testing • Baseline Matching & Audit • Cornell U ResourceSync – March 6 2012
  • 28. ResourceSync ResourceSyncis funded by The Sloan Foundation ResourceSync – March 6 2012