ResourceSync:
                           Web-Based
                             Resource
                        Synchronization
                            Herbert Van de Sompel
                                 Los Alamos National Laboratory
                                                    @hvdsomp



                                  ResourceSync is funded by
#resourcesync                   The Sloan Foundation & JISC


                   ResourceSync – Herbert Van de Sompel
                NISO Forum, September 24 2012, Denver, CO
ResourceSync Core Team – NISO & OAI

Los Alamos National Laboratory & OAI: Martin Klein, Robert
Sanderson, Herbert Van de Sompel

Cornell University & OAI: Bernhard Haslhofer, Simeon
Warner

Old Dominion University & OAI: Michael L. Nelson

University of Michigan & OAI: Carl Lagoze

NISO: Todd Carpenter, Nettie Lagace, Peter Murray



ResourceSync Technical Group


•  Manuel Bernhardt, Delving B.V.
•  Kevin Ford, Library of Congress
•  Richard Jones, JISC
•  Graham Klyne, JISC
•  Stuart Lewis, JISC
•  David Rosenthal, LOCKSS
•  Christian Sadilek, Red Hat
•  Shlomo Sanders, Ex Libris, Inc.
•  Sjoerd Siebinga, Delving B.V.
•  Ed Summers, Library of Congress
•  Jeff Young, OCLC Online Computer Library Center


ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




Synchronize What?

•  Web resources – things with a URI that can be dereferenced and
   are cache-able (no dependency on underlying OS, technologies
   etc.)

•  Small websites/repositories (a few resources) to large
   repositories/datasets/linked data collections (many millions of
   resources)

•  That change slowly (weeks/months) or quickly (seconds), and
   where latency needs may vary

•  Focus on needs of research communication and cultural heritage
   organizations, but aim for generality


Why?

… because lots of projects and services are doing synchronization
but have to resort to ad-hoc, case by case, approaches!

•  Project team involved with projects that need this

•  Experience with OAI-PMH: widely used in repos but
    o  XML metadata only
    o  Attempts at synchronizing actual content via OAI-PMH
       (complex object formats, dc:identifier) not successful
    o  Web technology has moved on since 1999




•  Devise a shared solution for data, metadata, linked data?


Use Cases – The Basics




Use Cases - More




Out Of Scope (For Now)

•  Bidirectional synchronization

•  Destination-defined selective synchronization (query)

•  Bulk URI migration




Use Case: arXiv Mirroring

•  1M article versions, ~800/day created or
   updated at 8 PM US Eastern Time

•  Metadata and full-text for each article

•  Accuracy important

•  Want low barrier for others to use

•  Look for more general solution than current
   homebrew mirroring (running with minor
   modifications since 1994!) and occasional rsync
   (filesystem layout specific, auth issues)

Use Case: DBpedia Live Duplication

•  Average of 2 updates per second
•  Want low latency => need a push technology




ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




ResourceSync Problem


•  Consideration:
    •  Source (server) A has resources that change over time: they
       get created, modified, deleted
    •  Destination (servers) X, Y, and Z leverage (some) resources
       of Source A.
•  Problem:
    •  Destinations want to keep in step with the resource changes
       at source A: resource synchronization.
•  Goal:
    •  Design an approach for resource synchronization aligned
       with the Web Architecture that has a fair chance of adoption
       by different communities.
        •  The approach must scale better than recurrent HTTP
           HEAD/GET on resources.
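
A rough back-of-the-envelope comparison may help show why recurrent HEAD/GET scales poorly. It uses the figures from the "Use Case: arXiv Mirroring" slide (about 1M article versions, roughly 800 created or updated per day). The once-a-day polling interval and the function names are assumptions made for this sketch only.

```python
# Illustrative arithmetic only. The once-a-day polling interval is an assumption.
TOTAL_RESOURCES = 1_000_000  # article versions, from the arXiv use case
CHANGES_PER_DAY = 800        # resources created or updated per day, same slide


def requests_per_day_polling(total_resources: int, polls_per_day: int = 1) -> int:
    """Requests needed to check every resource with a recurrent HEAD."""
    return total_resources * polls_per_day


def requests_per_day_change_list(changes: int, list_fetches: int = 1) -> int:
    """Requests needed when the source publishes a list of changes:
    fetch the list, then GET only the resources it names."""
    return list_fetches + changes


polling = requests_per_day_polling(TOTAL_RESOURCES)
change_list = requests_per_day_change_list(CHANGES_PER_DAY)
print(f"polling: {polling} requests/day, change list: {change_list} requests/day")
```

Under these assumptions the destination issues roughly three orders of magnitude fewer requests when the source tells it what changed.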


Destination: 3 Basic Synchronization Needs

1.  Baseline synchronization – A destination must be able to
    perform an initial load or catch-up with a source
       -  avoid out-of-band setup

2.  Incremental synchronization – A destination must have some
    way to keep up-to-date with changes at a source
       -  subject to some latency; minimal: create/update/delete
       -  allow to catch-up after destination has been offline

3.  Audit – A destination should be able to determine whether it is
    synchronized with a source
       -  subject to some latency
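
The three needs can be sketched with one comparison routine. This is a minimal illustration, not part of any ResourceSync draft: the data model (a mapping from URI to last-modified timestamp) and the names `audit` and `is_synchronized` are hypothetical.

```python
def audit(source_inventory: dict, local_copy: dict) -> dict:
    """Compare a source inventory with a destination's local state.

    Both arguments map resource URI -> last-modified timestamp, written as
    ISO 8601 UTC strings in one fixed format so that string comparison
    matches chronological order.
    """
    missing = sorted(uri for uri in source_inventory if uri not in local_copy)
    stale = sorted(
        uri for uri, modified in source_inventory.items()
        if uri in local_copy and local_copy[uri] < modified
    )
    extra = sorted(uri for uri in local_copy if uri not in source_inventory)
    return {"missing": missing, "stale": stale, "extra": extra}


def is_synchronized(report: dict) -> bool:
    """True when the audit found nothing to fetch, refresh, or remove."""
    return not any(report.values())


source = {
    "http://example.com/res1": "2012-08-08T08:15:00Z",
    "http://example.com/res2": "2012-08-08T13:22:00Z",
}
local = {
    "http://example.com/res1": "2012-08-01T00:00:00Z",
    "http://example.com/old": "2012-07-01T00:00:00Z",
}
report = audit(source, local)
print(report, is_synchronized(report))
```

A baseline synchronization is the special case of an empty local copy (everything is "missing"). Incremental synchronization acts on the "missing", "stale", and "extra" lists, and an audit simply checks that all three are empty.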



Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations
to know about, it may describe them:

    o    Publish an inventory of resource URIs and possibly
         associated metadata
         -  Destination GETs the Content Description
         -  Destination GETs listed resources by their URI




Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.


   o     2.2. Push Change Set: Push a list of recent change events
         (create, update, delete resource) towards (a) destination(s)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.




Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set

   o    3.2. Historical Content: Provide access to prior resource versions




Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump

   o    4.2. Alternate Content Transfer: Support alternative
        mechanisms to optimize getting content (see later)




Source: Advertise Capabilities

A source needs to advertise the capabilities it supports to allow a
destination to discover them

•     Some capabilities may be provided by a third party, not the
      source itself
     o   e.g. Historical Change Sets, Historical Content
     o   But the source should still make those third party capabilities
         discoverable - trust




ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




ResourceSync: A Framework of Capabilities


•  Modular framework allowing selective deployment of
   capabilities

•  A Source selects which capabilities to support in order
   to meet local and community needs

•  A Source’s Capabilities can be discovered via
   capability descriptions




BY REFERENCE!




    BY VALUE!




Sitemap


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
     <loc>http://example.com/res1</loc>
     <lastmod>2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
     <loc>http://example.com/res2</loc>
     <lastmod>2012-08-08T13:22:00Z</lastmod>
  </url>
</urlset>
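
A destination could read such a Sitemap into an inventory with a few lines of standard-library Python. This is a sketch only: the function name `read_inventory` is hypothetical, and the XML is the example from this slide embedded as a literal rather than fetched over HTTP.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap protocol, as declared on the <urlset> element.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2012-08-08T13:22:00Z</lastmod>
  </url>
</urlset>"""


def read_inventory(xml_bytes: bytes) -> dict:
    """Return a mapping of resource URI -> last modification datetime."""
    root = ET.fromstring(xml_bytes)
    return {
        url.findtext("sm:loc", namespaces=SITEMAP_NS):
            url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        for url in root.findall("sm:url", SITEMAP_NS)
    }


inventory = read_inventory(SITEMAP_XML)
for uri, modified in inventory.items():
    print(uri, modified)
```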




Baseline Matching - Sitemap


•  Periodic publication of up-to-date Sitemap, which is a “by
   reference” inventory of a Source’s resources

•  Use “as is” with resource location and last modification date as
   core elements

•  Introduce extension elements aimed at supporting audit: e.g.
   MD5 hash of content
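
The audit idea can be sketched as a hash comparison on the destination side. The slide does not say how the hash would be written in the Sitemap extension, so this sketch assumes a plain hexadecimal string; the function names are hypothetical.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Hexadecimal MD5 digest of a resource's bytes."""
    return hashlib.md5(content).hexdigest()


def matches_advertised_hash(content: bytes, advertised_md5: str) -> bool:
    """True when the local copy has the hash the source advertised.

    Assumes the source publishes the digest as a hex string.
    """
    return md5_hex(content) == advertised_md5.strip().lower()


local_bytes = b"example resource content"
advertised = md5_hex(local_bytes)  # stand-in for the value a source would publish
print(matches_advertised_hash(local_bytes, advertised))            # unchanged copy
print(matches_advertised_hash(local_bytes + b" edited", advertised))  # modified copy
```

Comparing hashes catches content differences that a last-modification date alone cannot, for example a corrupted transfer.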




Discovery via robots.txt




Baseline Matching – Dump


•  A Dump is a “by-value” inventory of a Source’s resources

•  Periodic publication of an up-to-date Dump

•  Possible technology: ZIP file consisting of:

    •  Special-purpose Sitemap that acts as a manifest for
       resources contained in the ZIP file
        •  Introduce an element to express correspondence
           between resource URI and filename in the ZIP file
    •  Resource bitstreams

•  Possible technology: WARC file
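
The ZIP option can be sketched with the standard library. Everything specific here is an assumption: the slide does not name the element that links a URI to a file, so the sketch uses a made-up `path` element in a made-up namespace, and the file layout (`manifest.xml`, `resources/NNNNN`) is likewise hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
EX = "http://example.org/hypothetical-dump-terms"  # made-up namespace for this sketch


def build_dump(resources: dict) -> bytes:
    """Pack resources (URI -> bytes) into a ZIP with a Sitemap-style manifest."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, (uri, content) in enumerate(sorted(resources.items())):
            filename = f"resources/{index:05d}"
            archive.writestr(filename, content)
            url = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(url, f"{{{SM}}}loc").text = uri
            # Hypothetical element tying the resource URI to its file in the ZIP.
            ET.SubElement(url, f"{{{EX}}}path").text = filename
        archive.writestr("manifest.xml", ET.tostring(urlset, encoding="utf-8"))
    return buffer.getvalue()


def unpack_dump(dump: bytes) -> dict:
    """Read the manifest and return the resources as URI -> bytes."""
    with zipfile.ZipFile(io.BytesIO(dump)) as archive:
        manifest = ET.fromstring(archive.read("manifest.xml"))
        return {
            url.findtext(f"{{{SM}}}loc"): archive.read(url.findtext(f"{{{EX}}}path"))
            for url in manifest.findall(f"{{{SM}}}url")
        }


originals = {
    "http://example.com/res1": b"first resource",
    "http://example.com/res2": b"second resource",
}
restored = unpack_dump(build_dump(originals))
print(sorted(restored))
```

Keeping the URI-to-file mapping in the manifest means the file names inside the ZIP never have to encode the URIs themselves.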


Change Communication – Pull Change Sets


•  Periodic publication of a Change Set that describes recent
   changes

•  A Change Set is a Sitemap-style document, enhanced to
   express change events rather than inventory. Per change event,
   convey:
    •  About the event:
        •  datetime
        •  event type: create/update/delete (maybe move/copy)
    •  About the changed resource:
        •  URI
        •  Information relevant for audit, e.g. fixity, size, mime type
        •  Further information to aid accessing the resource (see
           later)

Change Set, Based on Sitemap


<?xml version="1.0" encoding="UTF-8"?>
<urlset rs:type="changeset"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
     <loc>http://example.com/res1</loc>
     <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
     <loc>http://example.com/res2</loc>
     <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod>
  </url>
</urlset>
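
A destination might process such a Change Set as sketched below. The parsing follows the example on this slide; the `"deleted"` event value is an assumption (the slide shows only `"updated"` and `"created"`), and `fetch` is a hypothetical caller-supplied function standing in for an HTTP GET.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"

CHANGE_SET_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset rs:type="changeset"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod>
  </url>
</urlset>"""


def read_change_events(xml_bytes: bytes) -> list:
    """Return (datetime, event type, URI) tuples in chronological order."""
    root = ET.fromstring(xml_bytes)
    events = []
    for url in root.findall(f"{SM}url"):
        lastmod = url.find(f"{SM}lastmod")
        events.append((lastmod.text, lastmod.get(f"{RS}type"), url.findtext(f"{SM}loc")))
    return sorted(events)


def apply_events(events: list, local_copy: dict, fetch) -> dict:
    """GET created/updated resources and drop deleted ones."""
    for _when, event_type, uri in events:
        if event_type in ("created", "updated"):
            local_copy[uri] = fetch(uri)
        elif event_type == "deleted":  # assumed value, not shown on the slide
            local_copy.pop(uri, None)
    return local_copy


events = read_change_events(CHANGE_SET_XML)
# The lambda stands in for an HTTP GET so the sketch runs offline.
local_copy = apply_events(events, {}, fetch=lambda uri: f"content of {uri}")
print(events)
print(sorted(local_copy))
```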


Change Set, from Scratch


<?xml version="1.0" encoding="UTF-8"?>
<changeset xmlns="http://www.openarchives.org/rs/changeset">
  <change>
     <link rel="created" length="1234" type="text/html"
           href="http://example.com/res1.html"/>
     <date>2012-09-25T09:00:00Z</date>
     <fixity>ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx</fixity>
  </change>
</changeset>
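
The fixity value above uses the `ni:` (Named Information) URI form, which RFC 6920 defines as a hash algorithm name followed by the digest in unpadded URL-safe base64. A destination could compute the same form for a retrieved resource and compare. This is a sketch; the function names are hypothetical.

```python
import base64
import hashlib


def ni_sha256(content: bytes) -> str:
    """Return an ni: URI carrying the SHA-256 digest of `content`,
    encoded as unpadded URL-safe base64."""
    digest = hashlib.sha256(content).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{encoded}"


def fixity_matches(content: bytes, advertised: str) -> bool:
    """True when the retrieved bytes produce the advertised ni: URI."""
    return ni_sha256(content) == advertised


body = b"<html>example</html>"
advertised = ni_sha256(body)  # stand-in for the value in a <fixity> element
print(advertised)
print(fixity_matches(body, advertised))
```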




Change Communication – Push Change Sets


•  Use a push technology to convey changes

•  Express changes using same Sitemap-style document
    •  A Change Set in this case might convey only one change
       event

•  Possible technology: XMPP PubSub
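
The shape of push-based change communication can be illustrated without any XMPP library. The class below is an in-memory stand-in, not XMPP PubSub: it only shows that the source publishes single change events to a node and that subscribed destinations react as they arrive. All names are hypothetical.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Toy publish-subscribe hub. A real deployment would use a network
    service such as an XMPP PubSub server instead."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, node: str, callback) -> None:
        """Register a destination callback for change events on `node`."""
        self._subscribers[node].append(callback)

    def publish(self, node: str, event: dict) -> None:
        """Deliver one change event to every subscriber of `node`."""
        for callback in self._subscribers[node]:
            callback(event)


received = []
hub = InMemoryPubSub()
hub.subscribe("source-a/changes", received.append)  # the destination

# The source pushes a Change Set that conveys a single change event.
hub.publish(
    "source-a/changes",
    {"uri": "http://example.com/res1", "type": "updated",
     "datetime": "2012-08-08T08:15:00Z"},
)
print(received)
```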




<XMPP PubSub Intermezzo>

XMPP Publish-Subscribe: Client to Subscription Service,
  Subscription Service to Client(s) communication

•  One of the XMPP (Extensible Messaging and Presence Protocol)
   extensions http://xmpp.org/extensions/xep-0060.html

•  Apple Notifications based on XMPP PubSub

•  Both client and server tools widely available




</XMPP PubSub Intermezzo>




Source → PubSub Server → Destination


Change Communication Memory


•  Publication of one or more Change Sets that convey historical
   (rather than recent) changes

•  All historical Change Sets use same Sitemap-style document

•  Same approach irrespective of whether pull or push is used for
   Change Communication




Resource Transfer


•  Resources are obtained in bulk by obtaining a Dump

•  An individual resource is, by default, obtained by dereferencing a
   resource’s URI listed in:
    •  Sitemap
    •  Change Set

•  Alternative access mechanisms are introduced to obtain an
   individual resource:
    •  From a mirror site
    •  Access to diff with previous version instead of access to the
       entire changed resource
    •  Resource version


Resource Memory


•  Requires a (short or long term) archive of resource versions

•  Access to specific version can be expressed as an alternative
   access mechanism in e.g. Change Set.
    •  Via a link to a version resource that is the result of the
       change expressed in the Change Set
    •  Via a link to a Memento TimeGate that supports access to all
       available prior versions
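
In Memento, a client asks a TimeGate for the version of a resource that was current at a given moment by sending an `Accept-Datetime` request header. The sketch below only builds such a request and does not send it. The TimeGate URI is hypothetical, and how a Change Set would link to a TimeGate is left open by the slide.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate_uri: str, when: datetime) -> Request:
    """Build (but do not send) a GET request asking a TimeGate for the
    version of a resource that was current at `when`."""
    # Accept-Datetime carries an HTTP-date expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": http_date})


request = timegate_request(
    "http://example.com/timegate/http://example.com/res1",  # hypothetical TimeGate
    datetime(2012, 8, 8, 8, 15, tzinfo=timezone.utc),
)
print(request.get_method(), request.full_url)
print(request.header_items())
```

A destination that missed several changes could use this kind of request to fetch the exact version that a given change event produced, rather than only the current one.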




<Memento Intermezzo>




  http://www.mementoweb.org/

Original Resources and Mementos




Bridge from Present to Past




Bridge from Past to Present




Memento Framework




Memento Framework

Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png




Time Travel across Versions of a Picture of the Day




Movie at: http://www.mementoweb.org/demo/picoftheday.mov
Memento Framework

Original Resource: http://dbpedia.org/resource/France




Time-Series Analysis across DBpedia Versions




      Data collected through HTTP Navigation

           Paper at http://arxiv.org/abs/1003.3661

</Memento Intermezzo>




   http://www.mementoweb.org/

ResourceSync Timeline
•  August 2012
    o  First draft spec shared for feedback with ResourceSync team

•  September 2012
    o  Problem Statement paper in D-Lib Magazine
    o  In-person meeting of ResourceSync Team

•  October 2012
    o  Revise spec, conduct experiments
    o  Solicit broad feedback

•  December 2012 – Finalize specification (?)



Pointers
•  First ResourceSync draft spec (do not implement!):
   http://www.openarchives.org/rs/0.1/resourcesync

•  ResourceSync Simulator code on GitHub
   http://github.com/resync/simulator

•  NISO ResourceSync workspace
   http://www.niso.org/workrooms/resourcesync/

•  Memento
   http://mementoweb.org




ResourceSync:
         Get the Sticker!

            Herbert Van de Sompel
                 Los Alamos National Laboratory
                                    @hvdsomp


                  ResourceSync is funded by
                The Sloan Foundation & JISC





Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 
Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"
 

Dernier

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 

Dernier (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Synchronization

  • 1. ResourceSync: Web-Based Resource Synchronization
      Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp
      #resourcesync
      ResourceSync is funded by The Sloan Foundation & JISC
      ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 2. ResourceSync Core Team – NISO & OAI
      Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel
      Cornell University & OAI: Bernhard Haslhofer, Simeon Warner
      Old Dominion University & OAI: Michael L. Nelson
      University of Michigan & OAI: Carl Lagoze
      NISO: Todd Carpenter, Nettie Lagace, Peter Murray
  • 3. ResourceSync Technical Group
      •  Manuel Bernhardt, Delving B.V.
      •  Kevin Ford, Library of Congress
      •  Richard Jones, JISC
      •  Graham Klyne, JISC
      •  Stuart Lewis, JISC
      •  David Rosenthal, LOCKSS
      •  Christian Sadilek, Red Hat
      •  Shlomo Sanders, Ex Libris, Inc.
      •  Sjoerd Siebinga, Delving B.V.
      •  Ed Summers, Library of Congress
      •  Jeff Young, OCLC Online Computer Library Center
  • 4. ResourceSync
      ResourceSync: What & Why?
      Problem Perspective & Conceptual Approach
      Possible Technical Choices
      Q&A
  • 5. ResourceSync
      ResourceSync: What & Why?
      Problem Perspective & Conceptual Approach
      Possible Technical Choices
      Q&A
  • 6. Synchronize What?
      •  Web resources – things with a URI that can be dereferenced and are cache-able (no dependency on underlying OS, technologies etc.)
      •  Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
      •  That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary
      •  Focus on needs of research communication and cultural heritage organizations, but aim for generality
  • 7. Why?
      … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case-by-case approaches!
      •  Project team involved with projects that need this
      •  Experience with OAI-PMH: widely used in repos but
          o  XML metadata only
          o  Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful
          o  Web technology has moved on since 1999
      •  Devise a shared solution for data, metadata, linked data?
  • 8. Use Cases – The Basics
  • 9. Use Cases – More
  • 10. Out Of Scope (For Now)
      •  Bidirectional synchronization
      •  Destination-defined selective synchronization (query)
      •  Bulk URI migration
  • 11. Use Case: arXiv Mirroring
      •  1M article versions, ~800/day created or updated at 8 PM US Eastern Time
      •  Metadata and full-text for each article
      •  Accuracy important
      •  Want low barrier for others to use
      •  Look for more general solution than current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem layout specific, auth issues)
  • 12. Use Case: DBpedia Live Duplication
      •  Average of 2 updates per second
      •  Want low latency => need a push technology
  • 13. ResourceSync
      ResourceSync: What & Why?
      Problem Perspective & Conceptual Approach
      Possible Technical Choices
      Q&A
  • 14. ResourceSync Problem
      •  Consideration:
          •  Source (server) A has resources that change over time: they get created, modified, deleted
          •  Destination (servers) X, Y, and Z leverage (some) resources of Source A.
      •  Problem:
          •  Destinations want to keep in step with the resource changes at source A: resource synchronization.
      •  Goal:
          •  Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
          •  The approach must scale better than recurrent HTTP HEAD/GET on resources.
  • 15. Destination: 3 Basic Synchronization Needs
      1.  Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source
          -  avoid out-of-band setup
      2.  Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source
          -  subject to some latency; minimal: create/update/delete
          -  allow to catch-up after destination has been offline
      3.  Audit – A destination should be able to determine whether it is synchronized with a source
          -  subject to some latency
  • 16. Source Capability 1: Describing Content
      In order to advertise the resources that a source wants destinations to know about, it may describe them:
          o  Publish an inventory of resource URIs and possibly associated metadata
              -  Destination GETs the Content Description
              -  Destination GETs listed resources by their URI
  • 17.
  • 18.
  • 19. Source Capability 2: Communicating Change Events
      In order to achieve lower latency, a source may communicate about changes to its resources:
          o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource)
              -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Source Capability 2: Communicating Change Events
      In order to achieve lower latency, a source may communicate about changes to its resources:
          o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource)
              -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
          o  2.2. Push Change Set: Push a list of recent change events (create, update, delete resource) towards (a) destination(s)
              -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
  • 25. Source Capability 3: Providing Access to Versions
      In order to allow a destination to catch up with missed changes, a source may support:
          o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set
  • 26.
  • 27.
  • 28.
  • 29. Source Capability 3: Providing Access to Versions
      In order to allow a destination to catch up with missed changes, a source may support:
          o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set
          o  3.2. Historical Content: Provide access to prior resource versions
  • 30. Source Capability 4: Transferring Content
      By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:
          o  4.1. Dump: Publish a package of resource representations and necessary metadata
              -  Destination GETs the Dump
              -  Destination unpacks the Dump
  • 31.
  • 32. Source Capability 4: Transferring Content
      By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:
          o  4.1. Dump: Publish a package of resource representations and necessary metadata
              -  Destination GETs the Dump
              -  Destination unpacks the Dump
          o  4.2. Alternate Content Transfer: Support alternative mechanisms to optimize getting content (see later)
  • 33. Source: Advertise Capabilities
      A source needs to advertise the capabilities it supports to allow a destination to discover them
      •  Some capabilities may be provided by a third party, not the source itself
          o  e.g. Historical Change Sets, Historical Content
          o  But the source should still make those third party capabilities discoverable - trust
  • 34. ResourceSync
      ResourceSync: What & Why?
      Problem Perspective & Conceptual Approach
      Possible Technical Choices
      Q&A
  • 35. ResourceSync: A Framework of Capabilities
      •  Modular framework allowing selective deployment of capabilities
      •  A Source selects which capabilities to support in order to meet local and community needs
      •  A Source’s Capabilities can be discovered via capability descriptions
  • 36.
  • 37. BY REFERENCE / BY VALUE
  • 38.
  • 39.
  • 40. Sitemap
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2012-08-08T08:15:00Z</lastmod>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2012-08-08T13:22:00Z</lastmod>
        </url>
      </urlset>
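A destination could read such a Sitemap with a few lines of standard-library Python. The sketch below is only an illustration, not part of the ResourceSync draft: the function names and the `local_state` dictionary (URI mapped to the last modification time the destination already holds) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The Sitemap shown on the slide.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2012-08-08T13:22:00Z</lastmod>
  </url>
</urlset>"""


def parse_timestamp(text):
    """Parse a UTC timestamp such as 2012-08-08T08:15:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def read_inventory(sitemap_xml):
    """Return {uri: last modification datetime} for every <url> entry."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    inventory = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        inventory[loc.strip()] = parse_timestamp(lastmod.strip())
    return inventory


def stale_uris(inventory, local_state):
    """URIs the destination lacks, or holds in an older version (hypothetical helper)."""
    return sorted(
        uri for uri, modified in inventory.items()
        if uri not in local_state or local_state[uri] < modified
    )
```

A destination that already holds an up-to-date `res1` would then only need to GET `res2`; an empty `local_state` yields the full list, which corresponds to a baseline load.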
  • 41. Baseline Matching – Sitemap
      •  Periodic publication of up-to-date Sitemap, which is a “by reference” inventory of a Source’s resources
      •  Use “as is” with resource location and last modification date as core elements
      •  Introduce extension elements aimed at supporting audit: e.g. MD5 hash of content
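The audit idea can be sketched as a comparison between digests published by the source and digests computed over the destination's copies. The slide does not say how the MD5 extension element would be expressed in the Sitemap, so the example below sidesteps the XML and assumes the digests have already been read into a dictionary; `audit` and its return shape are hypothetical.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Hex MD5 digest of a resource's bytes."""
    return hashlib.md5(content).hexdigest()


def audit(source_digests, local_copies):
    """Compare published digests with local content.

    source_digests: {uri: md5 hex digest} as advertised by the source
    local_copies:   {uri: bytes} as held by the destination
    Returns three sorted URI lists: (missing, mismatched, extra).
    """
    missing = sorted(u for u in source_digests if u not in local_copies)
    mismatched = sorted(
        u for u, digest in source_digests.items()
        if u in local_copies and md5_hex(local_copies[u]) != digest
    )
    extra = sorted(u for u in local_copies if u not in source_digests)
    return missing, mismatched, extra
```

A destination is in sync with the published inventory when all three lists are empty, subject to whatever latency the inventory itself carries.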
  • 42. robots.txt discovery
  • 43. Baseline Matching – Dump
      •  A Dump is a “by-value” inventory of a Source’s resources
      •  Periodic publication of an up-to-date Dump
      •  Possible technology: ZIP file consisting of:
          •  Special-purpose Sitemap that acts as a manifest for resources contained in the ZIP file
              •  Introduce an element to express correspondence between resource URI and filename in the ZIP file
          •  Resource bitstreams
      •  Possible technology: WARC file
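The ZIP option can be sketched with Python's `zipfile` module. The slide only says that some element would express the correspondence between resource URI and filename; the `rs:path` attribute, the `manifest.xml` name, and the `resources/NNNNN` layout below are all assumptions made for illustration, not choices from the draft.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)


def build_dump(resources):
    """Package {uri: bytes} as a ZIP holding the bitstreams plus a manifest."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for index, (uri, content) in enumerate(sorted(resources.items())):
            filename = f"resources/{index:05d}"  # hypothetical layout
            archive.writestr(filename, content)
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
            # Hypothetical attribute tying the URI to its file in the ZIP.
            url.set(f"{{{RS_NS}}}path", filename)
        manifest = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
        archive.writestr("manifest.xml", manifest)
    return buffer.getvalue()


def unpack_dump(dump_bytes):
    """Rebuild {uri: bytes} by following the manifest inside the ZIP."""
    restored = {}
    with zipfile.ZipFile(io.BytesIO(dump_bytes)) as archive:
        manifest = ET.fromstring(archive.read("manifest.xml"))
        for url in manifest.findall(f"{{{SITEMAP_NS}}}url"):
            uri = url.findtext(f"{{{SITEMAP_NS}}}loc")
            restored[uri] = archive.read(url.get(f"{{{RS_NS}}}path"))
    return restored
```

Keeping the manifest in the same Sitemap vocabulary means a destination can reuse its inventory parser for Dumps; only the URI-to-filename mapping is new.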
  • 44.
  • 45. Change Communication – Pull Change Sets
      •  Periodic publication of a Change Set that describes recent changes
      •  A Change Set is a Sitemap-style document, enhanced to express change events rather than inventory. Per change event, convey:
          •  About the event:
              •  datetime
              •  event type: create/update/delete (maybe move/copy)
          •  About the changed resource:
              •  URI
              •  Information relevant for audit, e.g. fixity, size, mime type
              •  Further information to aid in accessing the resource (see later)
  • 46. Change Set, Based on Sitemap
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset rs:type="changeset"
              xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod>
        </url>
      </urlset>
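How a destination might act on such a Change Set can be sketched as follows. The XML is the one from the slide; the `"deleted"` value for `rs:type` is an assumption (the slide shows only `updated` and `created`), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"

CHANGESET_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset rs:type="changeset"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod>
  </url>
</urlset>"""


def read_change_events(changeset_xml):
    """Return (datetime string, event type, uri) tuples in chronological order."""
    root = ET.fromstring(changeset_xml.encode("utf-8"))
    events = []
    for url in root.findall(f"{{{SM}}}url"):
        lastmod = url.find(f"{{{SM}}}lastmod")
        uri = url.findtext(f"{{{SM}}}loc").strip()
        events.append((lastmod.text.strip(), lastmod.get(f"{{{RS}}}type"), uri))
    # Same-shape UTC timestamps sort chronologically as plain strings.
    return sorted(events)


def plan_actions(events):
    """Translate change events into the actions a destination would take."""
    actions = []
    for _, event_type, uri in events:
        if event_type in ("created", "updated"):
            actions.append(("GET", uri))
        elif event_type == "deleted":  # assumed value, not shown on the slide
            actions.append(("REMOVE", uri))
    return actions
```

Processing events in chronological order matters when the same resource appears more than once in a Change Set, for example created and then deleted.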
  • 47. Change Set, from Scratch
      <?xml version="1.0" encoding="UTF-8"?>
      <changeset xmlns="http://www.openarchives.org/rs/changeset">
        <change>
          <link rel="created" length="1234" type="text/html"
                href="http://example.com/res1.html"/>
          <date>2012-09-25T09:00:00Z</date>
          <fixity>ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx</fixity>
        </change>
      </changeset>
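The fixity value above uses the `ni:` (Named Information) URI form, in which the digest is base64url-encoded without padding. A minimal sketch of computing and checking such a value is shown below; the helper names are hypothetical. The value on the slide appears shortened, and it matches the leading characters of the digest of the bytes `Hello World!`.

```python
import base64
import hashlib


def ni_sha256(content: bytes) -> str:
    """Build an ni:///sha-256;... URI for the given bytes."""
    digest = hashlib.sha256(content).digest()
    # base64url alphabet, trailing '=' padding removed
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{token}"


def fixity_matches(content: bytes, expected_ni: str) -> bool:
    """True if the retrieved bytes match the fixity advertised in a change event."""
    return ni_sha256(content) == expected_ni


print(ni_sha256(b"Hello World!"))
# prints ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
```

A destination would run `fixity_matches` on the bytes it retrieved for `href` and treat a mismatch as a signal to re-fetch or to flag the resource during audit.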
  • 48.
  • 49. Change Communication – Push Change Sets
      •  Use a push technology to convey changes
      •  Express changes using same Sitemap-style document
      •  A Change Set in this case might convey only one change event
      •  Possible technology: XMPP PubSub
  • 50. <XMPP PubSub Intermezzo>
      XMPP Publish-Subscribe: Client to Subscription Service, Subscription Service to Client(s) communication
      •  One of the XMPP (Extensible Messaging and Presence Protocol) extensions http://xmpp.org/extensions/xep-0060.html
      •  Apple Notifications based on XMPP PubSub
      •  Both client and server tools widely available
  • 51. </XMPP PubSub Intermezzo>
      Source, PubSub Server, Destination
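The Source, PubSub Server, Destination flow can be mimicked in-process to show the shape of push-based change communication. This is a plain Python stand-in and does not use XMPP or any XMPP library; the class names, the node name, and the event dictionaries are all assumptions made for the sketch.

```python
from collections import defaultdict


class PubSubBroker:
    """In-process stand-in for a publish-subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, node, callback):
        """Register a callback for events published on a node."""
        self._subscribers[node].append(callback)

    def publish(self, node, event):
        """Deliver one change event to every subscriber of the node."""
        for callback in list(self._subscribers[node]):
            callback(event)


class Destination:
    """Records what it would do for each pushed change event."""

    def __init__(self):
        self.pending_fetches = []
        self.removed = []

    def on_change(self, event):
        if event["type"] in ("created", "updated"):
            self.pending_fetches.append(event["uri"])
        elif event["type"] == "deleted":
            self.removed.append(event["uri"])


broker = PubSubBroker()
mirror_x, mirror_y = Destination(), Destination()
broker.subscribe("example.com/changes", mirror_x.on_change)
broker.subscribe("example.com/changes", mirror_y.on_change)

# The source pushes one change event per message.
broker.publish("example.com/changes", {"type": "created", "uri": "http://example.com/res2"})
broker.publish("example.com/changes", {"type": "deleted", "uri": "http://example.com/res1"})
```

The source publishes once and the service fans the event out, so the source does not need to know how many destinations are listening.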
  • 52.
  • 53. Change Communication Memory
      •  Publication of one or more Change Sets that convey historical (rather than recent) changes
      •  All historical Change Sets use same Sitemap-style document
      •  Same approach irrespective of whether pull or push is used for Change Communication
  • 54.
  • 55.
  • 56. Resource Transfer
      •  Resources are obtained in bulk by obtaining a Dump
      •  An individual resource is, by default, obtained by dereferencing a resource’s URI listed in:
          •  Sitemap
          •  Change Set
      •  Alternative access mechanisms are introduced to obtain an individual resource:
          •  From a mirror site
          •  Access to diff with previous version instead of access to the entire changed resource
          •  Resource version
  • 57.
  • 58. Resource Memory
      •  Requires a (short or long term) archive of resource versions
      •  Access to specific version can be expressed as an alternative access mechanism in e.g. Change Set.
          •  Via a link to a version resource that is the result of the change expressed in the Change Set
          •  Via a link to a Memento TimeGate that supports access to all available prior versions
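With Memento, a client asks a TimeGate for the version of a resource that was current at a given time by sending an `Accept-Datetime` request header. The sketch below only builds such a request and does not send it; the TimeGate URI is a made-up placeholder and `timegate_request` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate_uri, when):
    """Build (but do not send) a GET request for the version current at `when`."""
    # Accept-Datetime carries an HTTP-date, e.g. "Wed, 08 Aug 2012 08:15:00 GMT".
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": http_date}, method="GET")


# Hypothetical TimeGate URI for http://example.com/res1.
request = timegate_request(
    "http://example.com/timegate/http://example.com/res1",
    datetime(2012, 8, 8, 8, 15, tzinfo=timezone.utc),
)
```

Passing the datetime of a change event lets a destination that was offline retrieve the version that resulted from that specific change, rather than whatever is current now.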
  • 59. <Memento Intermezzo> http://www.mementoweb.org/
  • 60. Original Resources and Mementos
  • 61. Bridge from Present to Past
  • 62. Bridge from Past to Present
  • 63. Memento Framework
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 71. Memento Framework
      Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png
  • 72. Time Travel across Versions of a Picture of the Day
      Movie at: http://www.mementoweb.org/demo/picoftheday.mov
  • 73. Memento Framework
      Original Resource: http://dbpedia.org/resource/France
  • 74. Time-Series Analysis across DBpedia Versions
      Data collected through HTTP Navigation
      Paper at http://arxiv.org/abs/1003.3661
  • 75. </Memento Intermezzo> http://www.mementoweb.org/
  • 76.
  • 77. ResourceSync Timeline
      •  August 2012
          o  First draft spec shared for feedback with ResourceSync team
      •  September 2012
          o  Problem Statement paper in D-Lib Magazine
          o  In-person meeting of ResourceSync Team
      •  October 2012
          o  Revise spec, conduct experiments
          o  Solicit broad feedback
      •  December 2012 – Finalize specification (?)
  • 78. Pointers
      •  First ResourceSync draft spec (do not implement!): http://www.openarchives.org/rs/0.1/resourcesync
      •  ResourceSync Simulator code on github: http://github.org/resync/simulator
      •  NISO ResourceSync workspace: http://www.niso.org/workrooms/resourcesync/
      •  Memento: http://mementoweb.org
  • 79. ResourceSync: Get the Sticker!
      Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp
      ResourceSync is funded by The Sloan Foundation & JISC