Memento:
                                  Time Travel for the Web
                                     http://www.mementoweb.org

                     Herbert Van de Sompel – hvdsomp@gmail.com
                         Michael L. Nelson – mln@cs.odu.edu



                           The Memento Experiment was partly funded
                                  by the Library of Congress




            Memento: Time Travel for the Web
        Herbert Van de Sompel, Michael L. Nelson
Library of Congress, Washington, DC - November 16 2009
Acknowledgments

•  At the Los Alamos National Laboratory, Prototyping Team:

   o    Robert Sanderson
   o    Lyudmila Balakireva
   o    Harihar Shankar

•  At Old Dominion University, Web Science and Digital Library
   Research Group:

   o    Scott Ainsworth




Looking at the Past can be Fun



                                                                  Feb 14 2006



                                                           Cheney prays for hunt victim




Looking at the Past can be Fun



                                                               Feb 14 2006



                                                           Press Attacks Cheney




And Memento wants to make it Easy




W3C Web Architecture: Resource – URI - Representation


[Diagram: a URI identifies a Resource; a Representation represents the Resource; dereferencing the URI yields the Representation.]




W3C Web Architecture: Resource – URI - Representation


[Diagram: a URI identifies a Resource; Representation 1 and Representation 2 each represent the Resource; dereferencing the URI with content negotiation yields one of them.]


Resources




Resources have Representations




Resources have Representations that Change over Time




Only the Current Representation is Available from a Resource




Old Representations are Lost Forever




There is no Time Dimension to HTTP, the Web


Resource state may evolve over time. Requiring a
  URI owner to publish a new URI for each change in
  resource state would lead to a significant number
  of broken references. For robustness, Web
  architecture promotes independence between an
  identifier and the state of the identified resource.

From: The Architecture of the World Wide Web, http://www.w3.org/TR/webarch/


Archived Resources Exist




Archived Resources

Sep 11 2001, 20:36:10 UTC: http://web.archive.org/web/20010911203610/http://www.cnn.com/, archived resource for http://cnn.com

Dec 20 2001, 4:51:00 UTC: http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333, archived resource for http://en.wikipedia.org/wiki/September_11_attacks

Finding Archived Resources




1. Go to http://www.archive.org/ and search http://cnn.com
2. On http://web.archive.org/web/*/http://cnn.com, select desired datetime


Finding Archived Resources




1. Go to http://en.wikipedia.org/wiki/September_11_attacks and click History
2. Browse History

Navigating Archived Resources

Dec 20 2001, 4:51:00 UTC: http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333, archived resource for http://en.wikipedia.org/wiki/September_11_attacks

Link "Pentagon" leads to: http://en.wikipedia.org/wiki/The_Pentagon (current)

Navigating Archived Resources

Sep 11 2001, 20:36:10 UTC: http://web.archive.org/web/20010911203610/http://www.cnn.com/, archived resource for http://cnn.com

Link "SPACE" leads to: http://web.archive.org/web/20010911213855/www.cnn.com/TECH/space/ (Sep 11 2001, 21:38:55 UTC)


Current and Past Web are Not Integrated




This is Where Memento comes in …



Oct 11 2009, 05:30:33 UTC




This is Where Memento comes in …

http://lanlsource.lanl.gov/hello
Oct 11 2009, 05:30:33 UTC

From LANL and ODU transactional archives:
   Oct 11 2009, 00:00:01 UTC
   Oct 10 2009, 18:00:01 UTC
   Oct 10 2009, 16:00:01 UTC

Link: Web Archiving

This is Where Memento comes in …

http://en.wikipedia.org/wiki/Web_Archiving
Oct 11 2009, 05:30:33 UTC

From Wikipedia History:
   Oct 01 2009, 16:30:00 UTC

Link: Robots Exclusion Protocol

This is Where Memento comes in …

http://en.wikipedia.org/wiki/Robots_exclusion_protocol
Oct 11 2009, 05:30:33 UTC

From Wikipedia History:
   Sep 15 2009, 20:49:00 UTC

Link: Robots Exclusion

This is Where Memento comes in …

http://www.robotstxt.org/
Oct 11 2009, 05:30:33 UTC

From Internet Archive:
   Nov 09 2007, 06:21:04 UTC

How does Memento do This?


In order to help understand how Memento introduces
   time travel for the Web, we present a brief recap of
   Transparent Content Negotiation (conneg) in HTTP.

RFC 2295. Transparent Content Negotiation in HTTP,
  http://www.ietf.org/rfc/rfc2295.txt
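
As a rough illustration of the recap, the sketch below negotiates in the language dimension and returns either a "server choice" redirect (302) or a list of variants (406). The variant paths, the function name and the plain "variants" list are assumptions made for illustration; this is not the RFC 2295 wire format.

```python
# Hypothetical variants of one negotiable resource, keyed by language tag.
VARIANTS = {"en": "/doc.en.html", "fr": "/doc.fr.html"}


def negotiate(accept_language):
    """Return (status, info) for a request carrying an Accept-Language value."""
    # Parse e.g. "fr;q=0.8, en;q=0.5" into (language, quality) pairs.
    prefs = []
    for part in accept_language.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs.append((lang.strip().lower(), q))

    # Highest quality first; sorted() is stable, so ties keep request order.
    for lang, q in sorted(prefs, key=lambda p: -p[1]):
        base = lang.split("-")[0]  # "en-us" falls back to "en"
        if q > 0 and base in VARIANTS:
            # Server choice: redirect the client to the selected variant.
            return 302, {"Location": VARIANTS[base]}

    # Nothing acceptable: list the variants and let the client choose.
    return 406, {"variants": sorted(VARIANTS.values())}
```

For example, `negotiate("fr;q=0.8, en;q=0.5")` selects the French variant, while `negotiate("de")` falls through to the 406 branch.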




HTTP GET on URI A




GET with conneg on URI T – Server Choice – 200 OK




GET with conneg on URI T – Server Choice – 302 Found – Step 1




GET with conneg on URI T – Server Choice – 302 Found – Step 2




GET with conneg on URI T – Server List – 406 Not Acceptable




The Memento Solution


Now, we are ready to introduce the components of
  the Memento Solution:

•  Content Negotiation in the datetime dimension.
•  An API for archives that allows requesting a list of
   all archived versions it holds for a given URI.




Terminology Intermission

We introduce the term Memento to refer to an
 archived version of a resource.



A Memento for a resource URI-R (as it existed)
   at time ti is a resource URI-Mi [URI-R@ti] for
   which the representation at any moment
   past its creation time tc is the same as the
   representation that was available from
   URI-R at time ti, with tc >= ti (a Memento
   cannot be created before the state it
   records). Implicit in this definition is the
   notion that, once created, a Memento
   always keeps the same representation.



DT-conneg: Content Negotiation in the datetime dimension

 •  RFC 2295 introduces conneg in the following dimensions: media
    type, language, compression, character set, e.g.:
         Accept-Language: en-US

 •  Memento introduces conneg in the datetime dimension:
       X-Accept-Datetime: {Mon, Oct 12 2009 14:20:33 GMT}

 •  This means that somewhere, we will need transparently
    negotiable resources to get to appropriate Mementos.

 •  This will be discussed for 2 classes of servers.
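
A client-side sketch of the datetime dimension: the helper below builds the X-Accept-Datetime value in the layout shown above (braces included). The function names are hypothetical, and the layout is copied from this slide rather than from any formal specification.

```python
from datetime import datetime, timezone

# Fixed English names so the output does not depend on the system locale.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def x_accept_datetime(dt):
    """Format a timezone-aware datetime the way the slide shows it."""
    dt = dt.astimezone(timezone.utc)  # the header value is expressed in GMT
    return "{%s, %s %02d %04d %02d:%02d:%02d GMT}" % (
        DAYS[dt.weekday()], MONTHS[dt.month - 1], dt.day, dt.year,
        dt.hour, dt.minute, dt.second)


def dt_conneg_headers(dt):
    """Request headers for a DT-conneg GET (sketch)."""
    return {"X-Accept-Datetime": x_accept_datetime(dt)}
```

Calling `dt_conneg_headers(datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc))` reproduces the header from the slide.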


Class 1 Servers: With Internal Archival Capabilities

•  This type includes:
    o  Content Management Systems

    o  Version Control Systems

    o  TTApache

    o  Servers that archive resource representations in the cloud
       and keep track of the URIs and datetimes of remotely
       archived resources.

•  These servers have all the essential information (URI-Ms, and
   associated datetimes) to respond to a DT-conneg request.




current: http://en.wikipedia.org/wiki/September_11_attacks

Dec 20 2001, 4:51:00 UTC
Dec 31 2004, 20:46:00 UTC
Dec 20 2008, 22:21:00 UTC: http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=259237305


[Diagram: the original resource (left) and its Mementos (right).]




DT-conneg with URI-R to get URI-M


[Diagram: the original resource acts as the transparently negotiable resource; the Mementos are its variant resources.]




Terminology Intermission

We introduce the term TimeGate to refer to a
 transparently negotiable resource that supports the
 datetime dimension.

A TimeGate for an original resource URI-R is a
   transparently negotiable resource
   URI-G[URI-R] for which all variant resources are
   Mementos URI-Mi[URI-R@ti] of the resource
   URI-R. Since multiple archives may host
   versions of URI-R, multiple TimeGates may
   exist for any given resource, i.e. one per
   archive.
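
A minimal TimeGate sketch, assuming the archive holds (datetime, URI-M) pairs and answers with a 302 redirect to the Memento closest in time to the requested datetime. The "closest" rule and the function name are assumptions; the two Mementos are the Wikipedia examples used elsewhere in this talk.

```python
from datetime import datetime, timezone

# Mementos of http://en.wikipedia.org/wiki/September_11_attacks (from the slides).
MEMENTOS = [
    (datetime(2001, 12, 20, 4, 51, 0, tzinfo=timezone.utc),
     "http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333"),
    (datetime(2008, 12, 20, 22, 21, 0, tzinfo=timezone.utc),
     "http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=259237305"),
]


def timegate_response(mementos, requested):
    """Return (302, headers) pointing at the best Memento, or None if there is none."""
    if not mementos:
        return None  # how to answer in this case is left open here
    # Assumed selection rule: smallest absolute distance to the requested datetime.
    _, uri_m = min(mementos, key=lambda m: abs(m[0] - requested))
    return 302, {"Location": uri_m}
```

With `requested` set to a datetime in early 2002 the redirect goes to `oldid=282333`; a datetime in late 2009 yields `oldid=259237305`.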




DT-conneg with URI-G/URI-R to get URI-M


[Diagram: the original resource and the TimeGate are the same transparently negotiable resource; the Mementos are its variant resources.]




Servers With Internal Archival Capabilities: Successful Flow




Servers With Internal Archival Capabilities: Other Scenarios




                                                  See http://www.mementoweb.org/guide/http/local




Class 2 Servers: Without Internal Archival Capabilities

•  This type includes:
    o  Servers that are crawled by a web archive

    o  Servers with an associated transactional archive




•  These servers do not have the essential information (URI-Ms,
   and associated datetimes) to respond to a DT-conneg request.

•  But they can still be really constructive by redirecting (HTTP 302)
   a client to an archive that can respond to the DT-conneg request.
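
The redirect behaviour of such a server can be sketched as follows. The handler name and the `timegate_for` callback (which maps a URI-R to a TimeGate URI-G) are hypothetical; only the use of HTTP 302 and the X-Accept-Datetime header come from the slides.

```python
def handle_get(uri_r, request_headers, timegate_for):
    """Original server without its own archive (sketch).

    Returns (status, response_headers).
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in request_headers}
    if "x-accept-datetime" in names:
        # The server cannot negotiate on datetime itself: send the client
        # to a TimeGate that can.
        return 302, {"Location": timegate_for(uri_r)}
    # No datetime requested: serve the current representation as usual.
    return 200, {}
```

A call such as `handle_get("http://lanlsource.lanl.gov/hello", {"X-Accept-Datetime": "..."}, my_timegate_for)` would redirect, while the same call without the header is served normally.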




current: http://lanlsource.lanl.gov/hello

Oct 04 2009, 12:00:01 UTC
Oct 10 2009, 12:00:03 UTC
Oct 21 2009, 12:00:01 UTC: http://mementoarchive.lanl.gov/store/ta/20091021120001/http://lanlsource.lanl.gov/hello


[Diagram: the original resource (left) and its Mementos (right).]




DT-conneg with URI-G to get URI-M


[Diagram: the original resource; next to it the TimeGate, a transparently negotiable resource whose variant resources are the Mementos.]




DT-conneg with URI-G to get URI-M

[Diagram: the original resource redirects to the TimeGate, a transparently negotiable resource whose variant resources are the Mementos.]




How to redirect from Original Resource to its (external) TimeGate

    •  Q1: Which archive to redirect to?

        o    The archive with the best coverage for the server at hand.
              -  There are quite a few nuances, here.
        o    Always redirect to an Aggregator (see later)

    •  Q2: What is the TimeGate URI-G for URI-R on the chosen
       archive?

        o    Convention for syntax of URI-G as function of URI-R.
              -  http://web.archive.org/web/timegate/http://cnn.com
        o    Always redirect to an Aggregator (see later)
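
Both answers can be combined in a small lookup, sketched below. The per-host table and the aggregator address are hypothetical; the web.archive.org prefix is the syntax convention quoted above.

```python
from urllib.parse import urlsplit

# Hypothetical aggregator TimeGate prefix, used when no better archive is known.
AGGREGATOR_PREFIX = "http://aggregator.example.org/timegate/"

# Q1: archive assumed to have the best coverage per host (illustrative table).
PREFIX_BY_HOST = {
    "cnn.com": "http://web.archive.org/web/timegate/",
}


def timegate_uri(uri_r):
    """Q2: derive URI-G from URI-R by prepending the chosen archive's prefix."""
    host = urlsplit(uri_r).hostname or ""
    prefix = PREFIX_BY_HOST.get(host, AGGREGATOR_PREFIX)
    return prefix + uri_r
```

Here `timegate_uri("http://cnn.com")` gives the example URI-G from the slide, and any host missing from the table falls back to the aggregator.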


Servers Without Internal Archival Capabilities: Successful Flow




Servers Without Internal Archival Capabilities: Other Scenarios




                                                  See http://www.mementoweb.org/guide/http/remote




HTTP Response Headers for DT-conneg: Datetime Ranges

 •  X-Archive-Interval: Indicates the entire datetime interval
    for which the archival server has Mementos for URI-R.

 •  X-Datetime-Validity: Indicates the datetime interval during
    which the provided representation was valid.
     o  Can reliably be provided by transactional archives, CMS, …

     o  Can typically not reliably be provided by crawler-based
        archives.
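
To show why a transactional archive or CMS can supply both ranges, the sketch below derives them from the datetimes at which each new representation was recorded. It assumes every change is captured, which is the property a crawler-based archive lacks. The function names are hypothetical and the header value syntax is deliberately not modelled.

```python
def archive_interval(change_times):
    """Entire span for which the archive holds Mementos: (first, last)."""
    ordered = sorted(change_times)
    return (ordered[0], ordered[-1]) if ordered else None


def validity_intervals(change_times):
    """Per-Memento validity: from its own datetime until the next change.

    The most recent representation is still valid, so its end is None.
    """
    ordered = sorted(change_times)
    intervals = []
    for i, start in enumerate(ordered):
        end = ordered[i + 1] if i + 1 < len(ordered) else None
        intervals.append((start, end))
    return intervals
```

The inputs may be `datetime` objects or any other comparable timestamps.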




The Memento Solution


We have covered this component of the Memento
 Solution:

•  Content Negotiation in the datetime dimension.

Now up to the next one:

•  An API for archives that allows requesting a list of
   all archived versions it holds for a given URI.


Why an API?

•  Mementos for any given URI-R are distributed across archives.

•  In order to get a correct perspective of available Mementos,
   different archives need to be consulted.

•  Can do so in distributed consultation mode (slooow), or by
   consulting an aggregator.

Terminology Intermission

We introduce the term TimeBundle to refer to a
 resource via which an overview of all Mementos for
 an original resource URI-R is available.

A TimeBundle for a resource URI-R is a
   resource URI-B[URI-R] that is an
   aggregation of:

(a)  All Mementos URI-Mi [URI-R@ti] available
     from an archive,
(b)  The archive's TimeGate URI-G for URI-R,
(c)  The original resource URI-R itself.
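
As a data-structure sketch, a TimeBundle can be modelled as a record holding the three aggregated parts. The class and field names are hypothetical and say nothing about how a TimeBundle is serialized.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Memento:
    uri_m: str      # URI-Mi of the archived version
    datetime: str   # ti, the datetime the Memento corresponds to


@dataclass
class TimeBundle:
    uri_b: str                      # URI-B[URI-R], the bundle's own URI
    uri_r: str                      # (c) the original resource
    uri_g: str                      # (b) the archive's TimeGate for URI-R
    mementos: List[Memento] = field(default_factory=list)  # (a) all Mementos

    def aggregated_uris(self):
        """All URIs the bundle aggregates: Mementos, TimeGate, original."""
        return [m.uri_m for m in self.mementos] + [self.uri_g, self.uri_r]
```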




Memento DT-conneg component




Memento DT-conneg component




Memento DT-conneg component                              Memento discovery component




HTTP Response Headers for DT-conneg: All Mementos

•  Alternates: RFC 2295 requires listing all variant resources.
    o  Impractical for DT-conneg: many variants may exist.

    o  Alternates lists a limited number of variants, centered on the
       datetime requested by the client.

•  Link: To compensate for the incomplete list of variants in
   Alternates, an HTTP Link header points to the TimeBundle via
   which a list is available of all variant resources (Mementos), and
   their associated metadata.

•  Example TimeMap in RDF/XML:
    o  http://www.mementoweb.org/guide/api/map1.rdf
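
One possible way to pick the limited set of variants for Alternates is a window around the Memento closest to the requested datetime. The windowing rule, the `limit` parameter and the function name are assumptions for illustration.

```python
def alternates_window(mementos, requested, limit):
    """Return up to `limit` (datetime, uri_m) pairs centered on `requested`."""
    ordered = sorted(mementos)
    if limit <= 0 or not ordered:
        return []
    # Index of the Memento closest in time to the requested datetime.
    closest = min(range(len(ordered)),
                  key=lambda i: abs(ordered[i][0] - requested))
    # Center the window on that index, then clamp it to the list bounds.
    start = max(0, closest - limit // 2)
    end = min(len(ordered), start + limit)
    start = max(0, end - limit)
    return ordered[start:end]
```

The full list that the window leaves out is what the Link header's TimeBundle makes available.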



Memento DT-conneg component                              Memento discovery component




All Mementos: For Discovery, Cross-Archive Services

•  Archive uses common approaches to make TimeBundles/
   TimeMaps discoverable:
    o  SiteMaps,

    o  Atom Feeds,

    o  OAI-PMH.




•  Aggregator harvests and merges TimeMaps. Based on this
   information, the Aggregator exposes its own TimeGates.
    o  Cross-archive

    o  Finer datetime granularity

    o  Better chances of matching a client’s datetime preference.

    o  Can become a shared target for redirection for many web
       servers.
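
The harvest-and-merge step can be sketched as a merge of per-archive lists into one list ordered by datetime. The input shape (archive name mapped to (datetime, URI-M) pairs) and the function name are assumptions; a real aggregator would first parse the harvested TimeMaps.

```python
def merge_timemaps(timemaps):
    """Merge per-archive Memento lists into one list sorted by datetime.

    timemaps: dict mapping archive name -> list of (datetime, uri_m).
    Returns a list of (datetime, archive, uri_m).
    """
    merged = []
    for archive, entries in timemaps.items():
        for when, uri_m in entries:
            merged.append((when, archive, uri_m))
    # Sort on datetime only; the sort is stable for equal datetimes.
    merged.sort(key=lambda entry: entry[0])
    return merged
```

Merging two archives that hold interleaved versions of the same resource yields a list with finer datetime granularity than either archive offers alone, which is what lets the Aggregator's TimeGate match a client's datetime preference more closely.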

Aggregation of Archival Metadata

[Figure: Archive A holds Mementos of resource A (t1, t3, t7) and resource D (t0, t9, t11) and exposes TimeBundles B-1 (for A), B-2 (for C), B-3 (for D), B-4 (for E). Archive B holds Mementos of resource D (t6, t12, t20) and resource A (t2, t4, t5) and exposes TimeBundles B-5 (for D), B-6 (for F), B-7 (for G), B-8 (for A). B-1 lists A@t1, A@t3, A@t7; B-8 lists A@t2, A@t4, A@t5.]

Exposed archival metadata per Memento:
=> URI of Memento in archive
=> Datetime of Memento
=> media type, extent, language
=> digest
=> Validity-Datetime-Interval
=> # times the representation was served
=> estimate # inlinks for representation

Aggregation of Archival Metadata

[Figure: an Aggregator Gateway harvests TimeBundles from Archive A and Archive B]

Archive A
   Archived versions: A@t1, A@t3, A@t7; D@t0, D@t9, D@t11
   TimeBundles: B-1 (for A), B-2 (for C), B-3 (for D), B-4 (for E)

Archive B
   Archived versions: D@t6, D@t12, D@t20; A@t2, A@t4, A@t5
   TimeBundles: B-5 (for D), B-6 (for F), B-7 (for G), B-8 (for A)

Harvested by the Aggregator Gateway:
   B-1: A@t1, A@t3, A@t7
   B-8: A@t2, A@t4, A@t5

Resulting aggregated list:
   A@t1 - Archive A
   A@t2 - Archive B
   A@t3 - Archive A
   A@t4 - Archive B
   A@t5 - Archive B
   A@t7 - Archive A

Exposed archival metadata per Memento:
=> URI of Memento in archive
=> Datetime of Memento
=> media type, extent, language
=> digest
=> Validity-Datetime-Interval
=> # times the representation was served
=> estimate # inlinks for representation
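
The harvest step on this slide can be pictured as a simple merge. The following is only a minimal sketch: the `Memento` tuple, the integer time indexes standing in for t1, t2, ..., and the `aggregate` function are illustrative assumptions, not part of the Memento prototype.

```python
from typing import NamedTuple


class Memento(NamedTuple):
    t: int          # abstract time index (t1, t2, ...) as used in the diagram
    resource: str   # original resource, e.g. "A"
    archive: str    # archive that holds this archived version


def aggregate(*time_bundles):
    """Merge per-archive lists of Mementos into one list sorted by time."""
    merged = [m for bundle in time_bundles for m in bundle]
    return sorted(merged, key=lambda m: m.t)


# Contents of B-1 (Archive A) and B-8 (Archive B), both for resource A.
b1 = [Memento(t, "A", "Archive A") for t in (1, 3, 7)]
b8 = [Memento(t, "A", "Archive B") for t in (2, 4, 5)]

aggregated = aggregate(b1, b8)
for m in aggregated:
    print(f"{m.resource}@t{m.t} - {m.archive}")
```

The printed lines follow the same order as the aggregated list in the figure.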




Leveraging the aggregated archival metadata for time travel

[Figure: G at the TimeBundle Aggregator, between Archive A and Archive B]

Archive A
   Archived versions: A@t1, A@t3, A@t7; D@t0, D@t9, D@t11
   TimeBundles: B-1 (for A), B-2 (for C), B-3 (for D), B-4 (for E)

TimeBundle Aggregator: G
   A@t1 - Archive A
   A@t2 - Archive B
   A@t3 - Archive A
   A@t4 - Archive B
   A@t5 - Archive B
   A@t7 - Archive A

Archive B
   Archived versions: D@t6, D@t12, D@t20; A@t2, A@t4, A@t5
   TimeBundles: B-5 (for D), B-6 (for F), B-7 (for G), B-8 (for A)
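
One way to picture how G could use the aggregated list is a lookup by requested datetime. This is a sketch under an explicit assumption: the selection rule (latest archived version at or before the requested time, falling back to the earliest one) is chosen for illustration and is not stated on the slide. `AGGREGATED` and `select_memento` are hypothetical names.

```python
from bisect import bisect_right

# Aggregated (time index, archive) pairs for resource A, sorted by time.
AGGREGATED = [(1, "Archive A"), (2, "Archive B"), (3, "Archive A"),
              (4, "Archive B"), (5, "Archive B"), (7, "Archive A")]


def select_memento(entries, requested):
    """Return the latest entry at or before `requested`.

    Falls back to the earliest entry when the request predates all of
    them, and returns None for an empty list.
    """
    if not entries:
        return None
    times = [t for t, _ in entries]
    i = bisect_right(times, requested)  # number of entries with time <= requested
    return entries[i - 1] if i > 0 else entries[0]


print(select_memento(AGGREGATED, 6))  # (5, 'Archive B')
```

A request for t6 lands on A@t5 in Archive B, since no version exists at t6 itself.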




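Leveraging the aggregated archival metadata for time travel

[Figure: R at the Source Server and G at the TimeBundle Aggregator, between Archive A and Archive B]

Source Server: R
   Arrow labels at R: DT-conneg, 302 Found

TimeBundle Aggregator: G
   Arrow labels at G: DT-conneg, 302 Found, Alternates
   A@t1 - Archive A
   A@t2 - Archive B
   A@t3 - Archive A
   A@t4 - Archive B
   A@t5 - Archive B
   A@t7 - Archive A

Archive A
   Archived versions: A@t1, A@t3, A@t7; D@t0, D@t9, D@t11
   TimeBundles: B-1 (for A), B-2 (for C), B-3 (for D), B-4 (for E)

Archive B
   Archived versions: D@t6, D@t12, D@t20; A@t2, A@t4, A@t5
   TimeBundles: B-5 (for D), B-6 (for F), B-7 (for G), B-8 (for A)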




The Memento Solution


We have covered both components of the Memento Solution:

•  Content Negotiation in the datetime dimension.
•  An API for archives that allows requesting a list of
   all archived versions an archive holds for a given URI.

Now for a demonstration …




The Memento Experiment

•  Servers at LANL and ODU:
    •  Support of 302 redirect upon
       detection of DT-conneg header
    •  Redirection is to each server's
       respective transactional archive (TA).
       These servers support TimeGates and
       TimeBundles.

•  Great illustration of the distributed
   nature of the Memento approach.
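
The server-side behaviour described above (redirect when a datetime content negotiation header is detected) might look roughly like the following sketch. The header name X-Accept-Datetime is the one used elsewhere in this deck; `TIMEGATE_BASE`, the pattern of appending the original URI to it, and the `respond` function are assumptions made purely for illustration.

```python
# Hypothetical TimeGate location; a real deployment would use its own archive.
TIMEGATE_BASE = "http://timegate.example.org/timegate/"


def respond(request_headers, resource_uri):
    """Return (status, response_headers) for a request to `resource_uri`.

    With a DT-conneg header present, answer 302 Found and point to the
    TimeGate; otherwise serve the current representation as usual.
    """
    # HTTP header names are case-insensitive, so normalise before checking.
    names = {name.lower() for name in request_headers}
    if "x-accept-datetime" in names:
        return 302, {"Location": TIMEGATE_BASE + resource_uri}
    return 200, {}


status, headers = respond(
    {"X-Accept-Datetime": "Sun, 04 Oct 2009 22:12:33 GMT"},
    "http://lanlsource.lanl.gov/hello",
)
print(status, headers["Location"])
```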
current
http://lanlsource.lanl.gov/hello

current
http://lanlsource.lanl.gov/pics/picoftheday.png

current
http://odusource.cs.odu.edu/pics/picoftheday.png




Oct 04 2009, 22:12:33 UTC
http://lanlsource.lanl.gov/hello

Oct 04 2009, 22:12:33 UTC
http://lanlsource.lanl.gov/pics/picoftheday.png

Oct 04 2009, 22:12:33 UTC
http://odusource.cs.odu.edu/pics/picoftheday.png




Oct 04 2009, 22:12:33 UTC
http://lanlsource.lanl.gov/hello
Redirect to TimeGate LANL TA

Oct 04 2009, 22:12:33 UTC
http://lanlsource.lanl.gov/pics/picoftheday.png
Redirect to TimeGate LANL TA

Oct 04 2009, 22:12:33 UTC
http://odusource.cs.odu.edu/pics/picoftheday.png
Redirect to TimeGate ODU TA




http://mementoarchive.lanl.gov/store/ta/20091004120001/http://lanlsource.lanl.gov/hello

http://mementoarchive.lanl.gov/store/ta/20091004180135/http://lanlsource.lanl.gov/pics/picoftheday.png

http://mementoarchive.cs.odu.edu/store/ta/20091004160013/http://odusource.cs.odu.edu/pics/picoftheday.png




The Memento Experiment

•  Servers at Library of Congress:
    •  Support of 302 redirect upon
       detection of DT-conneg header
    •  Redirection is to an aggregator that
       supports TimeGates and TimeBundles.
    •  Aggregator dynamically collects
       metadata, via screen scraping, from IA,
       Archive-It, WebCite, and the Canadian
       Archive.
current

http://digitalpreservation.gov




Oct 04 2009, 22:12:33 UTC

http://digitalpreservation.gov




Oct 04 2009, 22:12:33 UTC

http://digitalpreservation.gov


Redirect to TimeGate Aggregator




Sep 28 2009, 17:14:05 UTC

http://digitalpreservation.gov




http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov




The Memento Experiment

•  Wikipedia:
   •  No support of 302 redirect upon
       detection of DT-conneg header
   •  Memento client intercepts the
       “unexpected” 200 OK response.
   •  Client requests from Wikipedia Proxy
       that supports TimeGates,
       TimeBundles.
   •  TimeGate on Wikipedia Proxy
       redirects client to Memento in
       Wikipedia.

   •  Also created a Memento plug-in for
      MediaWiki. Adoption is currently under
      discussion.
      http://www.mediawiki.org/wiki/Extension:Memento
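
The client-side fallback described in the bullets above can be sketched as a small decision function. The names `next_request` and `PROXY_TIMEGATE`, and the idea of appending the original URI to the proxy address, are illustrative assumptions; the slides do not show the actual address of the Wikipedia Proxy.

```python
# Hypothetical address of a proxy TimeGate for servers without native support.
PROXY_TIMEGATE = "http://proxy.example.org/timegate/"


def next_request(status, response_headers, original_uri):
    """Decide which URI a Memento client requests after a DT-conneg attempt.

    302 Found: the server supports Memento, so follow its redirect.
    200 OK:    the "unexpected" response; fall back to the proxy TimeGate.
    Otherwise: no next step is defined in this sketch.
    """
    if status == 302 and "Location" in response_headers:
        return response_headers["Location"]
    if status == 200:
        return PROXY_TIMEGATE + original_uri
    return None


print(next_request(200, {}, "http://en.wikipedia.org/wiki/Clocks"))
```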
current

http://en.wikipedia.org/wiki/Clocks




Nov 02 2007, 14:12:00 UTC

http://en.wikipedia.org/wiki/Clocks

      Unexpected response.




Nov 02 2007, 14:12:00 UTC

http://en.wikipedia.org/wiki/Clocks

   Client requests directly from
   TimeGate at Wikipedia Proxy




Oct 31 2007, 21:03:00 UTC

http://en.wikipedia.org/w/index.php?oldid=168376483




Discussion: Memento and Lost Causes (1)

•  URI-R vanishes, but the server that used to serve it is still
   operational:

    o    In this case, the server should still issue the redirect to a
         TimeGate upon detection of the DT-conneg request.
    o    This allows seamless access to a Memento of URI-R, even if
         the server no longer hosts the original.




Discussion: Memento and Lost Causes (2)

•  A domain vanishes:

   o    The client is looking for a current representation of URI-R that
        was hosted by the domain, but fails.
   o    The client resorts to interaction with archives (or with a
        TimeBundle aggregator) and arrives at the most recent
        Memento of the resource.




Discussion: Memento and Lost Causes (3)

•  A domain is taken over by a new custodian:

   o    The new custodian follows different policies regarding the
        archive to which a DT-conneg request is redirected.
   o    From the X-Archive-Interval returned by that archive, the
        client understands that the archive does not cover the
        time range in which the previous custodian operated the
        domain.
   o    The client resorts to interaction with other archives (or with a
        TimeBundle aggregator) and arrives at an appropriate
        Memento.
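
The coverage check in the second step can be sketched as an interval test. The slides do not give the wire format of X-Archive-Interval, so this sketch assumes the value has already been parsed into a start and an end datetime; `covers` and `choose_source` are hypothetical helper names.

```python
from datetime import datetime, timezone


def covers(start, end, requested):
    """True when the archive's interval includes the requested datetime."""
    return start <= requested <= end


def choose_source(archive_interval, requested):
    """Stay with the redirected-to archive, or go elsewhere if it lacks coverage."""
    start, end = archive_interval
    return "archive" if covers(start, end, requested) else "other archives or aggregator"


# Assumed interval: the new custodian's archive only starts in 2008.
interval = (datetime(2008, 1, 1, tzinfo=timezone.utc),
            datetime(2009, 11, 16, tzinfo=timezone.utc))

print(choose_source(interval, datetime(2007, 11, 2, 14, 12, tzinfo=timezone.utc)))
print(choose_source(interval, datetime(2009, 10, 4, 22, 12, 33, tzinfo=timezone.utc)))
```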




Discussion: Memento and Caching

•  Caches do not take the X-Accept-Datetime header into account.

•  Hence, to avoid retrieving the current representation of URI-R,
   all caches between client and server (inclusive) must be
   bypassed when doing datetime content negotiation.

•  Currently enforced by:

    o    Cache-Control: no-cache => force cache revalidation
    o    If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT
         => make sure that revalidation fails

•  Clearly needs a more elegant solution.
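
A client could assemble the request headers listed above as follows. This is a sketch, not the Memento client's code: `dt_conneg_headers` is a hypothetical helper, and formatting the X-Accept-Datetime value as an HTTP date is an assumption, since the slides do not show the value syntax.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def dt_conneg_headers(requested):
    """Build headers for datetime content negotiation that bypass caches."""
    return {
        # Assumed value syntax: an HTTP date in GMT.
        "X-Accept-Datetime": format_datetime(requested, usegmt=True),
        # Force caches to revalidate with the origin server.
        "Cache-Control": "no-cache",
        # Epoch date, so a cached copy is never considered fresh enough.
        "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
    }


headers = dt_conneg_headers(datetime(2009, 10, 4, 22, 12, 33, tzinfo=timezone.utc))
for name, value in headers.items():
    print(f"{name}: {value}")
```

Note that `format_datetime(..., usegmt=True)` requires a timezone-aware datetime in UTC.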

Discussion: Memento and Web Archives

•  Web Archives rewrite URLs in archived pages, in order to avoid:

    o    Serving current representations of embedded resources;
    o    Linking to current representations of resources

•  The upside: Archived pages are self-contained.

•  The downside: Cannot navigate beyond the archive’s content,
   even if other archives may hold archived versions of the
   embedded or linked resources.

•  Would be interesting to explore novel strategies in this regard.
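
The URL rewriting discussed on this slide can be illustrated with the URI pattern visible in the LANL demo (archive prefix, 14-digit timestamp, original URI). The functions below are a sketch of that pattern only; `archive_uri` and `split_archive_uri` are hypothetical names, and real archives may rewrite differently.

```python
def archive_uri(archive_prefix, timestamp, original_uri):
    """Rewrite an original URI so that it points into the archive."""
    return f"{archive_prefix}{timestamp}/{original_uri}"


def split_archive_uri(archive_prefix, uri):
    """Recover (timestamp, original URI) from a rewritten archive URI."""
    if not uri.startswith(archive_prefix):
        raise ValueError("URI does not belong to this archive")
    rest = uri[len(archive_prefix):]
    timestamp, _, original = rest.partition("/")
    return timestamp, original


PREFIX = "http://mementoarchive.lanl.gov/store/ta/"
rewritten = archive_uri(PREFIX, "20091004120001", "http://lanlsource.lanl.gov/hello")
print(rewritten)
print(split_archive_uri(PREFIX, rewritten))
```

Recovering the original URI, as `split_archive_uri` does, is the piece a client would need in order to look for the same resource in other archives.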



If You Think Memento is Cool …

•  Install an Apache rewrite rule that redirects when
   X-Accept-Datetime is present.
    o  http://mementoweb.org/tools/apache




•  Join the memento-dev Google Group
    o  http://groups.google.com/group/memento-dev




•  Implement Memento natively for a CMS platform.
    o  http://mementoweb.org/guide/http/local




•  Use the ModifyHeaders Firefox extension to test.

•  Soon: Memento Firefox plug-in.

Memento wants to make Browsing the Past Easy




Watch a video at http://www.youtube.com/watch?v=LnkBp-FfoJw


More Related Content

Viewers also liked

Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeHerbert Van de Sompel
 
The bX project: Federating and Mining Usage Logs from Linking Servers
The bX project: Federating and Mining Usage Logs from Linking ServersThe bX project: Federating and Mining Usage Logs from Linking Servers
The bX project: Federating and Mining Usage Logs from Linking ServersHerbert Van de Sompel
 
An HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataAn HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataHerbert Van de Sompel
 
The Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationThe Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationHerbert Van de Sompel
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumRobert Sanderson
 
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastMemento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastHerbert Van de Sompel
 
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...Herbert Van de Sompel
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemHerbert Van de Sompel
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiativeHerbert Van de Sompel
 
The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference LinkingHerbert Van de Sompel
 
Designing for Time Travel: When Responsive Design Is Not Enough
Designing for Time Travel: When Responsive Design Is Not EnoughDesigning for Time Travel: When Responsive Design Is Not Enough
Designing for Time Travel: When Responsive Design Is Not EnoughBurin Asavesna
 
The Physics Of Time Travel
The Physics Of Time TravelThe Physics Of Time Travel
The Physics Of Time Travelguest433bdee
 

Viewers also liked (20)

Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & Exchange
 
the UPS protoproto project
the UPS protoproto projectthe UPS protoproto project
the UPS protoproto project
 
The bX project: Federating and Mining Usage Logs from Linking Servers
The bX project: Federating and Mining Usage Logs from Linking ServersThe bX project: Federating and Mining Usage Logs from Linking Servers
The bX project: Federating and Mining Usage Logs from Linking Servers
 
An HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataAn HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked Data
 
The Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationThe Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communication
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall Forum
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
 
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastMemento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
 
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiative
 
The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference Linking
 
Untitled I: Challenges ahead
Untitled I: Challenges aheadUntitled I: Challenges ahead
Untitled I: Challenges ahead
 
Time travel
Time travelTime travel
Time travel
 
Designing for Time Travel: When Responsive Design Is Not Enough
Designing for Time Travel: When Responsive Design Is Not EnoughDesigning for Time Travel: When Responsive Design Is Not Enough
Designing for Time Travel: When Responsive Design Is Not Enough
 
The Physics Of Time Travel
The Physics Of Time TravelThe Physics Of Time Travel
The Physics Of Time Travel
 

More from Herbert Van de Sompel

The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebHerbert Van de Sompel
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DoneHerbert Van de Sompel
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueHerbert Van de Sompel
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Herbert Van de Sompel
 
Collecting the organizational scholarly record
Collecting the organizational scholarly recordCollecting the organizational scholarly record
Collecting the organizational scholarly recordHerbert Van de Sompel
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsHerbert Van de Sompel
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Herbert Van de Sompel
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarshipHerbert Van de Sompel
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingHerbert Van de Sompel
 

Memento: Time Travel for the Web

  • 1. Memento: Time Travel for the Web http://www.mementoweb.org Herbert Van de Sompel – hvdsomp@gmail.com Michael L. Nelson – mln@cs.odu.edu The Memento Experiment was partly funded by the Library of Congress Memento: Time Travel for the Web Herbert Van de Sompel, Michael L. Nelson Library of Congress, Washington, DC - November 16 2009
  • 2. Acknowledgments
      •  At the Los Alamos National Laboratory, Prototyping Team:
         o  Robert Sanderson
         o  Lyudmilla Balakireva
         o  Harihar Shankar
      •  At Old Dominion University, Web Science and Digital Library Research Group:
         o  Scott Ainsworth
  • 3. Looking at the Past can be Fun. Feb 14 2006: "Cheney prays for hunt victim"
  • 4. Looking at the Past can be Fun. Feb 14 2006: "Press Attacks Cheney"
  • 5. And Memento wants to make it Easy
  • 6. W3C Web Architecture: Resource – URI - Representation. [Diagram: a URI identifies a Resource; dereferencing the URI yields a Representation, which represents the Resource.]
  • 7. W3C Web Architecture: Resource – URI - Representation. [Diagram: a URI identifies a Resource; dereferencing the URI with content negotiation yields Representation 1 or Representation 2, each of which represents the Resource.]
  • 8. Resources
  • 9. Resources have Representations
  • 10. Resources have Representations that Change over Time
  • 11. Only the Current Representation is Available from a Resource
  • 12. Old Representations are Lost Forever
  • 13. There is no Time Dimension to HTTP, the Web
      "Resource state may evolve over time. Requiring a URI owner to publish a new URI for each change in resource state would lead to a significant number of broken references. For robustness, Web architecture promotes independence between an identifier and the state of the identified resource."
      From: The Architecture of the World Wide Web, http://www.w3.org/TR/webarch/
  • 14. Archived Resources Exist
  • 15. Archived Resources
      •  Sep 11 2001, 20:36:10 UTC: http://web.archive.org/web/20010911203610/http://www.cnn.com/ (archived resource for http://cnn.com)
      •  Dec 20 2001, 4:51:00 UTC: http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 (archived resource for http://en.wikipedia.org/wiki/September_11_attacks)
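
The Internet Archive example on this slide pairs a datetime with a URI that appears to embed the same datetime as a 14-digit timestamp. The following is a minimal sketch of reading that timestamp back out. The URI pattern is inferred from the slide's single example and is an assumption, and `split_archive_uri` is a hypothetical helper name.

```python
from datetime import datetime, timezone
import re

# Assumed pattern, inferred from the slide's example URI:
#   http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URI>
_ARCHIVE_URI = re.compile(r"^http://web\.archive\.org/web/(\d{14})/(.+)$")


def split_archive_uri(uri_m):
    """Return (capture datetime in UTC, original URI) for an archive URI."""
    match = _ARCHIVE_URI.match(uri_m)
    if match is None:
        raise ValueError(f"not a recognised archive URI: {uri_m}")
    stamp, uri_r = match.groups()
    # The slide labels the datetime as UTC, so the parsed value is tagged UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, uri_r


captured, uri_r = split_archive_uri(
    "http://web.archive.org/web/20010911203610/http://www.cnn.com/"
)
print(captured.isoformat(), uri_r)
```

The Wikipedia example uses an opaque `oldid` instead, so its datetime cannot be recovered from the URI in the same way.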
  • 16. Finding Archived Resources: Go to http://www.archive.org/ and search for http://cnn.com. On http://web.archive.org/web/*/http://cnn.com, select the desired datetime.
  • 17. Finding Archived Resources: Go to http://en.wikipedia.org/wiki/September_11_attacks and click History. Browse History.
  • 18. Navigating Archived Resources: in http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333 (Dec 20 2001, 4:51:00 UTC; archived resource for http://en.wikipedia.org/wiki/September_11_attacks), the link "Pentagon" leads to the current http://en.wikipedia.org/wiki/The_Pentagon.
  • 19. Navigating Archived Resources: in http://web.archive.org/web/20010911203610/http://www.cnn.com/ (Sep 11 2001, 20:36:10 UTC; archived resource for http://cnn.com), the link "SPACE" leads to http://web.archive.org/web/20010911213855/www.cnn.com/TECH/space/ (Sep 11 2001, 21:38:55 UTC).
  • 20. Current and Past Web are Not Integrated
  • 21. This is Where Memento comes in … Oct 11 2009, 05:30:33 UTC
  • 22. This is Where Memento comes in … From LANL and ODU transactional archives. http://lanlsource.lanl.gov/hello; Oct 11 2009, 05:30:33 UTC. [Screenshot labels: Oct 11 2009, 00:00:01 UTC; Oct 10 2009, 18:00:01 UTC; Oct 10 2009, 16:00:01 UTC; link: Web Archiving]
  • 23. This is Where Memento comes in … From Wikipedia History. http://en.wikipedia.org/wiki/Web_Archiving; Oct 11 2009, 05:30:33 UTC. [Screenshot labels: Oct 01 2009, 16:30:00 UTC; link: Robots Exclusion Protocol]
  • 24. This is Where Memento comes in … From Wikipedia History. http://en.wikipedia.org/wiki/Robots_exclusion_protocol; Oct 11 2009, 05:30:33 UTC. [Screenshot labels: Sep 15 2009, 20:49:00 UTC; link: Robots Exclusion]
  • 25. This is Where Memento comes in … From Internet Archive. http://www.robotstxt.org/; Oct 11 2001, 05:30:33 UTC. [Screenshot label: Nov 09 2007, 06:21:04 UTC]
  • 26. How does Memento do This?
      In order to help understand how Memento introduces time travel for the Web, we present a brief recap of Transparent Content Negotiation (conneg) in HTTP.
      RFC 2295. Transparent Content Negotiation in HTTP, http://www.ietf.org/rfc/rfc2295.txt
  • 27. HTTP GET on URI A
  • 28. GET with conneg on URI T – Server Choice – 200 OK
  • 29. GET with conneg on URI T – Server Choice – 302 Found – Step 1
  • 30. GET with conneg on URI T – Server Choice – 302 Found – Step 2
  • 31. GET with conneg on URI T – Server List – 406 Not Acceptable
  • 32. The Memento Solution
      Now, we are ready to introduce the components of the Memento Solution:
      •  Content Negotiation in the datetime dimension.
      •  An API for archives that allows requesting a list of all archived versions it holds for a given URI.
  • 33. Terminology Intermission
      We introduce the term Memento to refer to an archived version of a resource.
      A Memento for a resource URI-R (as it existed) at time ti is a resource URI-Mi [URI-R@ti] for which the representation at any moment past its creation time tc is the same as the representation that was available from URI-R at time ti, with tc >= ti.
      Implicit in this definition is the notion that, once created, a Memento always keeps the same representation.
  • 34. DT-conneg: Content Negotiation in the datetime dimension
      •  RFC 2295 introduces conneg in the following dimensions: media type, language, compression, character set, e.g.: Accept-Language: en-US
      •  Memento introduces conneg in the datetime dimension: X-Accept-Datetime: {Mon, Oct 12 2009 14:20:33 GMT}
      •  This means that somewhere, we will need transparently negotiable resources to get to appropriate Mementos.
      •  This will be discussed for 2 classes of servers.
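
As an illustration of the datetime dimension, here is a minimal sketch of how a client might build the `X-Accept-Datetime` header value in the form shown on this slide. The function names are hypothetical, and zero-padding single-digit days is an assumption, since the slide shows only a two-digit day.

```python
from datetime import datetime, timezone

# Fixed English names so the output does not depend on the system locale.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def accept_datetime_value(moment):
    """Format a timezone-aware datetime like the slide's example value."""
    utc = moment.astimezone(timezone.utc)
    return "{%s, %s %02d %d %02d:%02d:%02d GMT}" % (
        DAYS[utc.weekday()], MONTHS[utc.month - 1], utc.day, utc.year,
        utc.hour, utc.minute, utc.second,
    )


def dt_conneg_headers(moment):
    """Request headers for a DT-conneg GET (hypothetical helper)."""
    return {"X-Accept-Datetime": accept_datetime_value(moment)}


headers = dt_conneg_headers(datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc))
print(headers["X-Accept-Datetime"])  # {Mon, Oct 12 2009 14:20:33 GMT}
```

The resulting dictionary could be passed to any HTTP client alongside an ordinary GET on the resource URI.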
  • 35. Class 1 Servers: With Internal Archival Capabilities
      •  This type includes:
         o  Content Management Systems
         o  Version Control Systems
         o  TTApache
         o  Servers that archive resource representations in the cloud and keep track of the URIs and datetimes of remotely archived resources.
      •  These servers have all the essential information (URI-Ms, and associated datetimes) to respond to a DT-conneg request.
  • 36. [Figure: the current http://en.wikipedia.org/wiki/September_11_attacks and its versions dated Dec 20 2001, 4:51:00 UTC; Dec 31 2004, 20:46:00 UTC; and Dec 20 2008, 22:21:00 UTC (http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=259237305).]
  • 37. [Diagram: the original resource and its Mementos.]
  • 38. DT-conneg with URI-R to get URI-M. [Diagram: the original resource is the transparently negotiable resource; the Mementos are its variant resources.]
  • 39. Terminology Intermission
      We introduce the term TimeGate to refer to a transparently negotiable resource that supports the datetime dimension.
      A TimeGate for an original resource URI-R is a transparently negotiable resource URI-G[URI-R] for which all variant resources are Mementos URI-Mi[URI-R@ti] of the resource URI-R.
      Since multiple archives may host versions of URI-R, multiple TimeGates may exist for any given resource, i.e. one per archive.
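
A TimeGate has to choose one of its variant Mementos for a requested datetime. The slides do not state the selection rule, so the sketch below assumes the simplest one: pick the Memento whose datetime is closest to the request. The function name and the `example.org` URIs are hypothetical.

```python
from datetime import datetime, timezone


def select_memento(mementos, requested):
    """Pick the (datetime, URI-M) pair closest in time to `requested`.

    `mementos` is a list of (timezone-aware datetime, URI-M) pairs.
    Nearest-in-time is an assumed rule, not one given in the slides.
    """
    if not mementos:
        raise LookupError("no Mementos available for this resource")
    return min(mementos, key=lambda pair: abs(pair[0] - requested))


# Hypothetical Mementos of one original resource.
mementos = [
    (datetime(2009, 10, 4, 12, 0, 1, tzinfo=timezone.utc), "http://archive.example.org/m/1"),
    (datetime(2009, 10, 10, 12, 0, 3, tzinfo=timezone.utc), "http://archive.example.org/m/2"),
    (datetime(2009, 10, 21, 12, 0, 1, tzinfo=timezone.utc), "http://archive.example.org/m/3"),
]
requested = datetime(2009, 10, 11, 5, 30, 33, tzinfo=timezone.utc)
chosen_datetime, chosen_uri = select_memento(mementos, requested)
print(chosen_uri)
```

Other rules, such as "latest Memento not after the requested datetime", would fit the same interface.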
  • 40. DT-conneg with URI-G/URI-R to get URI-M. [Diagram: the original resource and the TimeGate are the same transparently negotiable resource; the Mementos are its variant resources.]
  • 41. Servers With Internal Archival Capabilities: Successful Flow
  • 42. Servers With Internal Archival Capabilities: Other Scenarios. See http://www.mementoweb.org/guide/http/local
  • 43. Class 2 Servers: Without Internal Archival Capabilities
      •  This type includes:
         o  Servers that are crawled by a web archive
         o  Servers with an associated transactional archive
      •  These servers do not have the essential information (URI-Ms, and associated datetimes) to respond to a DT-conneg request.
      •  But they can still be really constructive by redirecting (HTTP 302) a client to an archive that can respond to the DT-conneg request.
  • 44. [Figure: the current http://lanlsource.lanl.gov/hello and its archived versions dated Oct 04 2009, 12:00:01 UTC; Oct 10 2009, 12:00:03 UTC; and Oct 21 2009, 12:00:01 UTC (http://mementoarchive.lanl.gov/store/ta/20091021120001/http://lanlsource.lanl.gov/hello).]
  • 45. [Diagram: the original resource and its Mementos.]
  • 46. DT-conneg with URI-G to get URI-M. [Diagram: the original resource; a separate TimeGate as the transparently negotiable resource; the Mementos as its variant resources.]
  • 47. DT-conneg with URI-G to get URI-M. [Diagram: as on the previous slide, with a redirect from the original resource to the TimeGate.]
  • 48. How to redirect from Original Resource to its (external) TimeGate
      •  Q1: Which archive to redirect to?
         o  The archive with the best coverage for the server at hand.
            -  There are quite a few nuances here.
         o  Always redirect to an Aggregator (see later)
      •  Q2: What is the TimeGate URI-G for URI-R on the chosen archive?
         o  Convention for syntax of URI-G as function of URI-R.
            -  http://web.archive.org/web/timegate/http://cnn.com
         o  Always redirect to an Aggregator (see later)
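
The redirect from an original resource to an external TimeGate can be sketched as follows. This is a minimal illustration, assuming the URI-G convention from the slide's example (a fixed prefix followed by URI-R). The `respond` function and its `(status, headers)` return shape are hypothetical and not tied to any web framework.

```python
# Prefix taken from the slide's example URI-G; treating it as a general
# convention is an assumption.
TIMEGATE_PREFIX = "http://web.archive.org/web/timegate/"


def timegate_uri(uri_r, prefix=TIMEGATE_PREFIX):
    """Build URI-G for URI-R by prepending the archive's TimeGate prefix."""
    return prefix + uri_r


def respond(uri_r, request_headers):
    """Return (status, response headers) for a server without its own archive.

    A request carrying X-Accept-Datetime is redirected (302) to the external
    TimeGate; any other request is served normally (200).
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in request_headers}
    if "x-accept-datetime" in names:
        return 302, {"Location": timegate_uri(uri_r)}
    return 200, {}


print(respond("http://cnn.com", {"X-Accept-Datetime": "{Mon, Oct 12 2009 14:20:33 GMT}"}))
print(respond("http://cnn.com", {}))
```

Pointing `prefix` at an Aggregator's TimeGate instead would implement the "always redirect to an Aggregator" option from the slide.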
  • 49. Servers Without Internal Archival Capabilities: Successful Flow
  • 50. Servers Without Internal Archival Capabilities: Other Scenarios. See http://www.mementoweb.org/guide/http/remote
  • 51. HTTP Response Headers for DT-conneg: Datetime Ranges
      •  X-Archive-Interval: Indicates the entire datetime interval for which the archival server has Mementos for URI-R.
      •  X-Datetime-Validity: Indicates the datetime interval during which the provided representation was valid.
         o  Can reliably be provided by transactional archives, CMS, …
         o  Can typically not reliably be provided by crawler-based archives.
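
The two datetime ranges can be derived from an archive's list of Memento datetimes. The slide does not give the header value syntax, so this sketch only computes the intervals and returns them as datetime pairs. It assumes a transactional archive or CMS, where a representation is taken to be valid from its own datetime until the next recorded change. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def archive_interval(datetimes):
    """Earliest and latest Memento datetimes held for a resource."""
    ordered = sorted(datetimes)
    return ordered[0], ordered[-1]


def validity_interval(datetimes, index):
    """Validity of the Memento at position `index` in datetime order.

    Assumes every change was recorded, so a representation is valid until
    the next Memento's datetime. The end is None for the most recent
    Memento, whose validity is still open.
    """
    ordered = sorted(datetimes)
    start = ordered[index]
    end = ordered[index + 1] if index + 1 < len(ordered) else None
    return start, end


# Datetimes of the http://lanlsource.lanl.gov/hello example in the slides.
captures = [
    datetime(2009, 10, 4, 12, 0, 1, tzinfo=timezone.utc),
    datetime(2009, 10, 10, 12, 0, 3, tzinfo=timezone.utc),
    datetime(2009, 10, 21, 12, 0, 1, tzinfo=timezone.utc),
]
print(archive_interval(captures))
print(validity_interval(captures, 1))
```

A crawler-based archive cannot make the "every change was recorded" assumption, which is why the slide says it typically cannot provide the validity interval reliably.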
  • 52. The Memento Solution
      We have covered this component of the Memento Solution:
      •  Content Negotiation in the datetime dimension.
      Now up to the next one:
      •  An API for archives that allows requesting a list of all archived versions it holds for a given URI.
  • 53. Why an API?
      •  Mementos for any given URI-R are distributed across archives.
      •  In order to get a correct perspective of available Mementos, different archives need to be consulted.
      •  Can do so in distributed consultation mode (slooow), or by consulting an aggregator.
  • 54. Terminology Intermission
      We introduce the term TimeBundle to refer to a resource via which an overview of all Mementos for an original resource URI-R is available.
      A TimeBundle for a resource URI-R is a resource URI-B[URI-R] that is an aggregation of:
      (a)  All Mementos URI-Mi [URI-R@ti] available from an archive,
      (b)  The archive's TimeGate URI-G for URI-R,
      (c)  The original resource URI-R itself.
  • 55. [Figure only; no slide text.]
  • 56. Memento DT-conneg component
  • 57. Memento DT-conneg component
  • 58. Memento DT-conneg component; Memento discovery component
  • 59. HTTP Response Headers for DT-conneg: All Mementos
      •  Alternates: RFC 2295 requires listing all variant resources.
         o  Impractical for DT-conneg: many variants may exist.
         o  Alternates lists limited amount of variants, centered on the datetime requested by the client.
      •  Link: To compensate for the incomplete list of variants in Alternates, an HTTP Link header points to the TimeBundle via which a list is available of all variant resources (Mementos), and their associated metadata.
      •  Example TimeMap in RDF/XML:
         o  http://www.mementoweb.org/guide/api/map1.rdf
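
Limiting Alternates to a few variants centered on the requested datetime can be sketched as a sliding window over the sorted Memento datetimes. The window size and the clamping at both ends of the list are assumptions, as the slide gives neither. `alternates_window` is a hypothetical name, and the header serialization itself is left out.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def alternates_window(datetimes, requested, size=3):
    """Return up to `size` Memento datetimes centered on `requested`."""
    ordered = sorted(datetimes)
    centre = bisect_left(ordered, requested)  # first datetime >= requested
    start = max(0, centre - size // 2)
    # Clamp so a request near the end of the list still yields a full window.
    start = min(start, max(0, len(ordered) - size))
    return ordered[start:start + size]


# Hypothetical daily Mementos, Oct 1 to Oct 7 2009.
days = [datetime(2009, 10, d, tzinfo=timezone.utc) for d in range(1, 8)]
window = alternates_window(days, datetime(2009, 10, 4, 12, tzinfo=timezone.utc))
print([d.day for d in window])  # [4, 5, 6]
```

The full list that the window leaves out is what the Link header to the TimeBundle is meant to make reachable.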
  • 60. Memento DT-conneg component; Memento discovery component
  • 61. All Mementos: For Discovery, Cross-Archive Services
      •  Archive uses common approaches to make TimeBundles/TimeMaps discoverable:
         o  SiteMaps,
         o  Atom Feeds,
         o  OAI-PMH.
      •  Aggregator harvests and merges TimeMaps. Based on this information, the Aggregator exposes its own TimeGates.
         o  Cross-archive
         o  Finer datetime granularity
         o  Better chances of matching a client’s datetime preference.
         o  Can become a shared target for redirection for many web servers.
  • 62. Aggregation of Archival Metadata
      [Figure: Archive A holds Mementos of A at t1, t3, t7 and of D at t0, t9, t11, and exposes TimeBundles B-1 (for A), B-2 (for C), B-3 (for D), B-4 (for E). Archive B holds Mementos of A at t2, t4, t5 and of D at t6, t12, t20, and exposes TimeBundles B-5 (for D), B-6 (for F), B-7 (for G), B-8 (for A). B-1 lists A@t1, A@t3, A@t7; B-8 lists A@t2, A@t4, A@t5.]
      Exposed archival metadata per Memento:
      => URI of Memento in archive
      => Datetime of Memento
      => media type, extent, language
      => digest
      => Validity-Datetime-Interval
      => # times the representation was served
      => estimate # inlinks for representation
  • 63. Aggregation of Archival Metadata
      [Figure: as on the previous slide, with an Aggregator Gateway that harvests B-1 from Archive A and B-8 from Archive B and merges them into one list: A@t1 - Archive A; A@t2 - Archive B; A@t3 - Archive A; A@t4 - Archive B; A@t5 - Archive B; A@t7 - Archive A.]
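
The merge step an Aggregator performs on harvested TimeMaps can be sketched with the figure's own data. Each TimeMap is modelled here as a time-ordered list of `(time index, archive)` pairs, which is a simplification made for illustration. Real entries would carry Memento URIs and datetimes. `merge_timemaps` is a hypothetical name.

```python
import heapq

# Mementos of resource A, as listed in the figure (t1 is written as 1, etc.).
timemap_b1 = [(1, "Archive A"), (3, "Archive A"), (7, "Archive A")]  # from Archive A
timemap_b8 = [(2, "Archive B"), (4, "Archive B"), (5, "Archive B")]  # from Archive B


def merge_timemaps(*timemaps):
    """Merge time-ordered TimeMaps into one time-ordered list."""
    # heapq.merge assumes each input is already sorted, which holds here.
    return list(heapq.merge(*timemaps))


merged = merge_timemaps(timemap_b1, timemap_b8)
for index, archive in merged:
    print(f"A@t{index} - {archive}")
```

The merged list interleaves both archives, which is what gives the Aggregator's TimeGate its finer datetime granularity.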
  • 64. Leveraging the aggregated archival metadata for time travel
      [Figure: Archive A and Archive B with their Mementos and TimeBundles B-1 to B-8. The Aggregator holds the merged list (A@t1 - Archive A; A@t2 - Archive B; A@t3 - Archive A; A@t4 - Archive B; A@t5 - Archive B; A@t7 - Archive A) and exposes a TimeGate G and a TimeBundle.]
  • 65. Leveraging the aggregated archival metadata for time travel
      [Figure: as on the previous slide, plus a Source Server with the original resource R. Labels: DT-conneg on R, 302 Found; DT-conneg on G at the Aggregator, 302 Found; Alternates.]
  • 66. The Memento Solution We have covered both components of the Memento Solution: •  Content Negotiation in the datetime dimension. •  An API for archives that allows requesting a list of all archived versions they hold for a given URI. On to some showing off now …
  • 67. The Memento Experiment •  Servers at LANL and ODU: •  Support of 302 redirect upon detection of DT-conneg header •  Redirection is to respective transactional archive per server. These servers support TimeGates, TimeBundles •  Great illustration of the distributed nature of the Memento approach.
  • 68. [Screenshot] current: http://lanlsource.lanl.gov/hello; current: http://lanlsource.lanl.gov/pics/picoftheday.png; current: http://odusource.cs.odu.edu/pics/picoftheday.png
  • 69. [Screenshot] Oct 04 2009, 22:12:33 UTC: http://lanlsource.lanl.gov/hello; Oct 04 2009, 22:12:33 UTC: http://lanlsource.lanl.gov/pics/picoftheday.png; Oct 04 2009, 22:12:33 UTC: http://odusource.cs.odu.edu/pics/picoftheday.png
  • 70. [Screenshot] Oct 04 2009, 22:12:33 UTC: http://lanlsource.lanl.gov/hello (Redirect to TimeGate LANL TA); Oct 04 2009, 22:12:33 UTC: http://lanlsource.lanl.gov/pics/picoftheday.png (Redirect to TimeGate LANL TA); Oct 04 2009, 22:12:33 UTC: http://odusource.cs.odu.edu/pics/picoftheday.png (Redirect to TimeGate ODU TA)
  • 71. [Screenshot] http://mementoarchive.lanl.gov/store/ta/20091004120001/http://lanlsource.lanl.gov/hello; http://mementoarchive.lanl.gov/store/ta/20091004180135/http://lanlsource.lanl.gov/pics/picoftheday.png; http://mementoarchive.cs.odu.edu/store/ta/20091004160013/http://odusource.cs.odu.edu/pics/picoftheday.png
  • 72. The Memento Experiment •  Servers at Library of Congress: •  Support of 302 redirect upon detection of DT-conneg header •  Redirection is to an aggregator that supports TimeGates, TimeBundles. •  Aggregator collects metadata (dynamically, by screen scraping) from IA, Archive-It, WebCite, Canadian Archive.
  • 73. [Screenshot] current: http://digitalpreservation.gov
  • 74. [Screenshot] Oct 04 2009, 22:12:33 UTC: http://digitalpreservation.gov
  • 75. [Screenshot] Oct 04 2009, 22:12:33 UTC: http://digitalpreservation.gov (Redirect to TimeGate Aggregator)
  • 76. [Screenshot] Sep 28 2009, 17:14:05 UTC: http://digitalpreservation.gov, served from http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov
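The Memento URI on this slide embeds both the Memento datetime (Sep 28 2009, 17:14:05 UTC appears as 20090928171405) and the original URI. A minimal sketch of pulling the two apart, assuming the 14-digit path segment is always a UTC timestamp in year-month-day-hour-minute-second order; the function name is hypothetical, and other archives may lay out their URIs differently.

```python
import re
from datetime import datetime, timezone

# A 14-digit path segment followed by the original URI (assumed layout).
_MEMENTO_URI = re.compile(r"/(\d{14})/(https?://.+)$")


def split_memento_uri(uri):
    """Return (memento_datetime, original_uri) parsed from an archive URI."""
    match = _MEMENTO_URI.search(uri)
    if match is None:
        raise ValueError(f"no embedded timestamp found in {uri!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Assumption: the embedded timestamp is expressed in UTC.
    return stamp.replace(tzinfo=timezone.utc), match.group(2)


when, original = split_memento_uri(
    "http://wayback.archive-it.org/1610/20090928171405/"
    "http://www.digitalpreservation.gov"
)
```

The same pattern also matches the transactional archive URIs from the LANL and ODU demonstration, which place the timestamp after `/store/ta/`.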
  • 77. The Memento Experiment •  Wikipedia: •  No support of 302 redirect upon detection of DT-conneg header •  Memento client intercepts the “unexpected” 200 OK response. •  Client requests from Wikipedia Proxy that supports TimeGates, TimeBundles. •  TimeGate on Wikipedia Proxy redirects client to Memento in Wikipedia. •  Also created Memento plug-in for Mediawiki. Adoption currently under discussion. http://www.mediawiki.org/wiki/Extension:Memento
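The client-side fallback described on this slide can be sketched as a small decision function. Everything named here is hypothetical: the function, the proxy address, and in particular the convention of appending the original URI to a TimeGate base URI, which the slides do not specify.

```python
def next_request(status, headers, original_uri, fallback_timegate):
    """Decide where a Memento client goes after a DT-conneg request.

    Returns the URI to request next, or None if the client should give up.
    """
    if status == 302 and "Location" in headers:
        # The server understood DT-conneg and points at a TimeGate.
        return headers["Location"]
    if status == 200:
        # "Unexpected" 200 OK: the server ignored the datetime header,
        # so the client asks a TimeGate it already knows about instead.
        # Assumption: the TimeGate URI is its base plus the original URI.
        return fallback_timegate + original_uri
    return None


# Hypothetical proxy address, used only for illustration.
proxy = "http://proxy.example.org/timegate/"
target = next_request(200, {}, "http://en.wikipedia.org/wiki/Clocks", proxy)
```

The request to `target` would carry the same datetime header as the original request, so that the proxy TimeGate can redirect to the matching Memento.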
  • 78. [Screenshot] current: http://en.wikipedia.org/wiki/Clocks
  • 79. [Screenshot] Nov 02 2007, 14:12:00 UTC: http://en.wikipedia.org/wiki/Clocks (Unexpected response.)
  • 80. [Screenshot] Nov 02 2007, 14:12:00 UTC: http://en.wikipedia.org/wiki/Clocks (Client requests directly from TimeGate at Wikipedia Proxy)
  • 81. [Screenshot] Oct 31 2007, 21:03:00 UTC: http://en.wikipedia.org/w/index.php?oldid=168376483
  • 82. Discussion: Memento and Lost Causes (1) •  URI-R vanishes, but the server that used to serve it is still operational: o  In this case, the server should still issue the redirect to a TimeGate upon detection of the DT-conneg request. o  This allows seamless access to a Memento of URI-R, even if the server no longer hosts the original.
  • 83. Discussion: Memento and Lost Causes (2) •  A domain vanishes: o  The client is looking for a current representation of URI-R that was hosted by the domain, but fails. o  The client resorts to interaction with archives (or with a TimeBundle aggregator) and arrives at the most recent Memento of the resource.
  • 84. Discussion: Memento and Lost Causes (3) •  A domain is taken over by a new custodian: o  The new custodian follows a different policy regarding which archive to redirect a DT-conneg request to. o  The client understands from the X-Archive-Interval returned by that archive of choice that it does not cover the time range in which the previous custodian operated the domain. o  The client resorts to interaction with other archives (or with a TimeBundle aggregator) and arrives at an appropriate Memento.
  • 85. Discussion: Memento and Caching •  Caches do not take the X-Accept-Datetime header into account. •  Hence, in order to avoid retrieving the current representation of URI-R, all caches between client and server (both included) must be bypassed when doing datetime content negotiation. •  Currently enforced by: o  Cache-Control: no-cache => force cache revalidation o  If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => make sure that revalidation fails •  Clearly needs a more elegant solution.
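The cache-bypass headers listed on this slide can be assembled as shown below. This is a sketch, not the prototype's code: the function name is hypothetical, and formatting the X-Accept-Datetime value as an HTTP date is an assumption, since the slides do not spell out the value syntax.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def dt_conneg_headers(requested):
    """Build request headers for datetime content negotiation.

    `requested` must be a timezone-aware datetime in UTC.
    """
    return {
        # Assumption: the header value uses the HTTP date format.
        "X-Accept-Datetime": format_datetime(requested, usegmt=True),
        # Force intermediate caches to revalidate with the origin server.
        "Cache-Control": "no-cache",
        # An epoch date guarantees the revalidation cannot succeed with a
        # cached copy, so the request reaches the server's redirect logic.
        "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
    }


headers = dt_conneg_headers(datetime(2009, 10, 4, 22, 12, 33, tzinfo=timezone.utc))
```

The resulting dictionary can be passed to any HTTP client that accepts custom request headers.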
  • 86. Discussion: Memento and Web Archives •  Web Archives rewrite URLs in archived pages, in order to avoid: o  Serving current representations of embedded resources; o  Linking to current representations of resources •  The upside: Archived pages are self-contained. •  The downside: Cannot navigate beyond the archive’s content, even if other archives may have archived versions of an embedded or linked resource. •  Would be interesting to explore novel strategies in this regard.
  • 87. If You Think Memento is Cool … •  Install Apache rewrite rule that redirects when X-Accept-Datetime is present. o  http://mementoweb.org/tools/apache •  Join memento-dev Google Group o  http://groups.google.com/group/memento-dev •  Implement Memento natively for a CMS platform. o  http://mementoweb.org/guide/http/local •  Use ModifyHeaders Firefox extension to test. •  Soon: Memento Firefox plug-in.
  • 88. Memento wants to make Browsing the Past Easy Watch a video at http://www.youtube.com/watch?v=LnkBp-FfoJw