My Point of View

   Michael L. Nelson
Old Dominion University


       My Point of View: Michael L. Nelson
 Web Archiving Cooperative, Stanford, Sep 09 2010
Observations
• We are pretty good at archiving the web
  of five years ago, but not the web of
  today
• There are separate, shadow webs that
  we are not archiving
• Archiving should be a service with
  short-term utility

Ajax = #noarchive




http://web.archive.org/web/*/http://maps.google.com/
http://web.archive.org/web/20091026210613/http://maps.google.com/
http://web.archive.org/web/20091026210613/http://maps.google.com/?output=html&oi=slow

Reaching Out From the Archive

                                             % grep Host: cnn-ia-headers | wc -l
                                               288
                                             % grep Host: cnn-ia-headers | grep -v archive.org | wc -l
                                               117
                                             % grep Host: cnn-ia-headers | grep -v archive.org | sort -u
                                             Host: ad.doubleclick.net
                                             Host: ads.adsonar.com
                                             Host: ads.cnn.com
                                             Host: aranet.vo.llnwd.net
                                             Host: b.scorecardresearch.com
                                             Host: bs.serving-sys.com
                                             Host: cnn.dyn.cnn.com
                                             Host: ds.serving-sys.com
                                             Host: gdyn.cnn.com
                                             Host: i.cdn.turner.com
                                             Host: i2.cdn.turner.com
                                             Host: js.adsonar.com
                                             Host: metrics.cnn.com
                                             Host: pix04.revsci.net
                                             Host: s0.2mdn.net
                                             Host: symbolcomplete.marketwatch.com
                                             Host: www.adfusion.com
  http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
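
The grep pipeline above can be sketched in Python as follows. This is a minimal illustration, not the tooling used for the slide: the function name `external_hosts` and the three-request `sample_log` are hypothetical, and the real `cnn-ia-headers` capture is not reproduced here. Unlike `grep Host:`, the sketch only counts lines that begin with `Host:`.

```python
def external_hosts(header_log, archive_domain="archive.org"):
    """Return (total, external, unique_external) for Host: lines in a header log."""
    hosts = [
        line.split(":", 1)[1].strip()
        for line in header_log.splitlines()
        if line.startswith("Host:")
    ]
    # Requests whose Host is outside the archive "reach out" to the live web.
    external = [h for h in hosts if archive_domain not in h]
    return len(hosts), len(external), sorted(set(external))


# Hypothetical excerpt of captured request headers (assumption, for illustration only).
sample_log = """GET /web/20091027043308/http://www.cnn.com/index.html HTTP/1.1
Host: web.archive.org
GET /ads/banner.js HTTP/1.1
Host: ads.cnn.com
GET /b?c1=2 HTTP/1.1
Host: b.scorecardresearch.com
"""

total, leaked, unique_hosts = external_hosts(sample_log)
print(total, leaked, unique_hosts)
```

On the slide's data the same three numbers correspond to 288 requests, 117 of them to hosts outside archive.org, and the 17 distinct hosts listed.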


Reaching Through Time
% grep "^GET /web/20.*HTTP/1.1" cnn-ia-headers | awk -F"/" '{print $3}' | sort -u
20091026133351js_
20091026133356
20091026133359js_
20091026133425
20091026133427
20091026133430js_
20091026133438
20091026133441
20091026133443
20091026133446
20091026133448
…[deletia]…
20091027220018
20091027220027
20091027220237
20091027220248
20091027224745
20100923125259          ???
20100923125330          ???

first was: 2009-10-26 13:33:51
root was:  2009-10-27 04:33:08
end was:   2009-10-27 22:47:45
root - first ~= 15 hours
end - first ~= 33 hours



http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
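
The temporal spread can be checked directly from the 14-digit timestamps in the listing. A minimal sketch, assuming the timestamps follow the usual YYYYMMDDhhmmss layout and that a suffix such as `js_` can be ignored; the helper names are hypothetical.

```python
from datetime import datetime


def parse_wayback_timestamp(stamp):
    """Parse a 14-digit YYYYMMDDhhmmss timestamp, ignoring a suffix such as 'js_'."""
    return datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")


def hours_between(later, earlier):
    """Elapsed time between two datetimes, in hours."""
    return (later - earlier).total_seconds() / 3600


first = parse_wayback_timestamp("20091026133351js_")  # earliest embedded resource
root = parse_wayback_timestamp("20091027043308")      # the requested page itself
end = parse_wayback_timestamp("20091027224745")       # latest 2009 embedded resource

root_minus_first = hours_between(root, first)
end_minus_first = hours_between(end, first)
print(f"root - first = {root_minus_first:.1f} hours")
print(f"end - first = {end_minus_first:.1f} hours")
```

Computed this way, the first span comes to about 15 hours and the second to about 33 hours, so a single "page" is assembled from captures spread over more than a day (before counting the two 2010 outliers).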


Embedded Resources




29 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.youtube.com/user/wichitarecordings
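
Memento counts like the one above come from reading a TimeMap. The sketch below counts Mementos in a link-format TimeMap, assuming the syntax later standardized in RFC 7089 (relation types such as `"memento"` and `"first memento"`); the 2010 aggregator may have used a slightly different draft syntax. The sample TimeMap and its example.org / example.net URIs are hypothetical.

```python
import re

# One link-value: <uri> followed by ;name=value parameters (quoted or bare).
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_links(text):
    """Return a list of (uri, params) pairs from link-format text."""
    links = []
    for match in LINK_RE.finditer(text):
        params = {}
        for p in PARAM_RE.finditer(match.group(2)):
            value = p.group(2) if p.group(2) is not None else p.group(3)
            params[p.group(1).lower()] = value
        links.append((match.group(1), params))
    return links


def count_mementos(text):
    """Count links whose rel attribute includes the 'memento' relation type."""
    return sum(
        1 for _, params in parse_links(text)
        if "memento" in params.get("rel", "").split()
    )


# Hypothetical TimeMap (assumption, for illustration only).
sample_timemap = '''<http://example.org/page>;rel="original",
 <http://archive.example.net/timegate/http://example.org/page>;rel="timegate",
 <http://archive.example.net/20090101000000/http://example.org/page>;rel="first memento";datetime="Thu, 01 Jan 2009 00:00:00 GMT",
 <http://archive.example.net/20100101000000/http://example.org/page>;rel="last memento";datetime="Fri, 01 Jan 2010 00:00:00 GMT"'''

memento_count = count_mementos(sample_timemap)
print(memento_count, "Mementos")
```

Matching whole link-values with a regular expression, rather than splitting on commas, avoids breaking on the commas inside `datetime` values.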


Personalized Resources

GET / HTTP/1.1
Host: bit.ly
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=126736798.4156477295523165000.1251253806.1285119293.1285122783.59; _bit=4c20df7a-003a5-07baf-
91a08fa8;anon_u=cHN1X19jN2MwNjcxZC05MWNiLTQ3MmEtOGIxYy1hZDMyMWRlNzc1OTU=|1284997489|06ac0cefc8ac36
9e0f9849b5fdfbbe8d077d0c65; user=cGhvbmVkdWRl|1284997489|fdb7f02cacb3cb44416f54d83f3237ec0f7bd9b5;
__utmz=126736798.1280940647.33.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _chartbeat2=ciuph6qrso6tn6w7;
_xsrf=49bc661fc02845b3bcbe975d7c2f28de; __utmb=126736798.3.10.1285122783; __utmc=126736798
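
To see why this request yields a personalized page, it helps to look inside the Cookie header. The sketch below splits an excerpt of the header into name/value pairs and decodes the first `|`-separated field of the `user` cookie as Base64. Reading that field as an account name is an assumption based only on what it decodes to; the helper name `parse_cookie_header` is hypothetical.

```python
import base64


def parse_cookie_header(value):
    """Split a Cookie header value into a {name: value} dict."""
    cookies = {}
    for pair in value.split(";"):
        name, sep, val = pair.strip().partition("=")
        if sep:
            cookies[name] = val
    return cookies


# Excerpt of the Cookie header shown above (three of the cookies).
cookie_header = (
    "user=cGhvbmVkdWRl|1284997489|fdb7f02cacb3cb44416f54d83f3237ec0f7bd9b5; "
    "_xsrf=49bc661fc02845b3bcbe975d7c2f28de; __utmc=126736798"
)

cookies = parse_cookie_header(cookie_header)
# Assumption: the first field of the user cookie is a Base64-encoded identifier.
user_field = cookies["user"].split("|")[0]
decoded_user = base64.b64decode(user_field).decode("ascii")
print(sorted(cookies), decoded_user)
```

A crawler that requests the same URI without these cookies identifies no user, so whatever it archives is not the representation this browser received.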




Geolocated Resources
     % curl -I http://www.craigslist.org
     HTTP/1.1 302 Found
     Set-Cookie: cl_b=12851300231056905752;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
     Location: http://geo.craigslist.org/

     % curl -I http://geo.craigslist.org/
     HTTP/1.1 302 Found
     Content-Type: text/html; charset=iso-8859-1
     Connection: close
     Location: http://norfolk.craigslist.org
     Date: Wed, 22 Sep 2010 04:33:56 GMT
     Set-Cookie: cl_b=12851300363085180962;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
     Server: Apache

     % traceroute geo.craigslist.org
     traceroute to geo.craigslist.org (208.82.236.208), 64 hops max, 40 byte packets
      1 ***
      2 10.5.120.1 (10.5.120.1) 9.959 ms 23.004 ms 13.208 ms
      3 nrfksysr02-atm151208.hr.hr.cox.net (68.10.8.117) 10.056 ms 10.561 ms 19.970 ms
      4 nrfkdsrj01-ge500.0.rd.hr.cox.net (68.10.14.13) 11.142 ms 20.618 ms 10.293 ms
      5 ashbbprj02-ae4.0.rd.as.cox.net (68.1.1.232) 15.368 ms 68.854 ms 20.153 ms
      6 xe-3-0-0.cr2.dca2.above.net (64.125.26.241) 18.963 ms 23.674 ms 32.977 ms
      7 xe-2-2-0.cr2.iah1.us.above.net (64.125.30.53) 46.201 ms 56.156 ms 46.783 ms
      8 xe-1-1-0.mpr4.phx2.us.above.net (64.125.28.73) 82.616 ms 82.289 ms 84.383 ms
      9 * 64.124.178.62.allocated.above.net (64.124.178.62) 80.893 ms 78.786 ms
     10 511.ae9.ecore1p.craigslist.org (208.82.239.102) 95.958 ms 86.160 ms 90.115 ms
     11 www.craigslist.org (208.82.236.208) 80.968 ms 91.470 ms 80.110 ms
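
The redirect that carries the geolocation decision can be pulled out of the `curl -I` output programmatically. A minimal sketch: `parse_response_head` is a hypothetical helper, and the input is the geo.craigslist.org response copied from above.

```python
def parse_response_head(raw):
    """Parse a raw HTTP response head into (status_code, {header: [values]})."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    _version, status, _reason = lines[0].split(" ", 2)
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return int(status), headers


geo_response = """
HTTP/1.1 302 Found
Content-Type: text/html; charset=iso-8859-1
Connection: close
Location: http://norfolk.craigslist.org
Date: Wed, 22 Sep 2010 04:33:56 GMT
Set-Cookie: cl_b=12851300363085180962;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
Server: Apache
"""

status, headers = parse_response_head(geo_response)
redirect_target = headers["location"][0]
print(status, redirect_target)
```

The Location value depends on where the request originates (here, a Cox connection in the Norfolk area, per the traceroute), so a crawler at another network location would be redirected to, and would archive, a different city's site.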



Social Resources




http://www.flickr.com/photos/mic_n_2_sugars/84882320/
1 Memento: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.flickr.com/photos/mic_n_2_sugars/84882320/
http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg (Last-Modified: 10 Jan 2006…)
0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg

Shadow Web: Mobile




46 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://twitter.com/timoreilly
0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://mobile.twitter.com/timoreilly

Shadow Web: Mobile




15,000+ Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.cnn.com/
46 Mementos:      http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://m.cnn.com/

Shadow Web: Linked Data

http://en.wikipedia.org/wiki/DJ_Shadow

http://dbpedia.org/resource/DJ_Shadow     (this resource intentionally left blank)
    Accept: text/html             ->  http://dbpedia.org/page/DJ_Shadow
    Accept: application/rdf+xml   ->  http://dbpedia.org/data/DJ_Shadow



                      2 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/resource/DJ_Shadow
                      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/data/DJ_Shadow
                      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/page/DJ_Shadow
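
The content negotiation behind these three URIs can be modeled in a few lines. This is a simplified sketch, not DBpedia's actual server logic: the routing table, the default choice, and the function names are assumptions, and real Accept handling (wildcard subtypes, parameters other than `q`) is more involved.

```python
def parse_accept(accept_header):
    """Return acceptable media types from an Accept header, highest q-value first."""
    ranked = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


# Hypothetical routing table for the non-information resource (assumption).
REPRESENTATIONS = {
    "text/html": "http://dbpedia.org/page/DJ_Shadow",
    "application/rdf+xml": "http://dbpedia.org/data/DJ_Shadow",
}
DEFAULT_TARGET = "http://dbpedia.org/page/DJ_Shadow"


def negotiate(accept_header):
    """Pick the redirect target for a request to /resource/DJ_Shadow."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return DEFAULT_TARGET


print(negotiate("text/html"))
print(negotiate("application/rdf+xml"))
```

A crawler that always sends a browser-style Accept header only ever follows the first branch, which is one way the RDF side of the web ends up as a shadow web with few or no Mementos.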

Archive Discovery
% curl -I http://dbpedia.org/resource/DJ_Shadow
HTTP/1.1 303 See Other
Date: Wed, 22 Sep 2010 04:13:16 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Server: Virtuoso/06.02.3128 (Linux) x86_64-generic-linux-glibc25-64 VDB
Accept-Ranges: bytes
Location: http://dbpedia.org/page/DJ_Shadow
Content-Length: 0
Set-Cookie: uid=wm2BOkyZglwm1zEBBv2+Ag==; expires=Sat, 02-Oct-10 04:13:16 GMT; domain=dbpedia.org; path=/
P3P: policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"

DBpedia archive now hosted @ LANL:
http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DJ_Shadow
http://mementoarchive.lanl.gov/dbpedia/timemap/rdf/http://dbpedia.org/page/DJ_Shadow

% curl -I http://dbpedia.org/page/DJ_Shadow
HTTP/1.1 200 OK
Date: Wed, 22 Sep 2010 04:23:15 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Server: Virtuoso/06.02.3128 (Linux) x86_64-generic-linux-glibc25-64 VDB
Expires: Wed, 29 Sep 2010 03:39:43 GMT
Link: <http://dbpedia.org/data/DJ_Shadow.rdf>; rel="alternate"; type="application/rdf+xml"; title="Structured Descriptor Document (RDF/XML format)",
 <http://dbpedia.org/data/DJ_Shadow.n3>; rel="alternate"; type="text/n3"; title="Structured Descriptor Document (N3/Turtle format)",
 <http://dbpedia.org/data/DJ_Shadow.json>; rel="alternate"; type="application/json"; title="Structured Descriptor Document (RDF/JSON format)",
 <http://dbpedia.org/data/DJ_Shadow.atom>; rel="alternate"; type="application/atom+xml"; title="OData (Atom+Feed format)",
 <http://dbpedia.org/resource/DJ_Shadow>; rel="http://xmlns.com/foaf/0.1/primaryTopic",
 <http://dbpedia.org/resource/DJ_Shadow>; rev="describedby",
 <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DJ_Shadow>; rel="timegate"
Content-Length: 60711
Set-Cookie: uid=wm2BOkyZhLOgQzDjB4JHAg==; expires=Sat, 02-Oct-10 04:23:15 GMT; domain=dbpedia.org; path=/
P3P: policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"
Accept-Ranges: bytes
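
A client can discover the archive from the Link header above by looking for the `timegate` relation. A minimal sketch: `find_link_targets` is a hypothetical helper, the regular expressions are a simplification of full Link header parsing, and the sample is an excerpt of the header shown above.

```python
import re


def find_link_targets(link_header, wanted_rel):
    """Return the URIs in a Link header whose rel attribute includes wanted_rel."""
    targets = []
    # Each link-value is <uri> followed by its parameters, up to the next '<'.
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        match = re.search(r'\brel\s*=\s*"([^"]*)"', params)
        if match and wanted_rel in match.group(1).split():
            targets.append(uri)
    return targets


link_header = (
    '<http://dbpedia.org/data/DJ_Shadow.rdf>; rel="alternate"; '
    'type="application/rdf+xml"; title="Structured Descriptor Document (RDF/XML format)", '
    '<http://dbpedia.org/resource/DJ_Shadow>; rel="http://xmlns.com/foaf/0.1/primaryTopic", '
    '<http://dbpedia.org/resource/DJ_Shadow>; rev="describedby", '
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DJ_Shadow>; rel="timegate"'
)

timegates = find_link_targets(link_header, "timegate")
print(timegates)
```

Because the origin server advertises its own TimeGate, a client needs no prior knowledge of which archive holds past versions of the page.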



Decontextualized
 Resources…




Original Resource



                                                                       from these we can
                                                                       create time-based:
                                                                       • indexes
                                                                       • IDF values
                                                                       • PageRank




http://web.archive.org/web/*/http://www.thecribs.com/
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.thecribs.com/
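
As one illustration of a time-based value derived from Mementos, the sketch below computes IDF separately for each capture year, using idf = log(N_year / df_year(term)). The mini-corpus, the whitespace tokenization, and the per-year bucketing are all assumptions made for the example, not a method described on the slide.

```python
import math
from collections import defaultdict


def idf_by_year(documents):
    """Compute IDF per term within each year: log(N_year / df_year(term))."""
    docs_per_year = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))
    for year, text in documents:
        docs_per_year[year] += 1
        # Count each term once per document.
        for term in set(text.lower().split()):
            doc_freq[year][term] += 1
    return {
        year: {term: math.log(docs_per_year[year] / df) for term, df in terms.items()}
        for year, terms in doc_freq.items()
    }


# Hypothetical (capture year, page text) pairs standing in for archived pages.
captures = [
    (2005, "the cribs new album"),
    (2005, "tour dates announced"),
    (2010, "the cribs tour"),
    (2010, "the cribs album review"),
]

idf = idf_by_year(captures)
print(idf[2005]["cribs"], idf[2010]["cribs"])
```

In this toy corpus "cribs" is a discriminating term in 2005 but appears in every 2010 document, so its IDF drops to zero; the same idea extends to time-sliced indexes and link-based scores.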



Tagging




http://www.delicious.com/url/4c858cef7188a51bfb3b80b3011cbed8
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.delicious.com/url/4c858cef7188a51bfb3b80b3011cbed8




Tweeting




http://twitter.com/#search?q=%23thecribs
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://twitter.com/#search?q=%23thecribs (false Mementos!)
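
One likely reading of "false Mementos" is that the `#search?q=...` part is a URI fragment, which clients do not send to the server. The sketch below uses the standard library's `urldefrag` to show what actually goes over the wire; the interpretation of the slide is an assumption.

```python
from urllib.parse import urldefrag

uri = "http://twitter.com/#search?q=%23thecribs"

# The fragment stays in the browser; only request_uri is sent to a server.
request_uri, fragment = urldefrag(uri)
print(request_uri)
print(fragment)
```

Under that reading, a TimeMap requested for the full URI is really a TimeMap for http://twitter.com/, so the Mementos it lists are of the Twitter home page rather than of the search results.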




Searching




http://www.google.com/trends?q=the+cribs
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.google.com/trends?q=the+cribs




Analytics




http://websiteindepth.com/www.thecribs.com
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://websiteindepth.com/www.thecribs.com




Non-Archiving Made Easy…




Batch Recovery For Sites




             http://warrick.cs.odu.edu/
Real-Time Recovery for URIs




           Synchronicity - www.cs.odu.edu/~mklein/
Useful, Now



                                           RT, @, #, bit.ly, tiny.cc…
                                           users are willing to endure an
                                           appalling level of syntax if there
                                           is a clear and present benefit…




Closing Thoughts


• Preservation not for privileged priesthood
  http://doi.acm.org/10.1145/1592761.1592794
  http://booktwo.org/notebook/wikipedia-historiography/
• No more hoary stories about format obsolescence:
  http://blog.dshr.org/2010/09/reinforcing-my-point.html
• Archiving as branded service, not infrastructure
  http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
• Don't desiccate resources; leave them on the web
• Endless metadata is not preservation…


                                            My Point of View: Michael L. Nelson
                                      Web Archiving Cooperative, Stanford, Sep 09 2010

More Related Content

What's hot

Virtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetVirtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetOmar Reygaert
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2Calvin Giles
 
Sevillajs: Una tarde con Firefox OS
Sevillajs: Una tarde con Firefox OSSevillajs: Una tarde con Firefox OS
Sevillajs: Una tarde con Firefox OSFrancisco Jordano
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data scienceCalvin Giles
 
mapserver_install_linux
mapserver_install_linuxmapserver_install_linux
mapserver_install_linuxtutorialsruby
 
Py conkr 20150829_docker-python
Py conkr 20150829_docker-pythonPy conkr 20150829_docker-python
Py conkr 20150829_docker-pythonEric Ahn
 
Multisite Van Dyk Walkah
Multisite Van Dyk WalkahMultisite Van Dyk Walkah
Multisite Van Dyk Walkahjvandyk
 

What's hot (7)

Virtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetVirtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + Puppet
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2
 
Sevillajs: Una tarde con Firefox OS
Sevillajs: Una tarde con Firefox OSSevillajs: Una tarde con Firefox OS
Sevillajs: Una tarde con Firefox OS
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data science
 
mapserver_install_linux
mapserver_install_linuxmapserver_install_linux
mapserver_install_linux
 
Py conkr 20150829_docker-python
Py conkr 20150829_docker-pythonPy conkr 20150829_docker-python
Py conkr 20150829_docker-python
 
Multisite Van Dyk Walkah
Multisite Van Dyk WalkahMultisite Van Dyk Walkah
Multisite Van Dyk Walkah
 

Viewers also liked

Music Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTubeMusic Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTubeMichael Nelson
 
A Research Agenda for "Obsolete Data or Resources"
A Research Agenda for "Obsolete Data or Resources"A Research Agenda for "Obsolete Data or Resources"
A Research Agenda for "Obsolete Data or Resources"Michael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the WebMichael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the WebMichael Nelson
 
Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...Michael Nelson
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesMichael Nelson
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web PagesMichael Nelson
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?Michael Nelson
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMichael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the WebMichael Nelson
 
Review of Web Archiving
Review of Web ArchivingReview of Web Archiving
Review of Web ArchivingMichael Nelson
 
The Open Archives Initiative
The Open Archives InitiativeThe Open Archives Initiative
The Open Archives InitiativeMichael Nelson
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web PagesMichael Nelson
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready WebMichael Nelson
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?Michael Nelson
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange ProjectMichael Nelson
 

Viewers also liked (16)

Music Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTubeMusic Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTube
 
A Research Agenda for "Obsolete Data or Resources"
A Research Agenda for "Obsolete Data or Resources"A Research Agenda for "Obsolete Data or Resources"
A Research Agenda for "Obsolete Data or Resources"
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Review of Web Archiving
Review of Web ArchivingReview of Web Archiving
Review of Web Archiving
 
The Open Archives Initiative
The Open Archives InitiativeThe Open Archives Initiative
The Open Archives Initiative
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready Web
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 

Similar to My Point of View: Michael L. Nelson Web Archiving Cooperative

Seaside Portability
Seaside PortabilitySeaside Portability
Seaside Portabilityjfitzell
 
5 steps to faster web sites & HTML5 games - updated for DDDscot
5 steps to faster web sites & HTML5 games - updated for DDDscot5 steps to faster web sites & HTML5 games - updated for DDDscot
5 steps to faster web sites & HTML5 games - updated for DDDscotMichael Ewins
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?Andy Davies
 
#vBrownBag OpenStack - Review & Kickoff for Phase 2
#vBrownBag OpenStack - Review & Kickoff for Phase 2#vBrownBag OpenStack - Review & Kickoff for Phase 2
#vBrownBag OpenStack - Review & Kickoff for Phase 2ProfessionalVMware
 
Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowAaron Peters
 
Common Pitfalls for your Drupal Site, and How to Avoid Them
Common Pitfalls for your Drupal Site, and How to Avoid ThemCommon Pitfalls for your Drupal Site, and How to Avoid Them
Common Pitfalls for your Drupal Site, and How to Avoid ThemAcquia
 
前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpoMichael Zhang
 
Velocity EU 2012 - Third party scripts and you
Velocity EU 2012 - Third party scripts and youVelocity EU 2012 - Third party scripts and you
Velocity EU 2012 - Third party scripts and youPatrick Meenan
 
Ensemble oscon 2011
Ensemble oscon 2011Ensemble oscon 2011
Ensemble oscon 2011OSCON Byrum
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)nikomatsakis
 
Symfony - modern technology in practice, Webexpo Prague
Symfony - modern technology in practice, Webexpo PragueSymfony - modern technology in practice, Webexpo Prague
Symfony - modern technology in practice, Webexpo PraguePavel Campr
 
5 Steps to Faster Web Sites and HTML5 Games
5 Steps to Faster Web Sites and HTML5 Games5 Steps to Faster Web Sites and HTML5 Games
5 Steps to Faster Web Sites and HTML5 GamesMichael Ewins
 
Caching Up and Down the Stack
Caching Up and Down the StackCaching Up and Down the Stack
Caching Up and Down the StackDan Kuebrich
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzAlessandro Nadalin
 
Art and Science of Web Sites Performance: A Front-end Approach
Art and Science of Web Sites Performance: A Front-end ApproachArt and Science of Web Sites Performance: A Front-end Approach
Art and Science of Web Sites Performance: A Front-end ApproachJiang Zhu
 
Running Docker in Development & Production (#ndcoslo 2015)
Running Docker in Development & Production (#ndcoslo 2015)Running Docker in Development & Production (#ndcoslo 2015)
Running Docker in Development & Production (#ndcoslo 2015)Ben Hall
 
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."Dongwook Lee
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersjudell
 

Similar to My Point of View: Michael L. Nelson Web Archiving Cooperative (20)

Seaside Portability
Seaside PortabilitySeaside Portability
Seaside Portability
 
5 steps to faster web sites & HTML5 games - updated for DDDscot
5 steps to faster web sites & HTML5 games - updated for DDDscot5 steps to faster web sites & HTML5 games - updated for DDDscot
5 steps to faster web sites & HTML5 games - updated for DDDscot
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?
 
#vBrownBag OpenStack - Review & Kickoff for Phase 2
#vBrownBag OpenStack - Review & Kickoff for Phase 2#vBrownBag OpenStack - Review & Kickoff for Phase 2
#vBrownBag OpenStack - Review & Kickoff for Phase 2
 
An API Your Parents Would Be Proud Of
An API Your Parents Would Be Proud OfAn API Your Parents Would Be Proud Of
An API Your Parents Would Be Proud Of
 
Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and How
 
Common Pitfalls for your Drupal Site, and How to Avoid Them
Common Pitfalls for your Drupal Site, and How to Avoid ThemCommon Pitfalls for your Drupal Site, and How to Avoid Them
Common Pitfalls for your Drupal Site, and How to Avoid Them
 
前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo
 
Velocity EU 2012 - Third party scripts and you
Velocity EU 2012 - Third party scripts and youVelocity EU 2012 - Third party scripts and you
Velocity EU 2012 - Third party scripts and you
 
Ensemble oscon 2011
Ensemble oscon 2011Ensemble oscon 2011
Ensemble oscon 2011
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)
 
Symfony - modern technology in practice, Webexpo Prague
Symfony - modern technology in practice, Webexpo PragueSymfony - modern technology in practice, Webexpo Prague
Symfony - modern technology in practice, Webexpo Prague
 
5 Steps to Faster Web Sites and HTML5 Games
5 Steps to Faster Web Sites and HTML5 Games5 Steps to Faster Web Sites and HTML5 Games
5 Steps to Faster Web Sites and HTML5 Games
 
Caching Up and Down the Stack
Caching Up and Down the StackCaching Up and Down the Stack
Caching Up and Down the Stack
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in Mainz
 
Art and Science of Web Sites Performance: A Front-end Approach
Art and Science of Web Sites Performance: A Front-end ApproachArt and Science of Web Sites Performance: A Front-end Approach
Art and Science of Web Sites Performance: A Front-end Approach
 
Running Docker in Development & Production (#ndcoslo 2015)
Running Docker in Development & Production (#ndcoslo 2015)Running Docker in Development & Production (#ndcoslo 2015)
Running Docker in Development & Production (#ndcoslo 2015)
 
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."
IBM dwLive, "Internet & HTTP - 잃어버린 패킷을 찾아서..."
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makers
 

My Point of View: Michael L. Nelson Web Archiving Cooperative

  • 1. My Point of View Michael L. Nelson Old Dominion University My Point of View: Michael L. Nelson Web Archiving Cooperative, Stanford, Sep 09 2010
  • 2. Observations
      • We are pretty good at archiving the web of five years ago, but not the web of today
      • There are separate, shadow webs that we are not archiving
      • Archiving should be a service with short-term utility
  • 4. Reaching Out From the Archive
      % grep Host: cnn-ia-headers | wc -l
        288
      % grep Host: cnn-ia-headers | grep -v archive.org | wc -l
        117
      % grep Host: cnn-ia-headers | grep -v archive.org | sort -u
      Host: ad.doubleclick.net
      Host: ads.adsonar.com
      Host: ads.cnn.com
      Host: aranet.vo.llnwd.net
      Host: b.scorecardresearch.com
      Host: bs.serving-sys.com
      Host: cnn.dyn.cnn.com
      Host: ds.serving-sys.com
      Host: gdyn.cnn.com
      Host: i.cdn.turner.com
      Host: i2.cdn.turner.com
      Host: js.adsonar.com
      Host: metrics.cnn.com
      Host: pix04.revsci.net
      Host: s0.2mdn.net
      Host: symbolcomplete.marketwatch.com
      Host: www.adfusion.com

      http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
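The grep pipeline on this slide can be approximated in a few lines of Python. This is only a sketch: the function name `external_hosts` and the sample lines are illustrative assumptions, not part of the talk, and the substring test is a simplification of `grep -v archive.org`.

```python
from collections import Counter

def external_hosts(header_lines, archive_marker="archive.org"):
    """Count Host: request headers that point outside the archive."""
    counts = Counter()
    for line in header_lines:
        if not line.startswith("Host:"):
            continue
        host = line.split(":", 1)[1].strip()
        # Keep only hosts that do not mention the archive's domain.
        if archive_marker not in host:
            counts[host] += 1
    return counts

# Hypothetical excerpt of a captured request log (not the real cnn-ia-headers file).
sample = [
    "GET /web/20091027043308/http://www.cnn.com/index.html HTTP/1.1",
    "Host: web.archive.org",
    "Host: ads.cnn.com",
    "Host: i.cdn.turner.com",
    "Host: ads.cnn.com",
]
leaks = external_hosts(sample)
print(sum(leaks.values()), sorted(leaks))
```

The sum corresponds to the `wc -l` count of non-archive requests, and the sorted keys correspond to the `sort -u` listing.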
  • 5. Reaching Through Time
      % grep "^GET /web/20.*HTTP/1.1" cnn-ia-headers | awk -F"/" '{print $3}' | sort -u
      20091026133351js_
      20091026133356
      20091026133359js_
      20091026133425
      20091026133427
      20091026133430js_
      20091026133438
      20091026133441
      20091026133443
      20091026133446
      20091026133448
      …[deletia]…
      20091027220018
      20091027220027
      20091027220237
      20091027220248
      20091027224745
      20100923125259 ???
      20100923125330 ???

      first was: 2009-10-26 13:33:51
      root was: 2009-10-27 04:33:08
      end was: 2009-10-27 22:47:45
      root - first ~= 15 hours
      end - first ~= 33 hours

      http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
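The temporal spread between the root page and its embedded resources can be checked directly from the 14-digit timestamps. A minimal sketch, assuming the tokens follow the YYYYMMDDhhmmss layout seen on the slide; the helper names `parse_ts` and `hours_between` are made up for illustration.

```python
from datetime import datetime

def parse_ts(token):
    """Parse the leading 14 digits (YYYYMMDDhhmmss) of a Wayback path token."""
    # Suffixes such as "js_" are ignored by slicing off everything after 14 chars.
    return datetime.strptime(token[:14], "%Y%m%d%H%M%S")

def hours_between(earlier, later):
    """Elapsed hours between two timestamp tokens."""
    return (parse_ts(later) - parse_ts(earlier)).total_seconds() / 3600

# The three reference points named on the slide.
first, root, end = "20091026133351js_", "20091027043308", "20091027224745"
print(round(hours_between(first, root), 1))  # root - first
print(round(hours_between(first, end), 1))   # end - first
```

For the slide's values this prints about 15.0 and 33.2 hours.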
  • 6. Embedded Resources
      29 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.youtube.com/user/wichitarecordings
  • 7. Personalized Resources
      GET / HTTP/1.1
      Host: bit.ly
      User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
      Accept-Language: en-us,en;q=0.5
      Accept-Encoding: gzip,deflate
      Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
      Keep-Alive: 115
      Connection: keep-alive
      Cookie: __utma=126736798.4156477295523165000.1251253806.1285119293.1285122783.59; _bit=4c20df7a-003a5-07baf-91a08fa8;anon_u=cHN1X19jN2MwNjcxZC05MWNiLTQ3MmEtOGIxYy1hZDMyMWRlNzc1OTU=|1284997489|06ac0cefc8ac369e0f9849b5fdfbbe8d077d0c65; user=cGhvbmVkdWRl|1284997489|fdb7f02cacb3cb44416f54d83f3237ec0f7bd9b5; __utmz=126736798.1280940647.33.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _chartbeat2=ciuph6qrso6tn6w7; _xsrf=49bc661fc02845b3bcbe975d7c2f28de; __utmb=126736798.3.10.1285122783; __utmc=126736798
  • 8. Geolocated Resources
      % curl -I http://www.craigslist.org
      HTTP/1.1 302 Found
      Set-Cookie: cl_b=12851300231056905752;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
      Location: http://geo.craigslist.org/

      % curl -I http://geo.craigslist.org/
      HTTP/1.1 302 Found
      Content-Type: text/html; charset=iso-8859-1
      Connection: close
      Location: http://norfolk.craigslist.org
      Date: Wed, 22 Sep 2010 04:33:56 GMT
      Set-Cookie: cl_b=12851300363085180962;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
      Server: Apache

      % traceroute geo.craigslist.org
      traceroute to geo.craigslist.org (208.82.236.208), 64 hops max, 40 byte packets
       1 ***
       2 10.5.120.1 (10.5.120.1) 9.959 ms 23.004 ms 13.208 ms
       3 nrfksysr02-atm151208.hr.hr.cox.net (68.10.8.117) 10.056 ms 10.561 ms 19.970 ms
       4 nrfkdsrj01-ge500.0.rd.hr.cox.net (68.10.14.13) 11.142 ms 20.618 ms 10.293 ms
       5 ashbbprj02-ae4.0.rd.as.cox.net (68.1.1.232) 15.368 ms 68.854 ms 20.153 ms
       6 xe-3-0-0.cr2.dca2.above.net (64.125.26.241) 18.963 ms 23.674 ms 32.977 ms
       7 xe-2-2-0.cr2.iah1.us.above.net (64.125.30.53) 46.201 ms 56.156 ms 46.783 ms
       8 xe-1-1-0.mpr4.phx2.us.above.net (64.125.28.73) 82.616 ms 82.289 ms 84.383 ms
       9 * 64.124.178.62.allocated.above.net (64.124.178.62) 80.893 ms 78.786 ms
      10 511.ae9.ecore1p.craigslist.org (208.82.239.102) 95.958 ms 86.160 ms 90.115 ms
      11 www.craigslist.org (208.82.236.208) 80.968 ms 91.470 ms 80.110 ms
  • 9. Social Resources
      http://www.flickr.com/photos/mic_n_2_sugars/84882320/
      1 Memento: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.flickr.com/photos/mic_n_2_sugars/84882320/

      http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg (Last-Modified: 10 Jan 2006…)
      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg
  • 10. Shadow Web: Mobile
      46 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://twitter.com/timoreilly
      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://mobile.twitter.com/timoreilly
  • 11. Shadow Web: Mobile
      15,000+ Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.cnn.com/
      46 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://m.cnn.com/
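The memento counts on these slides refer to entries in link-format TimeMaps. One way such a count could be reproduced is sketched below. It assumes each entry looks like `<URI>; rel="..."; datetime="..."` and that mementos carry the token `memento` in their `rel` value; the sample TimeMap and the `archive.example` host are invented for illustration, not fetched from the aggregator.

```python
import re

# One link-format entry: <URI> followed by ;-separated attr="value" pairs.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
REL = re.compile(r'\brel\s*=\s*"([^"]*)"')

def count_mementos(timemap_text):
    """Count entries whose rel value contains the token 'memento'."""
    total = 0
    for _uri, attrs in ENTRY.findall(timemap_text):
        rel = REL.search(attrs)
        # rel may hold several space-separated tokens, e.g. "first memento".
        if rel and "memento" in rel.group(1).split():
            total += 1
    return total

# Hypothetical TimeMap body with one original and two mementos.
sample = (
    '<http://m.cnn.com/>; rel="original",\n'
    '<http://archive.example/20090101000000/http://m.cnn.com/>; '
    'rel="first memento"; datetime="Thu, 01 Jan 2009 00:00:00 GMT",\n'
    '<http://archive.example/20100101000000/http://m.cnn.com/>; '
    'rel="last memento"; datetime="Fri, 01 Jan 2010 00:00:00 GMT"'
)
print(count_mementos(sample))
```

Comparing the count for a desktop URI with the count for its mobile counterpart gives the kind of gap the slide reports.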
  • 12. Shadow Web: Linked Data
      http://en.wikipedia.org/wiki/DJ_Shadow
      http://dbpedia.org/resource/DJ_Shadow (this resource intentionally left blank)
        Accept: text/html -> http://dbpedia.org/page/DJ_Shadow
        Accept: application/rdf+xml -> http://dbpedia.org/data/DJ_Shadow

      2 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/resource/DJ_Shadow
      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/data/DJ_Shadow
      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/page/DJ_Shadow
  • 13. Archive Discovery
      % curl -I http://dbpedia.org/resource/DJ_Shadow
      HTTP/1.1 303 See Other
      Date: Wed, 22 Sep 2010 04:13:16 GMT
      Content-Type: text/html; charset=UTF-8
      Connection: keep-alive
      Server: Virtuoso/06.02.3128 (Linux) x86_64-generic-linux-glibc25-64 VDB
      Accept-Ranges: bytes
      Location: http://dbpedia.org/page/DJ_Shadow
      Content-Length: 0
      Set-Cookie: uid=wm2BOkyZglwm1zEBBv2+Ag==; expires=Sat, 02-Oct-10 04:13:16 GMT; domain=dbpedia.org; path=/
      P3P: policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"

      % curl -I http://dbpedia.org/page/DJ_Shadow
      HTTP/1.1 200 OK
      Date: Wed, 22 Sep 2010 04:23:15 GMT
      Content-Type: text/html; charset=UTF-8
      Connection: keep-alive
      Vary: Accept-Encoding
      Server: Virtuoso/06.02.3128 (Linux) x86_64-generic-linux-glibc25-64 VDB
      Expires: Wed, 29 Sep 2010 03:39:43 GMT
      Link: <http://dbpedia.org/data/DJ_Shadow.rdf>; rel="alternate"; type="application/rdf+xml"; title="Structured Descriptor Document (RDF/XML format)",
        <http://dbpedia.org/data/DJ_Shadow.n3>; rel="alternate"; type="text/n3"; title="Structured Descriptor Document (N3/Turtle format)",
        <http://dbpedia.org/data/DJ_Shadow.json>; rel="alternate"; type="application/json"; title="Structured Descriptor Document (RDF/JSON format)",
        <http://dbpedia.org/data/DJ_Shadow.atom>; rel="alternate"; type="application/atom+xml"; title="OData (Atom+Feed format)",
        <http://dbpedia.org/resource/DJ_Shadow>; rel="http://xmlns.com/foaf/0.1/primaryTopic",
        <http://dbpedia.org/resource/DJ_Shadow>; rev="describedby",
        <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DJ_Shadow>; rel="timegate"
      Content-Length: 60711
      Set-Cookie: uid=wm2BOkyZhLOgQzDjB4JHAg==; expires=Sat, 02-Oct-10 04:23:15 GMT; domain=dbpedia.org; path=/
      P3P: policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"
      Accept-Ranges: bytes

      DBpedia archive now hosted @ LANL:
      http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/DJ_Shadow
      http://mementoarchive.lanl.gov/dbpedia/timemap/rdf/http://dbpedia.org/page/DJ_Shadow
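A client can discover the archive by pulling the `rel="timegate"` target out of the Link response header. The sketch below is a simplified parser that assumes every attribute value is double-quoted, as in the response on this slide; `links_with_rel` is a hypothetical helper, and the header string is abbreviated from the slide.

```python
import re

# One Link entry: <URI> followed by ;-separated attr="value" pairs.
LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
REL = re.compile(r'(?:^|;)\s*rel\s*=\s*"([^"]*)"')

def links_with_rel(link_header, wanted):
    """Return the URIs whose rel attribute contains the wanted token."""
    found = []
    for uri, attrs in LINK.findall(link_header):
        rel = REL.search(attrs)
        # rev="..." entries are skipped because only rel is inspected.
        if rel and wanted in rel.group(1).split():
            found.append(uri)
    return found

# Abbreviated Link header value taken from the slide's 200 OK response.
header = (
    '<http://dbpedia.org/data/DJ_Shadow.rdf>; rel="alternate"; '
    'type="application/rdf+xml", '
    '<http://dbpedia.org/resource/DJ_Shadow>; rev="describedby", '
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/'
    'http://dbpedia.org/page/DJ_Shadow>; rel="timegate"'
)
print(links_with_rel(header, "timegate"))
```

A production client would likely use a full Link header parser, since unquoted values and commas inside quoted titles are not handled here.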
  • 14. Decontextualized Resources…
  • 15. Original Resource
      from these we can create time-based:
      • indexes
      • IDF values
      • PageRank
      http://web.archive.org/web/*/http://www.thecribs.com/
      http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.thecribs.com/
  • 20. Non-Archiving Made Easy…
  • 21. Batch Recovery For Sites
      http://warrick.cs.odu.edu/
  • 22. Real-Time Recovery for URIs
      Synchronicity - www.cs.odu.edu/~mklein/
  • 23. Useful, Now
      RT, @, #, bit.ly, tiny.cc…
      users are willing to endure an appalling level of syntax if there is a clear and present benefit…
  • 24. Closing Thoughts
      • Preservation not for privileged priesthood
          http://doi.acm.org/10.1145/1592761.1592794
          http://booktwo.org/notebook/wikipedia-historiography/
      • No more hoary stories about format obsolescence:
          http://blog.dshr.org/2010/09/reinforcing-my-point.html
      • Archiving as branded service, not infrastructure
          http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
      • Don't desiccate resources; leave them on the web
      • Endless metadata is not preservation…