An Evaluation of Caching Policies for
Memento TimeMaps
Justin F. Brunelle and Michael L. Nelson
Old Dominion University
{jbrunelle, mln}@cs.odu.edu
JCDL 2013
Indianapolis, Indiana
07/2013
Discovering Archived
nasa.gov Pages
Archived Pages => mementos
Mementos identified by URI-M
Live Pages => resources
Resources identified by URI-R
2
3
TimeMaps: Lists of mementos
<http://mementoproxy.lanl.gov/aggr/timegate/http://www.nasa.gov/>;rel="timegate", <http://www.nasa.gov/>;rel="original",
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
<http://api.wayback.archive.org/memento/19970605230559/http://www.nasa.gov/>;rel="memento";datetime="Thu, 05 Jun 1997 23:05:59 GMT",
<http://api.wayback.archive.org/memento/19970711094601/http://www.nasa.gov/>;rel="memento";datetime="Fri, 11 Jul 1997 09:46:01 GMT",
<http://api.wayback.archive.org/memento/19981202170636/http://www.nasa.gov/>;rel="memento";datetime="Wed, 02 Dec 1998 17:06:36 GMT",
<http://api.wayback.archive.org/memento/19981212031235/http://www.nasa.gov/>;rel="memento";datetime="Sat, 12 Dec 1998 03:12:35 GMT",
<http://api.wayback.archive.org/memento/19990116233500/http://nasa.gov/>;rel="memento";datetime="Sat, 16 Jan 1999 23:35:00 GMT",
<http://api.wayback.archive.org/memento/19990117063022/http://nasa.gov/>;rel="memento";datetime="Sun, 17 Jan 1999 06:30:22 GMT",
<http://api.wayback.archive.org/memento/19990125091025/http://nasa.gov/>;rel="memento";datetime="Mon, 25 Jan 1999 09:10:25 GMT",
<http://api.wayback.archive.org/memento/19990203005545/http://nasa.gov/>;rel="memento";datetime="Wed, 03 Feb 1999 00:55:45 GMT",
<http://api.wayback.archive.org/memento/20080903053412/http://www.nasa.gov/>;rel="memento";datetime="Wed, 03 Sep 2008 05:34:12 GMT",
<http://webarchive.nationalarchives.gov.uk/20080904014810/http://www.nasa.gov/>;rel="memento";datetime="Thu, 04 Sep 2008 00:00:00 GMT",
<http://api.wayback.archive.org/memento/20080904055742/http://www.nasa.gov/>;rel="memento";datetime="Thu, 04 Sep 2008 05:57:42 GMT",
<http://webarchive.nationalarchives.gov.uk/20080906134025/http://www.nasa.gov/>;rel="memento";datetime="Sat, 06 Sep 2008 00:00:00 GMT",
<http://api.wayback.archive.org/memento/20080906143204/http://www.nasa.gov/>;rel="memento";datetime="Sat, 06 Sep 2008 14:32:04 GMT",
<http://webarchive.nationalarchives.gov.uk/20080907124040/http://www.nasa.gov/>;rel="memento";datetime="Sun, 07 Sep 2008 00:00:00 GMT",
<http://api.wayback.archive.org/memento/20080907160232/http://www.nasa.gov/>;rel="memento";datetime="Sun, 07 Sep 2008 16:02:32 GMT",
<http://webarchive.nationalarchives.gov.uk/20120809003120/http://www.nasa.gov/>;rel="memento";datetime="Thu, 09 Aug 2012 00:00:00 GMT",
<http://webarchive.nationalarchives.gov.uk/20120814175606/http://www.nasa.gov/>;rel="memento";datetime="Tue, 14 Aug 2012 00:00:00 GMT",
<http://webarchive.nationalarchives.gov.uk/20120819212348/http://www.nasa.gov/>;rel="memento";datetime="Sun, 19 Aug 2012 00:00:00 GMT",
<http://webarchive.nationalarchives.gov.uk/20120826185010/http://www.nasa.gov/>;rel="memento";datetime="Sun, 26 Aug 2012 00:00:00 GMT",
<http://webarchive.nationalarchives.gov.uk/20120909230516/http://www.nasa.gov/>;rel="last memento";datetime="Sun, 09 Sep 2012 00:00:00 GMT"
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT"
<http://webarchive.nationalarchives.gov.uk/20080907124040/http://www.nasa.gov/>;rel="memento";datetime="Sun, 07 Sep 2008 00:00:00 GMT",
4
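The TimeMap above is a comma-separated list of link entries, so a small parser makes its structure concrete. The following is a minimal sketch, assuming every memento entry has the shape `<URI-M>;rel="...";datetime="..."` as on the slide; the names `ENTRY` and `parse_timemap` are hypothetical and not part of any Memento tooling.

```python
import re
from email.utils import parsedate_to_datetime

# One link entry: <URI>;rel="...";datetime="..." (the datetime part is optional).
ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"(?:\s*;\s*datetime="([^"]+)")?')

def parse_timemap(text):
    """Return (URI-M, datetime) pairs for entries whose rel includes 'memento'."""
    mementos = []
    for uri, rel, dt in ENTRY.findall(text):
        # rel may hold several space-separated values, e.g. "first memento".
        if "memento" in rel.split() and dt:
            mementos.append((uri, parsedate_to_datetime(dt)))
    return mementos

# Two memento entries and the original resource, copied from the slide.
sample = (
    '<http://www.nasa.gov/>;rel="original", '
    '<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>'
    ';rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT", '
    '<http://webarchive.nationalarchives.gov.uk/20080907124040/http://www.nasa.gov/>'
    ';rel="memento";datetime="Sun, 07 Sep 2008 00:00:00 GMT"'
)
mementos = parse_timemap(sample)
```

The `timegate` and `original` entries are skipped because their `rel` value does not include `memento`.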
Aggregating TimeMaps
• Multiple archives
• Expensive
• Caching reduces load on archives
• Write-through
[Diagram labels: Cache, Aggregator, Sort, IA TM, AIT TM, HTTP, Cache, …]
5
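The aggregation step on this slide (collect each archive's TimeMap, then sort) can be sketched as follows. This is an illustration under assumptions rather than the aggregator's actual code: each per-archive TimeMap is taken to be a list of (URI-M, datetime string) pairs, and `aggregate` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime

def aggregate(*timemaps):
    """Merge per-archive memento lists into one list sorted by datetime."""
    merged = {}
    for timemap in timemaps:
        for uri_m, dt in timemap:
            merged[uri_m] = parsedate_to_datetime(dt)  # one entry per URI-M
    return sorted(merged.items(), key=lambda item: item[1])

# Entries copied from the nasa.gov TimeMap shown earlier in the deck.
ia_tm = [
    ("http://api.wayback.archive.org/memento/20080904055742/http://www.nasa.gov/",
     "Thu, 04 Sep 2008 05:57:42 GMT"),
    ("http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/",
     "Tue, 31 Dec 1996 23:58:47 GMT"),
]
uk_tm = [
    ("http://webarchive.nationalarchives.gov.uk/20080904014810/http://www.nasa.gov/",
     "Thu, 04 Sep 2008 00:00:00 GMT"),
]
combined = aggregate(ia_tm, uk_tm)
```

Every call has to wait for all contributing archives before it can sort, which is the expense that caching the aggregated result is meant to avoid.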
Aggregator Cache
• TimeMaps change
• Only want to cache better TimeMaps
– Bigger is better
• Ideally monotonically increasing
• Two extremes:
– Never cache (TTL=0)
– Never update in cache (TTL=92)
6
Agenda
7
Cache content measures
• |a| => # of archives
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
• |m| => # of mementos
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
8
Same TimeMap
• |a| == |a'|
• |m| == |m'|
All archives have reported the same mementos.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 2; |m| = 3
9
Gained Archives, Gained Mementos
• |a| < |a'|
• |m| < |m'|
A new archive (WebCite) has just indexed and reported a memento for the first time.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 3; |m| = 4
10
Same Archives, Gained Mementos
• |a| == |a'|
• |m| < |m'|
The Internet Archive has released a set of new mementos.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 2; |m| = 4
11
Lost Archives, Same Mementos
• |a| > |a'|
• |m| == |m'|
A redaction of 1 memento took place in the Internet Archive which now does not report mementos for this resource.
The UK Web Archive has released 1 new memento for this resource.
TimeMap T: |a| = 3; |m| = 3
TimeMap T': |a| = 2; |m| = 3
12
Lost Archives, Gained Mementos
• |a| > |a'|
• |m| < |m'|
A redaction of 2 mementos took place in the Internet Archive which now does not report mementos for this resource.
The UK Government Web Archive has released 3 new mementos for this resource.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 1; |m| = 4
13
Lost Archives, Lost Mementos
• |a| > |a'|
• |m| > |m'|
Archive-It has removed a collection, and no longer reports those mementos. No other archives have new mementos of those resources.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 1; |m| = 1
14
Gained Archives, Lost Mementos
• |a| < |a'|
• |m| > |m'|
A new archive (WebCite) has just indexed and reported 1 memento for the first time.
A server error at the Internet Archive caused an omission of 2 mementos.
TimeMap T: |a| = 2; |m| = 4
TimeMap T': |a| = 3; |m| = 3
15
Agenda
16
Experiment Design
• Eliminate caching from local Memento proxies
• Daily observations of 4,000 TimeMaps for 92 days in 2013
• TimeMaps analyzed for changes & cardinality
• Investigated caching policies
• Outages observed from Memento/archives/department
17
Observations
Occurrence | Description | Action
77.4% | Unchanged TimeMap | Do not update cache
19.7% | Lost archives, lost mementos | Do not update cache
2.4% | Gained archives, gained mementos | Update cache
0.4% | Same archives, gained mementos | Update cache
0.1% | Gained archives, lost mementos | Do not update cache
0.01% | Lost archives, same mementos | Update cache
0.01% | Lost archives, gained mementos | Update cache
18
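The Observations table maps each kind of change in |a| and |m| to a cache action, which can be written as a lookup. This is a sketch that transcribes the table; it assumes "Unchanged TimeMap" means both counts are the same, and the names `compare`, `UPDATE_CACHE`, and `should_update` are hypothetical.

```python
def compare(old, new):
    """Label a change in a count as 'lost', 'same', or 'gained'."""
    if new > old:
        return "gained"
    if new < old:
        return "lost"
    return "same"

# (archives, mementos) -> update the cache? Transcribed from the Observations table.
UPDATE_CACHE = {
    ("same", "same"): False,      # unchanged TimeMap (assumed reading)
    ("lost", "lost"): False,
    ("gained", "gained"): True,
    ("same", "gained"): True,
    ("gained", "lost"): False,
    ("lost", "same"): True,
    ("lost", "gained"): True,
}

def should_update(a, m, a_new, m_new):
    """Return the table's action for cached T (a, m) versus observed T' (a_new, m_new)."""
    key = (compare(a, a_new), compare(m, m_new))
    return UPDATE_CACHE.get(key)  # None when the table does not list the combination
```

For example, the "Gained Archives, Lost Mementos" case (T with |a| = 2, |m| = 4; T' with |a| = 3, |m| = 3) looks up `("gained", "lost")` and leaves the cache alone.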
Impact of Change in TimeMaps
• Caching transient errors
– Not returned or not archived?
19
Cardinality of TimeMaps
<http://mementoproxy.lanl.gov/aggr/timegate/http://www.nasa.gov/>;rel="timegate", <http://www.nasa.gov/>;rel="original",
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
<http://api.wayback.archive.org/memento/19970605230559/http://www.nasa.gov/>;rel="memento";datetime="Thu, 05 Jun 1997 23:05:59 GMT",
<http://api.wayback.archive.org/memento/19970711094601/http://www.nasa.gov/>;rel="memento";datetime="Fri, 11 Jul 1997 09:46:01 GMT",
<http://api.wayback.archive.org/memento/19981202170636/http://www.nasa.gov/>;rel="memento";datetime="Wed, 02 Dec 1998 17:06:36 GMT",
<http://api.wayback.archive.org/memento/19981212031235/http://www.nasa.gov/>;rel="memento";datetime="Sat, 12 Dec 1998 03:12:35 GMT",
<http://api.wayback.archive.org/memento/19990116233500/http://nasa.gov/>;rel="memento";datetime="Sat, 16 Jan 1999 23:35:00 GMT",
<http://api.wayback.archive.org/memento/19990117063022/http://nasa.gov/>;rel="memento";datetime="Sun, 17 Jan 1999 06:30:22 GMT",
<http://api.wayback.archive.org/memento/19990125091025/http://nasa.gov/>;rel="memento";datetime="Mon, 25 Jan 1999 09:10:25 GMT",
<http://api.wayback.archive.org/memento/19990203005545/http://nasa.gov/>;rel="memento";datetime="Wed, 03 Feb 1999 00:55:45 GMT",
…
|TM| ?
20
Strict vs. Loose Matching
• Different archive, URI-M, datetime: Strict: 2, Loose: 2
<http://api.wayback.archive.org/memento/20080509125659/http://flare.prefuse.org/>;rel="memento";datetime="Fri, 09 May 2008 12:56:59 GMT",
<http://webarchive.nationalarchives.gov.uk/20080908074106/http://flare.prefuse.org/>;rel="memento";datetime="Mon, 08 Sep 2008 00:00:00 GMT",
• Same archive, datetime, different URI-M: Strict: 3, Loose: 1
<http://web.archive.org/web/20101101060204/http://aarp.org:80/Health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
<http://web.archive.org/web/20101101060204/http://www.aarp.org:80/Health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
<http://web.archive.org/web/20101101060204/http://www.aarp.org:80/health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
• Same archive, different URI-M, bad datetime: Strict: 2, Loose: 2
<http://wayback.archive-it.org/2342/20110321192906/http://www.apple.com/iphone/find-my-iphone-setup/>...datetime="Mon, 21 Mar 2011 00:00:00 GMT"
<http://wayback.archive-it.org/2354/20110321035356/http://www.apple.com/iphone/find-my-iphone-setup/>...datetime="Mon, 21 Mar 2011 00:00:00 GMT"
21
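The slide does not define the two matching rules, so the sketch below is only one reading that happens to reproduce the three pairs of counts shown. It assumes strict matching counts distinct URI-Ms, and loose matching counts distinct pairs of archive host and 14-digit capture timestamp taken from the URI-M. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse

def strict_count(uri_ms):
    """Strict (assumed): every distinct URI-M counts as its own memento."""
    return len(set(uri_ms))

def loose_count(uri_ms):
    """Loose (assumed): one memento per (archive host, 14-digit capture timestamp)."""
    keys = set()
    for uri in uri_ms:
        stamp = re.search(r"/(\d{14})/", uri)
        # Fall back to the whole URI-M when no timestamp segment is present.
        keys.add((urlparse(uri).netloc, stamp.group(1) if stamp else uri))
    return len(keys)

# The aarp.org example from the slide: same archive and capture time, three URI-Ms.
aarp = [
    "http://web.archive.org/web/20101101060204/http://aarp.org:80/Health/",
    "http://web.archive.org/web/20101101060204/http://www.aarp.org:80/Health/",
    "http://web.archive.org/web/20101101060204/http://www.aarp.org:80/health/",
]
counts = (strict_count(aarp), loose_count(aarp))
```

Under this reading the two Archive-It captures still count as two loose mementos, because their URI-M timestamps differ even though the reported datetime is identical.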
Strict vs. Loose: translate.google.com
22
Agenda
23
Testing
• TTLs [0, 92]
– 0: Thrashed cache, best freshness
– 92: First TimeMap cached, no replacement
• Policies
– Unconditional
• Cardinality ignored
– Conditional
• Replacements occur when cardinality is better
24
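The two policies can be compared by replaying a series of daily observations against a cache with a given TTL. The sketch below is an assumption-laden illustration, not the experiment's code: it takes cardinality to be the daily |m|, treats "better" as strictly larger, and uses the hypothetical name `simulate_cache`.

```python
def simulate_cache(daily_cardinality, ttl, conditional):
    """Replay daily |m| observations; return (cached |m| per day, queries Q)."""
    cached, age, queries, history = None, 0, 0, []
    for observed in daily_cardinality:
        if cached is None or age >= ttl:
            queries += 1  # empty or expired cache: ask the archives
            age = 0
            # Unconditional: always replace. Conditional: replace only if larger.
            if cached is None or not conditional or observed > cached:
                cached = observed
        history.append(cached)
        age += 1
    return history, queries

# Days 2 and 3 simulate an archive that temporarily stops reporting mementos.
daily = [10, 10, 4, 4, 12, 12]
unconditional = simulate_cache(daily, ttl=2, conditional=False)
conditional = simulate_cache(daily, ttl=2, conditional=True)
```

With these inputs both policies issue the same number of queries, but only the unconditional policy replaces the cached TimeMap with the smaller one seen on day 2.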
Evaluation
• Minimize cost values:
– Q – Queries to the archives
– MemDays – number of missed mementos/day
• Calculated MemDays: mementos missed/day
[Figure: MemDays and Q plotted between TTL: 0 and TTL: ∞]
25
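MemDays can be illustrated with a short calculation. The slide gives only "mementos missed/day", so this sketch assumes that a day's misses are the mementos reported by the archives minus the mementos served from the cache (never below zero), summed over all days. The name `memdays` and the sample series are hypothetical.

```python
def memdays(observed, served):
    """Sum, over days, of mementos the archives reported but the cache did not serve."""
    return sum(max(0, o - s) for o, s in zip(observed, served))

observed = [10, 10, 12, 12, 12]  # |m| reported by the archives each day
served = [10, 10, 10, 10, 12]    # |m| served by a cache refreshed on days 0 and 4
cost = memdays(observed, served)  # two mementos missed on each of days 2 and 3
```

In this example the cache was refreshed twice, so Q = 2. A longer TTL lowers Q but lets misses accumulate for more days, which is the trade-off the two cost values capture.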
MemDays
[Figure: MemDays example; labels: 6, |TM| = 10, MemDays = 8]
26
Optimal TTL
[Figure: Unconditional and Conditional policies; labels: Optimal TTL = 9, Optimal TTL = 15]
27
Agenda
28
Conclusion & Future Work
• 3-month observation of 4,000 TimeMaps
• Change patterns studied
– 80.2% of TimeMaps monotonically increase (matches 77.4% unchanged + 2.4% + 0.4% gaining mementos in Observations)
– Others decrease
• Optimal TTL = 15 days
• Cache Improvements:
– Saves requests to the archives
• Worth reinvestigating
– Changed Memento landscape
29
Backups
30
www.nasa.gov 1996 - 2012
31
Memento
Integrates the past and present web
[Figure: timeline; labels: Now, Always, Current, 2008, 2006, 2001, 2008, 2010]
32
33
Cardinality
• Size of a TimeMap
– # Archives?
– # Date times?
• TimeMaps:
• Cardinality:
• Monotonic Increase:
34

Editor's notes

  1. Mention here that TimeMaps are cached to avoid overburdening the archives and to improve response time for users.
  2. CNN.com: 15878 Google.com: 27540
  3. For the two extremes, make note of the expense. Never caching means we have to request TimeMaps from the archives and build the aggregated TimeMap on every HTTP GET to the aggregator TimeGate, so it will only operate as fast as the slowest-responding archive. However, this gives us the freshest results. Never replacing in the cache means we might have the stalest results but save load on the archives. This method also has the highest potential to cache transient errors forever.
  4. |a| is the number of contributing archives; |m| is the number of unique mementos listed in the TimeMap. For simplicity, and for the time being, let’s refer to a unique memento as a single observation of a URI-R at a point in time.
  5. The vast majority of TimeMaps don’t change. When they do, it is often because an archive isn’t reporting its mementos. Rarely does something “strange” happen.
  6. Explain transient error vs. not archived.
  7. MemDays is needed because a TimeMap that misses 2 mementos per day for 10 days is not the same as one that misses 100 per day, and is better than one that misses 1,000 once.
  8. The optimal TTL is at the intersection of MemDays and misses (requests to the archives).
  9. 3 months probably isn’t enough data. The Memento landscape has changed: there are new archives, the IA publishes mementos with increased frequency, and more archives are Memento compliant. This makes the study worth investigating again.
  10. Pages change over time