An Evaluation of Caching Policies for Memento TimeMaps
1. An Evaluation of Caching Policies for Memento TimeMaps
Justin F. Brunelle and Michael L. Nelson
Old Dominion University
{jbrunelle, mln}@cs.odu.edu
JCDL 2013
Indianapolis, Indiana
07/2013
5. Aggregating TimeMaps
• Multiple archives
• Expensive
• Caching reduces load on archives
• Write-through cache
[Diagram labels: Aggregator, Sort, IA TM, AIT TM, HTTP, Cache, …]
6. Aggregator Cache
• TimeMaps change
• Only want to cache better TimeMaps
– Bigger is better
• Ideally monotonically increasing
• Two extremes:
– Never cache (TTL=0)
– Never update in cache (TTL=92 days, the full length of the study)
8. Cache content measures
• |a| => # of archives
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
• |m| => # of mementos
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
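The two measures can be computed directly from a link-format TimeMap. The following is a minimal sketch, not the authors' implementation: the function name `timemap_measures` is hypothetical, and it assumes that each archive is identified by the host name of its URI-Ms and that each distinct URI-M counts as one memento.

```python
import re
from urllib.parse import urlparse

# Matches one link-format entry: <URI>;rel="...";datetime="..." (datetime optional).
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"(?:\s*;\s*datetime="([^"]+)")?')


def timemap_measures(timemap_text):
    """Return (|a|, |m|) for a link-format TimeMap string."""
    archives = set()
    mementos = set()
    for uri, rel, _datetime in LINK_RE.findall(timemap_text):
        # rel may hold several space-separated values, e.g. "first memento".
        if "memento" in rel.split():
            mementos.add(uri)                     # assumption: one URI-M = one memento
            archives.add(urlparse(uri).netloc)    # assumption: one host = one archive
    return len(archives), len(mementos)


sample = (
    '<http://www.nasa.gov/>;rel="original", '
    '<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>'
    ';rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT", '
    '<http://api.wayback.archive.org/memento/19970605230559/http://www.nasa.gov/>'
    ';rel="memento";datetime="Thu, 05 Jun 1997 23:05:59 GMT"'
)
print(timemap_measures(sample))  # (1, 2): one archive, two mementos
```

Entries with rel="original" or rel="timegate" are skipped, so only mementos contribute to either count.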
9. Same TimeMap
• |a| == |a'|
• |m| == |m'|
All archives have reported the same mementos.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 2; |m| = 3
10. Gained Archives, Gained Mementos
• |a| < |a'|
• |m| < |m'|
A new archive (WebCite) has just indexed and reported a memento for the first time.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 3; |m| = 4
11. Same Archives, Gained Mementos
• |a| == |a'|
• |m| < |m'|
The Internet Archive has released a set of new mementos.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 2; |m| = 4
12. Lost Archives, Same Mementos
• |a| > |a'|
• |m| == |m'|
A redaction of 1 memento took place in the Internet Archive, which now does not report mementos for this resource.
The UK Web Archive has released 1 new memento for this resource.
TimeMap T: |a| = 3; |m| = 3
TimeMap T': |a| = 2; |m| = 3
13. Lost Archives, Gained Mementos
• |a| > |a'|
• |m| < |m'|
A redaction of 2 mementos took place in the Internet Archive, which now does not report mementos for this resource.
The UK Government Web Archive has released 3 new mementos for this resource.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 1; |m| = 4
14. Lost Archives, Lost Mementos
• |a| > |a'|
• |m| > |m'|
Archive-It has removed a collection, and no longer reports those mementos. No other archives have new mementos of those resources.
TimeMap T: |a| = 2; |m| = 3
TimeMap T': |a| = 1; |m| = 1
15. Gained Archives, Lost Mementos
• |a| < |a'|
• |m| > |m'|
A new archive (WebCite) has just indexed and reported 1 memento for the first time.
A server error at the Internet Archive caused an omission of 2 mementos.
TimeMap T: |a| = 2; |m| = 4
TimeMap T': |a| = 3; |m| = 3
17. Experiment Design
• Eliminate caching from local Memento proxies
• Daily observations of 4,000 TimeMaps for 92 days in 2013
• TimeMaps analyzed for changes & cardinality
• Investigated caching policies
• Outages observed from Memento/archives/department
18. Observations
Occurrence | Description                       | Action
77.4%      | Unchanged TimeMap                 | Do not update cache
19.7%      | Lost archives, lost mementos      | Do not update cache
2.4%       | Gained archives, gained mementos  | Update cache
0.4%       | Same archives, gained mementos    | Update cache
0.1%       | Gained archives, lost mementos    | Do not update cache
0.01%      | Lost archives, same mementos      | Update cache
0.01%      | Lost archives, gained mementos    | Update cache
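The observed cases and their actions can be expressed as a lookup keyed on how |a| and |m| changed between the cached TimeMap T and the fresh TimeMap T'. This is a sketch under assumptions: the names `sign`, `OBSERVED_ACTIONS`, and `cache_action` are hypothetical, "Unchanged TimeMap" is approximated as both counts staying the same, and combinations that do not appear among the observations return None rather than a guessed action.

```python
def sign(new, old):
    """Return 'gained', 'lost', or 'same' for a count that went old -> new."""
    if new > old:
        return "gained"
    if new < old:
        return "lost"
    return "same"


# (archive change, memento change) -> action, copied from the observed cases.
OBSERVED_ACTIONS = {
    ("same", "same"): "do not update",
    ("lost", "lost"): "do not update",
    ("gained", "gained"): "update",
    ("same", "gained"): "update",
    ("gained", "lost"): "do not update",
    ("lost", "same"): "update",
    ("lost", "gained"): "update",
}


def cache_action(cached, fresh):
    """cached and fresh are (|a|, |m|) pairs.

    Returns the action for an observed case, or None when the combination
    was not among the observed cases.
    """
    key = (sign(fresh[0], cached[0]), sign(fresh[1], cached[1]))
    return OBSERVED_ACTIONS.get(key)


print(cache_action((2, 3), (3, 4)))  # gained archives, gained mementos
print(cache_action((2, 3), (1, 1)))  # lost archives, lost mementos
```

Keeping the table as data rather than as nested conditionals makes it easy to compare against the observations and to see which combinations are deliberately left undecided.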
19. Impact of Change in TimeMaps
• Caching transient errors
– Not returned or not archived?
20. Cardinality of TimeMaps
<http://mementoproxy.lanl.gov/aggr/timegate/http://www.nasa.gov/>;rel="timegate", <http://www.nasa.gov/>;rel="original",
<http://api.wayback.archive.org/memento/19961231235847/http://www.nasa.gov/>;rel="first memento";datetime="Tue, 31 Dec 1996 23:58:47 GMT",
<http://api.wayback.archive.org/memento/19970605230559/http://www.nasa.gov/>;rel="memento";datetime="Thu, 05 Jun 1997 23:05:59 GMT",
<http://api.wayback.archive.org/memento/19970711094601/http://www.nasa.gov/>;rel="memento";datetime="Fri, 11 Jul 1997 09:46:01 GMT",
<http://api.wayback.archive.org/memento/19981202170636/http://www.nasa.gov/>;rel="memento";datetime="Wed, 02 Dec 1998 17:06:36 GMT",
<http://api.wayback.archive.org/memento/19981212031235/http://www.nasa.gov/>;rel="memento";datetime="Sat, 12 Dec 1998 03:12:35 GMT",
<http://api.wayback.archive.org/memento/19990116233500/http://nasa.gov/>;rel="memento";datetime="Sat, 16 Jan 1999 23:35:00 GMT",
<http://api.wayback.archive.org/memento/19990117063022/http://nasa.gov/>;rel="memento";datetime="Sun, 17 Jan 1999 06:30:22 GMT",
<http://api.wayback.archive.org/memento/19990125091025/http://nasa.gov/>;rel="memento";datetime="Mon, 25 Jan 1999 09:10:25 GMT",
<http://api.wayback.archive.org/memento/19990203005545/http://nasa.gov/>;rel="memento";datetime="Wed, 03 Feb 1999 00:55:45 GMT",
…
|TM| ?
21. Strict vs. Loose Matching
• Different archive, URI-M, and datetime (Strict: 2, Loose: 2)
<http://api.wayback.archive.org/memento/20080509125659/http://flare.prefuse.org/>;rel="memento";datetime="Fri, 09 May 2008 12:56:59 GMT",
<http://webarchive.nationalarchives.gov.uk/20080908074106/http://flare.prefuse.org/>;rel="memento";datetime="Mon, 08 Sep 2008 00:00:00 GMT",
• Same archive and datetime, different URI-M (Strict: 3, Loose: 1)
<http://web.archive.org/web/20101101060204/http://aarp.org:80/Health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
<http://web.archive.org/web/20101101060204/http://www.aarp.org:80/Health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
<http://web.archive.org/web/20101101060204/http://www.aarp.org:80/health/>;rel="memento";datetime="Mon, 01 Nov 2010 06:02:04 GMT",
• Same archive, different URI-M, bad datetime (Strict: 2, Loose: 2)
<http://wayback.archive-it.org/2342/20110321192906/http://www.apple.com/iphone/find-my-iphone-setup/>...datetime="Mon, 21 Mar 2011 00:00:00 GMT"
<http://wayback.archive-it.org/2354/20110321035356/http://www.apple.com/iphone/find-my-iphone-setup/>...datetime="Mon, 21 Mar 2011 00:00:00 GMT"
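The slide does not spell out the matching rules, so the sketch below is one possible reading that happens to reproduce the three pairs of counts above. It assumes strict matching counts distinct URI-Ms, and loose matching counts distinct (archive host, 14-digit timestamp embedded in the URI-M) pairs. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: the capture time appears in the URI-M as a 14-digit path segment.
TIMESTAMP_RE = re.compile(r"/(\d{14})/")


def strict_count(uri_ms):
    """Strict matching (assumed): every distinct URI-M is its own memento."""
    return len(set(uri_ms))


def loose_count(uri_ms):
    """Loose matching (assumed): same archive host + same timestamp = same memento."""
    keys = set()
    for uri in uri_ms:
        match = TIMESTAMP_RE.search(uri)
        stamp = match.group(1) if match else uri  # fall back to the whole URI-M
        keys.add((urlparse(uri).netloc, stamp))
    return len(keys)


aarp = [
    "http://web.archive.org/web/20101101060204/http://aarp.org:80/Health/",
    "http://web.archive.org/web/20101101060204/http://www.aarp.org:80/Health/",
    "http://web.archive.org/web/20101101060204/http://www.aarp.org:80/health/",
]
apple = [
    "http://wayback.archive-it.org/2342/20110321192906/http://www.apple.com/iphone/find-my-iphone-setup/",
    "http://wayback.archive-it.org/2354/20110321035356/http://www.apple.com/iphone/find-my-iphone-setup/",
]
print(strict_count(aarp), loose_count(aarp))    # 3 1
print(strict_count(apple), loose_count(apple))  # 2 2
```

Using the timestamp in the URI-M rather than the reported datetime keeps the two Archive-It captures distinct even though both report a truncated datetime of 00:00:00 GMT.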
34. Cardinality
• Size of a TimeMap
– # Archives?
– # Date times?
• TimeMaps:
• Cardinality:
• Monotonic Increase:
Editor's notes
Mention here that TimeMaps are cached to avoid overburdening the archives and to improve response time for users.
CNN.com: 15878 Google.com: 27540
For the two extremes, make note of the expense. Never caching means we have to request TimeMaps from the archives and rebuild the aggregate on each HTTP GET to the aggregator TimeGate, so it will only operate as fast as the slowest responding archive. However, this gives us the freshest results. Never replacing in the cache means we might have the stalest results but save load on the archives. This method also has the highest potential to cache transient errors forever.
|a| is the number of contributing archives. |m| is the number of unique mementos listed in the TimeMap. For simplicity, and for the time being, let's refer to a unique memento as a single observation of a URI-R at a point in time.
Vast majority of timemaps don’t change. When they do, they often change because an archive isn’t reporting its mementos. Rarely does something “strange” happen.
Explain transient error vs. not archived
MemDays is needed because a TimeMap that misses 2 mementos per day for 10 days is not the same as one that misses 100 per day, and is better than one that misses 1,000 once.
The optimal TTL is at the intersection of MemDays and misses (requests to archives).
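The trade-off between MemDays and misses can be illustrated with a small simulation. This is a sketch under assumptions, not the study's method: the notes do not give a formula, so MemDays is taken here to be the number of mementos missing from the cached copy summed over days, a miss is any day on which the cache has expired and the archives must be queried, and the function name `simulate_ttl` and the sample series are hypothetical. The sketch always replaces the cached copy on expiry and does not apply the "only cache better TimeMaps" rule.

```python
def simulate_ttl(daily_m, ttl_days):
    """Return (misses, memdays) for one TimeMap observed once per day.

    daily_m[d] is |m| as a fresh aggregation would report it on day d.
    """
    misses = 0
    memdays = 0
    cached_m = None
    cached_on = None
    for day, fresh_m in enumerate(daily_m):
        expired = cached_on is None or day - cached_on >= ttl_days
        if expired:
            # Cache miss: ask the archives again and store the fresh count.
            cached_m, cached_on = fresh_m, day
            misses += 1
        # Mementos that exist today but are absent from the cached copy.
        memdays += max(0, fresh_m - cached_m)
    return misses, memdays


daily_m = [3, 3, 4, 4, 5, 5]  # made-up series of |m| for illustration only
for ttl in (0, 3, 6):
    print(ttl, simulate_ttl(daily_m, ttl))
# 0 (6, 0)
# 3 (2, 3)
# 6 (1, 6)
```

As the TTL grows, misses fall and MemDays rise. Under this reading, missing 2 mementos per day for 10 days costs 20 MemDays, far less than missing 1,000 mementos once.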
3 months probably isn't enough data. The Memento landscape has changed: there are new archives, the IA publishes mementos with increased frequency, and more archives are Memento-compliant. This makes the study worth repeating.