Reference Rot and Link Decoration
Presentation given at OAI9 based on "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253
1. Reference Rot and Link Decoration
Martin Klein
UCLA
martinklein0815@gmail.com
@mart1nkle1n
2. Reference Rot and Link Decoration
@mart1nkle1n
OAI9, Geneva, June 17th 2015
Hiberlink Team
• Los Alamos National Laboratory
  • Research Library: (Martin Klein), (Robert Sanderson), Harihar Shankar, Herbert Van de Sompel
• University of Edinburgh
  • Edina: Peter Burnhill, Neil Mayo, Muriel Mewissen, Christine Rees, Tim Strickland, Richard Wincewicz
  • Language Technology Group: Beatrix Alex, Claire Grover, Colin Matheson, Richard Tobin, (Ke “Adam” Zhou)
• Funding: Andrew W. Mellon Foundation
3.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253
4. Reference Rot
5. Link Rot
6. “Entertaining” Link Rot
7. Ubiquitous Link Rot
8. Content Drift
http://dl00.org
2000
9. Content Drift
http://dl00.org
2004
10. Content Drift
http://dl00.org
2005
11. Content Drift
http://dl00.org
2008
12. NYT Coverage
Links in Supreme Court decisions:
• Link rot: 29%
• Reference rot: 49%
13. Scholarly Communication
14.
[Figure: five example references, each labeled by live-web status (Exist or !Exist) and archival status (Archived or !Archived); three are labeled !Exist, two Exist, four Archived, one !Archived]
15. Entrance Hiberlink
• These resources:
  • Are not necessarily under the custodianship of parties that care about long-term integrity and access
  • Do not necessarily have the same sense of fixity as, e.g., journal articles
• Links to these resources are subject to Reference Rot:
  • Link Rot: the link stops working, e.g., HTTP 404
  • Content Drift: the linked content changes over time
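The two failure modes can be told apart with a simple check. The sketch below is only an illustration, not the method used in the study: the function names are hypothetical, any non-200 response is assumed to be link rot, and any byte-level difference is assumed to be content drift.

```python
import hashlib
from typing import Optional


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest used to compare two representations."""
    return hashlib.sha256(body).hexdigest()


def classify_reference(status_code: int,
                       body_at_citation: bytes,
                       body_now: Optional[bytes]) -> str:
    """Label a reference as 'link rot', 'content drift', or 'intact'.

    Assumptions (not from the slides): every response other than 200 OK
    counts as link rot, and any byte-level change counts as content drift.
    """
    if status_code != 200 or body_now is None:
        return "link rot"
    if fingerprint(body_at_citation) != fingerprint(body_now):
        return "content drift"
    return "intact"


if __name__ == "__main__":
    print(classify_reference(404, b"conference home page", None))
    print(classify_reference(200, b"conference home page", b"parked domain"))
    print(classify_reference(200, b"conference home page", b"conference home page"))
```

Exact hashing is deliberately strict. Pages often change in trivial ways (timestamps, ads), so a practical check would likely need a more tolerant similarity measure.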
16. Quantifying Reference Rot
17. Our Study
• Time frame of publications: Jan 1997 - Dec 2012
• Articles from arXiv, Elsevier, and PMC in XML and PDF format
• Convert PDF to XML
• Extract URIs to web-at-large resources
• Store article’s publication date
• URI live web test (a 200 OK response is trusted as live)
• URI archive lookup via Memento infrastructure
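The pipeline steps above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the study's code: the regular expression, the `SCHOLARLY_HOSTS` exclusion list, and all function names are hypothetical, and the TimeGate base URL is assumed to be the Memento Time Travel aggregator. The lookup function only builds the request (an `Accept-Datetime` header carrying the publication date) and does not send it.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlparse

# Hypothetical exclusion list: hosts treated as scholarly, not web at large.
SCHOLARLY_HOSTS = {"doi.org", "dx.doi.org"}

# Assumed TimeGate endpoint of a Memento aggregator.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"

URI_PATTERN = re.compile(r"""https?://[^\s<>"')\]]+""")


def extract_web_at_large_uris(text):
    """Return distinct HTTP(S) URIs in text, skipping scholarly hosts."""
    uris = []
    for match in URI_PATTERN.findall(text):
        uri = match.rstrip(".,;")  # drop trailing sentence punctuation
        host = (urlparse(uri).hostname or "").lower()
        if host not in SCHOLARLY_HOSTS and uri not in uris:
            uris.append(uri)
    return uris


def is_live(status_code):
    """Live web test: only a 200 OK response is trusted."""
    return status_code == 200


def timegate_request(uri, published):
    """Build (url, headers) for a datetime-negotiated archive lookup.

    published must be a timezone-aware datetime.
    """
    stamp = format_datetime(published.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE + uri, {"Accept-Datetime": stamp}


if __name__ == "__main__":
    sample = "Code at http://example.org/tool. Paper: http://dx.doi.org/10.1000/xyz"
    for uri in extract_web_at_large_uris(sample):
        print(uri, timegate_request(uri, datetime(2010, 5, 3, tzinfo=timezone.utc)))
```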
                               arXiv     Elsevier    PMC
total articles                 707,667   2,285,000   595,889
articles with HTTP references  142,134   94,645      156,160
number of HTTP references      346,177   232,712     480,853
18. Our Corpora
[Figure: number of articles and URI references by publication year, one panel each for PMC, Elsevier, and arXiv]
19. Link Rot in arXiv
[Figure: number of HTTP references and link rot percentage by publication year]
21. Content Drift / Archival Status
All Links: Active 74.0%, Rotten 26.0%; Archived 24.7%, Not Archived 75.3%
• Archival status used as proxy
• Availability of archived copy created within N days of article’s publication
• N = 14 arXiv
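The archival-status proxy reduces to a date comparison. A minimal sketch, assuming the window is symmetric around the publication date (the slide does not say whether snapshots before publication count); the function name is hypothetical.

```python
from datetime import date, timedelta


def has_representative_snapshot(published, snapshot_dates, n_days=14):
    """Return True if any archived copy was created within n_days of publication.

    Assumption: the window covers n_days before and after the publication date.
    """
    window = timedelta(days=n_days)
    return any(abs(snapshot - published) <= window for snapshot in snapshot_dates)


if __name__ == "__main__":
    published = date(2010, 5, 3)
    print(has_representative_snapshot(published, [date(2010, 5, 10)]))
    print(has_representative_snapshot(published, [date(2011, 1, 1)]))
```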
22.
Panels: PMC, Elsevier, arXiv (three "All Links" charts; values listed in extraction order, panel assignment not recoverable)
Active   Rotten   Archived   Not Archived
74.0%    26.0%    24.7%      75.3%
67.3%    32.7%    24.8%      75.2%
80.0%    20.0%    25.5%      74.5%
23. Loss of Context
24. Loss of Context
[Figure: all links, active links, links archived (14 days)]
25. STM Article Extrapolation
26. STM Article Extrapolation
• Immune: article contains no URIs to web at large resources
• Healthy: none of the URIs to web at large resources suffer from link rot or content drift
• Infected: at least one URI to web at large resources suffers from link rot or content drift
27. Immune vs not Immune STM Articles
[Figure: Immune and not Immune articles by publication year, 1997 to 2012; y-axis 0 to 100]
28. STM Article Extrapolation
• Immune: article contains no URIs to web at large resources
• Healthy: none of the URIs to web at large resources suffer from reference rot
• Infected: at least one URI to web at large resources suffers from reference rot
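The three categories translate directly into a small classifier. A minimal sketch, assuming each article is represented by one boolean per web-at-large URI (True when that URI suffers from reference rot); the function names and input shape are hypothetical.

```python
from collections import Counter


def classify_article(rot_flags):
    """Classify one article as 'immune', 'healthy', or 'infected'.

    rot_flags holds one boolean per web-at-large URI in the article,
    True when that URI suffers from reference rot.
    """
    if not rot_flags:
        return "immune"      # no web-at-large URIs at all
    if any(rot_flags):
        return "infected"    # at least one rotten reference
    return "healthy"         # URIs present, none rotten


def summarize(articles):
    """Count how many articles fall into each category."""
    return Counter(classify_article(flags) for flags in articles)


if __name__ == "__main__":
    corpus = [[], [False, False], [False, True], [True]]
    print(summarize(corpus))
```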
30. An approach to solve Reference Rot
31. Robust Links
1. Create snapshot of linked resources in a web archive when:
• drafting work
• submitting article
• publishing article
• aggregating article
32. Robust Links
1. Create snapshot of linked resources in a web archive
2. Convey creation date of your web page in a machine-actionable manner
33. Page Creation Date
<!DOCTYPE html>
<html>
<head>
<title> … </title>
<meta itemprop="datePublished" content="2015-02-18" />
…
</head>
…
</html>
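A client can read that date back with a standard HTML parser. The sketch below uses Python's `html.parser`; the class and function names are hypothetical, and it assumes the date is carried in a `meta` element with `itemprop="datePublished"` as on the slide.

```python
from datetime import date
from html.parser import HTMLParser


class DatePublishedParser(HTMLParser):
    """Collect the content of the first <meta itemprop="datePublished">."""

    def __init__(self):
        super().__init__()
        self.date_published = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by default.
        attributes = dict(attrs)
        if (tag == "meta"
                and attributes.get("itemprop") == "datePublished"
                and self.date_published is None):
            self.date_published = attributes.get("content")


def page_creation_date(html_text):
    """Return the page's datePublished as a date, or None if absent."""
    parser = DatePublishedParser()
    parser.feed(html_text)
    if parser.date_published is None:
        return None
    return date.fromisoformat(parser.date_published)


if __name__ == "__main__":
    page = ('<!DOCTYPE html><html><head><title>Example</title>'
            '<meta itemprop="datePublished" content="2015-02-18" />'
            '</head><body></body></html>')
    print(page_creation_date(page))
```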
35. Robust Links
1. Create snapshot of linked resources in a web archive
2. Convey creation date of your web page in a machine-actionable manner
3. Decorate links with datetime of linking and URI of archived snapshot, in addition to resource’s original URI
http://robustlinks.mementoweb.org/spec/
36. Link Decoration
<a href="http://hiberlink.org/">http://hiberlink.org/</a>
37. Link Decoration
<a href="http://hiberlink.org/"
   data-versionurl="http://archive.is/Bvq2v"
   data-versiondate="2014-11-01">
http://hiberlink.org/</a>
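Decorated links like this one can be generated programmatically. A minimal sketch: the helper name `robust_link` is hypothetical, the attribute names `data-versionurl` and `data-versiondate` are taken from the slide, and the date is assumed to be in ISO `YYYY-MM-DD` form.

```python
from datetime import date
from html import escape


def robust_link(original_uri, snapshot_uri, link_date, text=None):
    """Return an <a> element decorated with snapshot URI and linking date.

    link_date must be an ISO date string (YYYY-MM-DD); a malformed value
    raises ValueError instead of producing a broken attribute.
    """
    date.fromisoformat(link_date)  # validation only
    label = original_uri if text is None else text
    return '<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'.format(
        escape(original_uri, quote=True),
        escape(snapshot_uri, quote=True),
        escape(link_date, quote=True),
        escape(label),
    )


if __name__ == "__main__":
    print(robust_link("http://hiberlink.org/",
                      "http://archive.is/Bvq2v",
                      "2014-11-01"))
```

Keeping the original URI in `href` means the link still works in clients that ignore the extra attributes, while the snapshot URI and date remain available as a fallback.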
38.
http://robustlinks.mementoweb.org/demo/uri_references_js.html
39.
http://robustlinks.mementoweb.org/demo/uri_references_js.html
40. Reference Rot and Link Decoration
Martin Klein
UCLA
martinklein0815@gmail.com
@mart1nkle1n