1. Preserving The Integrity of The Scholarly Record
http://www.flickr.com/photos/shinez/5000985919/
Peter Burnhill, EDINA @ University of Edinburgh
National Library of Scotland, George IV Bridge, 5.30pm 16th February
2. Take Home Message:
1) Archive Streams of Issued Content
2) Avoid Reference Rot
3. The Scholarly Record & Serials … [a focus on the digital]
‘The Scholarly Record’ has a fuzzy edge:
‘e-journals’
Websites, Databases, Repositories
‘Book-length work’
6. Continuing Resources, inc. Serials:
Issued in Parts (Serials)
Content changes over time (Integrating)
Other ‘resources needed for scholarship’, e.g. ‘Gov Docs’
7. The following questions are implicit:
1. What exactly is the scholarly record?
• What of that now ‘issued on the Web’?
• And what if we limit focus to what could get an ISSN?
2. Whose responsibility is it to act as steward?
• Each research library; library consortia; national/state libraries/archives?
• And is this a national, or a trans-national, challenge?
8. An Article, once available in print on-shelf locally …
… is now online & accessed remotely, ‘anytime/anywhere’
=> Improved Ease of Access
But what of Continuity of Access?
Will it still be there tomorrow?
9. Libraries boast of ‘e-collections’, but maybe now they only have ‘e-connections’
=> real & present danger for the integrity of what is published as the scholarly record
Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
10. This is a global challenge: trans-national action
Share of the 132,806 ISSNs issued for e-serials (December 2013), by country:
US: 20%
UK: 8.6%
Rest of World: 71%
Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own
11. So, who is offering digital shelving?
Ingesting content with archival intent …
① Web-scale not-for-profit archiving agencies
② National libraries …
③ Research libraries: consortia & specialist centres …
National Science Library, Chinese Academy of Sciences
12. Many archiving organisations: a Good Thing
“Digital information is best preserved by replicating it at multiple archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)
Some bad stuff will happen!
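One way to see why replicating across autonomous archives helps is a deliberately simple toy model, which is an assumption and not from the slides: each archive loses a given title independently, with the same probability p, so the chance that all n copies are lost is p to the power n. Real failures (shared software, shared funding, shared legal risk) are often correlated, so this overstates the benefit; the function name is hypothetical.

```python
def prob_all_copies_lost(p_loss: float, n_archives: int) -> float:
    """Chance that all n archives lose the same title.

    Toy model (an assumption, not from the slides): each archive loses the
    title independently, with the same probability p_loss.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    if n_archives < 0:
        raise ValueError("n_archives must be non-negative")
    return p_loss ** n_archives


# Hypothetical 10% loss chance per archive, for one, two and three keepers
for n in (1, 2, 3):
    print(n, f"{prob_all_copies_lost(0.10, n):.4f}")
# 1 0.1000
# 2 0.0100
# 3 0.0010
```

Under these assumptions each extra independent keeper cuts the risk of total loss by a factor of ten, which is the intuition behind counting titles held by 3+ keepers later in the talk.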
13. A Project to Pilot an E-journal Preservation Registry Service
Need to know who is looking after what & how?
14. E-J Preservation Registry Service, built from E-Journal Preservation Registry user requirements, with ISSN-L as kernel field, linking:
(a) METADATA on extant e-serials, from the ISSN Register
(b) METADATA on preservation action, from Digital Preservation Agencies
Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance
15. The Keepers Registry
“Tales from the Keepers Registry”, Serials Review 39.1 (2013)
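The registry design rests on the ISSN-L acting as the kernel field that joins (a) metadata on extant e-serials with (b) reports of preservation action from each agency. A minimal in-memory sketch of that join follows; the record layouts, field names, status values and placeholder ISSN-Ls are all hypothetical, not the Keepers Registry's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Serial:
    issn_l: str   # linking ISSN: the kernel field
    title: str


@dataclass(frozen=True)
class PreservationReport:
    issn_l: str   # same kernel field, as reported by an agency
    agency: str   # e.g. "CLOCKSS", "Portico"
    status: str   # hypothetical values: "preserved", "in progress"


def keepers_by_title(serials, reports):
    """Join agency reports onto serial records via ISSN-L.

    Returns {issn_l: (title, sorted agencies reporting 'preserved')}.
    A title with no such report gets an empty list: no known keeper.
    """
    keepers = defaultdict(set)
    for r in reports:
        if r.status == "preserved":
            keepers[r.issn_l].add(r.agency)
    return {s.issn_l: (s.title, sorted(keepers.get(s.issn_l, ())))
            for s in serials}


# Placeholder ISSN-Ls and titles, for illustration only
serials = [Serial("0000-0001", "Journal A"), Serial("0000-0002", "Journal B")]
reports = [
    PreservationReport("0000-0001", "Portico", "preserved"),
    PreservationReport("0000-0001", "CLOCKSS", "preserved"),
    PreservationReport("0000-0002", "KB", "in progress"),
]
print(keepers_by_title(serials, reports))
```

Keying both sides on the ISSN-L rather than on print or online ISSNs means reports about either manifestation of a title land on the same record.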
16. … to discover who is looking after what
thekeepers.org as Global Monitor
*New in 2014*: Library of Congress and Scholars Portal now reporting in
17. e-journals should be easy – right?
Some signs of Progress: the Keepers Registry recorded titles ‘ingested & archived’ by at least 1 ‘keeper’:
in 2011, 16,558
in 2013, 21,557
in 2014, 26,195
now 26,712, with 9,731 ‘ingested & archived’ by 3+
… as there is more archiving & as more archives report into the Registry!
Written & produced by Julie Brown, 1989
18. “Are we there yet?” … “Don’t think so”
‘Ingest Ratio’ = titles being ingested by one or more Keepers / ‘online serials’ in ISSN Register
= 26,195 / 136,965 [in March 2014]
=> 19%
(i.e. we know nothing of the archiving of about 80% of all online resources having an ISSN)
‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register
= 9,656 / 136,965
=> 7%
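Both ratios are simple proportions over the same denominator. The sketch below recomputes them from the March 2014 figures quoted on the slide; the constant and function names are illustrative only.

```python
# Figures as quoted on the slide (ISSN Register / Keepers Registry, March 2014)
ONLINE_SERIALS = 136_965            # 'online serials' in the ISSN Register
INGESTED_BY_ONE_OR_MORE = 26_195    # titles ingested by 1+ Keepers
INGESTED_BY_THREE_OR_MORE = 9_656   # titles ingested by 3+ Keepers


def ratio(titles: int, online_serials: int) -> float:
    """Proportion of online serials covered by a given count of titles."""
    if online_serials <= 0:
        raise ValueError("online_serials must be positive")
    return titles / online_serials


ingest_ratio = ratio(INGESTED_BY_ONE_OR_MORE, ONLINE_SERIALS)
keepsafe_ratio = ratio(INGESTED_BY_THREE_OR_MORE, ONLINE_SERIALS)

print(f"Ingest Ratio:    {ingest_ratio:.0%}")       # 19%
print(f"KeepSafe Ratio:  {keepsafe_ratio:.0%}")     # 7%
print(f"No keeper known: {1 - ingest_ratio:.0%}")   # 81%
```

The complement of the Ingest Ratio comes out at about 81%, which the slide gives as roughly 80%.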
19. Evidence on what libraries care about
Using the Title List Comparison tool in the Members Area of the Keepers Registry:
In 2011/12, three major research libraries in the USA (Columbia, Cornell & Duke) checked the archival status of serial titles they regarded as important
‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter
=> fate of c.75% is unknown
As reported in: P. Burnhill (2013). Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39(1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682
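A comparison of this kind amounts to a set intersection between a library's list of important serials and the titles reported as archived. The sketch below is hypothetical (it is not the Keepers Registry's Title List Comparison tool): it assumes both lists are keyed by ISSN, normalises ISSN formatting first, and checks only the shape of each ISSN, not its check digit.

```python
import re

ISSN_SHAPE = re.compile(r"\d{7}[\dX]")


def normalise_issn(raw: str) -> str:
    """Normalise an ISSN to 'NNNN-NNNC' form (shape check only, no check digit)."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not ISSN_SHAPE.fullmatch(compact):
        raise ValueError(f"not an ISSN: {raw!r}")
    return f"{compact[:4]}-{compact[4:]}"


def compare_title_list(library_issns, archived_issns):
    """Split a library's title list into archived titles and titles of unknown fate.

    Returns (found, missing, ingest_ratio), where ingest_ratio is the share of
    the library's titles that appear in the archived list.
    """
    wanted = {normalise_issn(i) for i in library_issns}
    archived = {normalise_issn(i) for i in archived_issns}
    found = wanted & archived
    missing = wanted - archived
    ingest_ratio = len(found) / len(wanted) if wanted else 0.0
    return found, missing, ingest_ratio


# Hypothetical lists; the ISSNs are placeholders, not real titles
found, missing, r = compare_title_list(
    ["0000-0001", "0000 0002", "00000003", "0000-0004"],
    ["0000-0001", "0000-0009"],
)
print(sorted(found), sorted(missing), r)
```

On the libraries' real title lists, the third value is the per-library ‘Ingest Ratio’ that came out at 22% to 28%.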
20. BIG publishers act early but incompletely
Priority: find an economic way to archive content from … very many ‘at risk’ e-journals from many small publishers
21. Evidence based on what Researchers Use … logs for the UK OpenURL Router*
• 8.5m full text requests in UK during 2012 => 53,311 online titles requested
Analysis in 2013: ‘Ingest Ratio’ = 32% (16,985/53,311)
=> over two thirds, 68% (36,326 titles), held by none!
* As reported in Keepers Registry Blog. The OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
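The log analysis reduces millions of requests to a set of distinct titles, then measures how many of those titles any keeper holds. A sketch under stated assumptions: each log line is an OpenURL query string (or a full URL carrying one) with the title's ISSN under an `rft.issn` or `rft.eissn` key, ISSNs are logged in hyphenated form, and print and online ISSNs are not linked via ISSN-L. Real Router logs may be laid out differently; the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

ISSN_KEYS = ("rft.issn", "rft.eissn")  # assumed KEV keys carrying the title's ISSN


def _query_part(line: str) -> str:
    """Return the query string of a logged OpenURL (or the line itself if bare)."""
    line = line.strip()
    return urlsplit(line).query or line


def requested_issns(log_lines):
    """Collect the distinct ISSNs requested across OpenURL log lines."""
    issns = set()
    for line in log_lines:
        params = parse_qs(_query_part(line))
        for key in ISSN_KEYS:
            for value in params.get(key, []):
                issns.add(value.strip().upper())
    return issns


def coverage(log_lines, archived_issns):
    """Return (titles requested, titles held by 1+ keepers, Ingest Ratio)."""
    requested = requested_issns(log_lines)
    held = requested & {i.strip().upper() for i in archived_issns}
    ratio = len(held) / len(requested) if requested else 0.0
    return len(requested), len(held), ratio
```

Run over the 2012 Router logs, the three returned values would correspond to the slide's 53,311 titles requested, 16,985 held, and 32%.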
22. “I believe we've … a problem here.” [John Swigert, Jr.]
23. Another threat to the integrity of the record
Language Technology Group
Funded by the Andrew W. Mellon Foundation
‘Reference Rot’: when what was referenced & cited ceases to say the same thing, or ‘has ceased to be’
http://www.snorgtees.com/this-parrot-has-ceased-to-be
Reference Rot = Link Rot + Content Drift
“when links to web resources
no longer point to what they once did”
25. + Content Drift: What is at the end of the URI has changed, or gone!
Captures of http://dl00.org in 2000, 2004, 2005 and 2008
(a) Dynamic content: values on the webpage change over time
(b) Static content, but very different (often unrelated) web pages
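One way to operationalise “Reference Rot = Link Rot + Content Drift” is to compare what a cited URI returns now against what it said when it was cited. The sketch below is a simplification under assumed inputs (the final HTTP status after redirects, plus the page text then and now); text similarity via `difflib` and the 0.8 threshold are hypothetical choices, not the measures used in the Hiberlink studies.

```python
from difflib import SequenceMatcher
from typing import Optional

DRIFT_THRESHOLD = 0.8  # hypothetical cut-off, chosen for illustration only


def classify_reference(status: Optional[int], cited_text: str,
                       current_text: str, threshold: float = DRIFT_THRESHOLD) -> str:
    """Label a cited URI as 'link rot', 'content drift' or 'intact'.

    status: final HTTP status when the URI is fetched now, after redirects
            (None if the host could not be reached at all).
    cited_text / current_text: page text when cited vs. now.
    """
    if status is None or status >= 400:
        # Link rot: the URI no longer returns anything usable
        return "link rot"
    similarity = SequenceMatcher(None, cited_text, current_text).ratio()
    if similarity < threshold:
        # Content drift: the URI resolves, but no longer says the same thing
        return "content drift"
    return "intact"
```

Case (b) on the slide, a live URI serving an unrelated page, is exactly what a status check alone misses and a content comparison catches.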
26. Hiberlink: Time Travel for The Scholarly Web
1. Threat: Creating evidence on the extent of ‘Reference Rot’
– Main focus: references (& URIs) made in Journal Articles
• "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"
– PLOS One paper published on 26 December 2014.
• Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments
• http://www.newyorker.com/magazine/2015/01/26/cobweb
– Also looked at Reference Rot & the e-Thesis, ETD2014
2. Remedy: Opportunities for productive intervention
– Identify workflows: preparation, publication, ingest
– Prototype tools to avoid or limit reference rot
– Pro-active or ‘transactional’ archiving as remedy
• Embedding such ‘solutions’ in existing tools & infrastructure
• Propose/test new infrastructure for temporal referencing
– supporting & using the Memento protocol
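In the Memento protocol (RFC 7089), a client asks a TimeGate for the version of a URI nearest a given datetime via the `Accept-Datetime` request header, and a TimeMap lists the known mementos of a URI with their datetimes in link format. The sketch below mimics that nearest-datetime selection locally on a TimeMap string; it uses a naive regex rather than a full link-format parser, and the archive URLs in the sample are hypothetical.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timemap(link_format: str):
    """Return [(memento_uri, aware datetime)] from a link-format TimeMap."""
    mementos = []
    for uri, params in re.findall(r"<([^>]*)>([^<]*)", link_format):
        rel = re.search(r'rel="([^"]*)"', params)
        when = re.search(r'datetime="([^"]*)"', params)
        # rel may hold several tokens, e.g. "first memento"
        if rel and "memento" in rel.group(1).split() and when:
            mementos.append((uri, parsedate_to_datetime(when.group(1))))
    return mementos


def closest_memento(link_format: str, wanted: datetime):
    """Pick the memento nearest to `wanted` (a timezone-aware datetime)."""
    mementos = parse_timemap(link_format)
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[1] - wanted))


TIMEMAP = (
    '<http://www.bnf.fr/>; rel="original", '
    '<http://archive.example.org/20140515/http://www.bnf.fr/>; '
    'rel="first memento"; datetime="Thu, 15 May 2014 00:00:00 GMT", '
    '<http://archive.example.org/20140601/http://www.bnf.fr/>; '
    'rel="last memento"; datetime="Sun, 01 Jun 2014 00:00:00 GMT"'
)
print(closest_memento(TIMEMAP, datetime(2014, 5, 19, tzinfo=timezone.utc)))
```

Citing with a datetime as well as a URI is what makes this lookup possible later: the reader's tools can ask for the version nearest to when the reference was made.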
28. ‘Infrastructure’ to Enable Remedy
• Robust Link - re-factor the HTML link that is returned
a) Take simple URI - to French National Library (say)
<a href="http://www.bnf.fr">
Link to the BNF
</a>
b) Augment Link with a set of Datetime & location pairs
<a href="http://www.bnf.fr"
mset="2014-05-19,
http://archive.today/zdpAn 2014-05-15 memento">
Link to the BNF
</a>
http://robustlinks.mementoweb.org/
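The slide shows an early attribute syntax (`mset`). Later descriptions of Robust Links express the same idea with `data-versionurl` and `data-versiondate` attributes on the link; the sketch below generates that later form, assuming those attribute names, and should be checked against the current specification at robustlinks.mementoweb.org before use. The function name is hypothetical.

```python
from datetime import date
from html import escape


def robust_link(original_url: str, link_text: str,
                version_url: str, version_date: str) -> str:
    """Build an HTML anchor that carries an archived snapshot alongside the live URI.

    version_date: ISO 8601 date (YYYY-MM-DD) associated with the link/snapshot.
    Attribute names follow the later Robust Links form (an assumption here).
    """
    date.fromisoformat(version_date)  # raises ValueError if not YYYY-MM-DD
    return (
        f'<a href="{escape(original_url)}"'
        f' data-versionurl="{escape(version_url)}"'
        f' data-versiondate="{escape(version_date)}">'
        f"{escape(link_text)}</a>"
    )


print(robust_link("http://www.bnf.fr", "Link to the BNF",
                  "http://archive.today/zdpAn", "2014-05-19"))
```

Browsers ignore unknown `data-*` attributes, so the live link keeps working as before, while reader-side scripts can fall back to the snapshot if the original has rotted.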
29. Remedy for The Integrity of The Scholarly Record
Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.
3 basic workflows:
① Study: Preparation -> (Review) -> Submission
② Publication: Editorial -> (Revision) -> Acceptance -> Issue
③ Post-Publication: Deposit/Ingest -> Provide/Access -> Use
Identify the Actors involved in:
① Composition: author/creator
② Public Release: editor/referee/copy-editor
③ Curation: librarian / repository manager / archivist
30. Two-step Remedy To Avoid Reference Rot
Hiberlink Plug-in: help authors & middle-folk do the right thing:
① Triggers archiving of referenced web content when it is noted in:
– Zotero - used by authors to manage references https://www.zotero.org/
– Open Journal Systems (OJS) - used by OA publishers https://pkp.sfu.ca/ojs/
② Returns Datetime URI for archived content that can be used in the citation
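Step ② hands back a Datetime URI that can go straight into the citation. As an illustration only, the sketch below pulls candidate URIs out of a free-text reference with a deliberately simple regex and builds Datetime URIs in the Internet Archive Wayback Machine's `/web/<14-digit timestamp>/<URI>` pattern; that archive and pattern are assumptions for the example, and the Hiberlink plug-in may target other archives and formats. Function names are hypothetical.

```python
import re
from datetime import datetime, timezone

URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_uris(reference: str):
    """Find http(s) URIs in a free-text reference, trimming trailing punctuation."""
    return [u.rstrip(".,;)") for u in URL_PATTERN.findall(reference)]


def datetime_uri(uri: str, archived_at: datetime) -> str:
    """Wayback-style Datetime URI: /web/YYYYMMDDhhmmss/<uri>, timestamp in UTC."""
    if archived_at.tzinfo is None:
        raise ValueError("archived_at must be timezone-aware")
    stamp = archived_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{uri}"


ref = "Example reference. See http://www.bnf.fr/ and https://example.org/page."
when = datetime(2014, 5, 15, 12, 30, 0, tzinfo=timezone.utc)
for u in extract_uris(ref):
    print(datetime_uri(u, when))
```

Because the timestamp is embedded in the URI, the citation itself records when the referenced content was captured, which is what makes the reference robust to later drift.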