Link checking, the 404 problem (and the other 403) (Richard Cross, Nottingham Trent University)
How much time should libraries spend validating links from resource lists to electronic and online materials? How much effort should go into checking the copyright status of, and quality-assuring, materials selected by academics from the web? This session will highlight some of the key issues raised by the business of 'link checking' at Nottingham Trent University, discuss the pros and cons of different assurance methods, and suggest some possible developments in the functionality of Aspire that could support and streamline this area of work.
1. 25 June 2013
Enhancing life-long learning, teaching and research through information resources and services
2. Link checking, the 404 problem (and the other 403)
Richard Cross, Resource Discovery and Innovation Team Manager
Nottingham Trent University Library
Talis Aspire User Group | Nottingham Conference Centre | June 2013
3. Linking it all up
• Problem? What problem?
• The link validation process at Nottingham Trent University
• Issues and challenges in the review of link-to materials
• Some possible Aspire developments that could help
4. The 404 problem?
• Error 404: the standard HTTP response from a web server when the requested page URL cannot be found
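The title's pun rests on two HTTP status codes: 404 Not Found and 403 Forbidden. As a minimal sketch (Python standard library only; the helper names are illustrative and not part of Aspire), a link checker might label the codes it receives like this. Note that an authentication-protected resource may answer an unauthenticated checker with 403 (or 401, or a redirect to a login page) even when the link itself is fine.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found' for a standard HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes outside the registry known to Python have no reason phrase.
        return f"{code} (non-standard status)"
    return f"{status.value} {status.phrase}"


def needs_attention(code: int) -> bool:
    """Treat any client (4xx) or server (5xx) error as worth reviewing."""
    return 400 <= code <= 599


for code in (200, 403, 404):
    print(describe_status(code), needs_attention(code))
```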
5. The other 403?
• The tangle of issues raised by the challenge of ensuring reliable, problem-free, seamless deep-linking to online resources
• Broken links, inaccessible material, poor-quality metadata, copyright blindness, unclear provenance: all raise reputational issues for the resource list service
6. Aspire, metadata and linking
• Aspire’s ability to ‘recognise’ online materials exists, for understandable reasons, along a continuum
[Diagram: a continuum running from DOI-based, Aspire recogniser-supported items (strongest metadata and link-to extraction) to ‘web stuff’ (weakest metadata and link-to extraction)]
7. One of the biggest challenges?
Managing the things academics find on the interwebnet…
• Link-to materials that are not selected from library-mediated discovery systems can be the most challenging to manage
8. Reasons to be fearful
• Ease of Google discovery
• Poor web site design
• Search engine bot indexing
• Copyright-infringing hosts
• Lack of copyright knowledge
• Resistance to accepting the utility of copyright
• Time-poor academics
• Quick-and-dirty resource selection
• Lack of real-time guidance
9. The link validation process at Nottingham Trent University
10. Link checking and the library review process
[Diagram: the library review process has three strands: Acquisitions, Link checking, Digitisation]
• Improve and validate metadata (matching as closely as possible the requirements of the LLR Harvard citation style)
• Validate URLs to ensure they are persistent, location-independent and authentication-aware
• Review OpenURL resolution (to test that it resolves to full-text outcomes)
• Check for copyright compliance and provider appropriateness
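Some of the URL checks above lend themselves to small helpers. The following is a minimal sketch, not NTU's actual tooling: the proxy prefix and resolver address are hypothetical placeholders, and the session-marker list is an assumed heuristic. It rewrites DOIs as persistent `https://doi.org/` links, flags URLs that look session-bound, wraps links in an EZproxy-style prefix for off-campus access, and builds a Z39.88-2004 key/encoded-value OpenURL for testing resolution.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute your own proxy and resolver.
PROXY_PREFIX = "https://proxy.example.ac.uk/login?url="
RESOLVER_BASE = "https://resolver.example.ac.uk/openurl"

# Assumed heuristic: markers that often indicate a session-bound URL
# which will stop working once the browser session ends.
SESSION_MARKERS = ("jsessionid", "sessionid", "sid=", "token=")

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_link(doi: str) -> str:
    """Turn a bare or prefixed DOI into a persistent https://doi.org/ link."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def looks_session_bound(url: str) -> bool:
    """Flag URLs that probably will not persist (heuristic only)."""
    lowered = url.lower()
    return any(marker in lowered for marker in SESSION_MARKERS)


def proxied(url: str) -> str:
    """Wrap a URL in the proxy prefix so off-campus users are authenticated."""
    return url if url.startswith(PROXY_PREFIX) else PROXY_PREFIX + url


def openurl_for_article(atitle, jtitle, issn, date, doi=None) -> str:
    """Build a Z39.88-2004 key/encoded-value OpenURL for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.issn": issn,
        "rft.date": date,
    }
    if doi:
        params["rft_id"] = "info:doi/" + doi
    return RESOLVER_BASE + "?" + urlencode(params)
```

Keeping the DOI as the stored identifier and deriving the proxied or OpenURL forms from it means a change of proxy or resolver does not require editing every list item.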
11. What’s the link-checking workflow?
• Metadata librarians routinely deal with link-checking processes
• Exceptions and queries are referred to review meetings
• Scenarios are recorded for future reference, to routinise the processing of future occurrences
• Suspected infringements of copyright are recorded
• End-of-review reports are shared with Liaison Librarians, either ‘for reference’ or for further action
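Recording scenarios so that future occurrences can be processed routinely amounts to a lookup table of earlier decisions. A minimal sketch, assuming decisions are keyed by host name (a hypothetical simplification; real scenarios may depend on more than the domain), with illustrative class and field names:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Decision:
    action: str                  # e.g. "accept", "replace", "refer to review"
    note: str = ""
    suspected_infringement: bool = False


def host_of(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class ScenarioLog:
    """Hypothetical in-memory record of link-checking decisions, keyed by host."""

    def __init__(self) -> None:
        self._by_host: dict[str, Decision] = {}

    def record(self, url: str, decision: Decision) -> None:
        self._by_host[host_of(url)] = decision

    def lookup(self, url: str) -> Optional[Decision]:
        """Return the earlier decision for this host, or None (refer it on)."""
        return self._by_host.get(host_of(url))

    def suspected_infringements(self) -> list[str]:
        """Hosts recorded as suspected copyright infringements, sorted."""
        return sorted(h for h, d in self._by_host.items() if d.suspected_infringement)
```

A `None` from `lookup` corresponds to the "exception" path: the item has no precedent and goes to a review meeting.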
12. Clear evidence of added value in the process
• Links to materials that would otherwise break now persist
• Access to content is available on-campus and off-campus
• OpenURL resolution to full-text outcomes is improved
• Metadata is enriched, better matching citation quality
• Without additional effort on their part, the lecturer’s intention is met
• List quality and reputation are improved
13. Where is your institution on the resource list link-checking continuum?
Q: At one end, ‘We don’t check anything; it’s the academic’s responsibility to get this right.’ At the other, ‘We check everything; we quality-assure for our students.’
14. Issues for librarians involved in review
• Scale: some lists can include 100+ links
• The link validation process is largely manual, and requires a significant investment of staff resources
• Time: ‘link checking’ staff have a large number of other commitments
• Steep learning curve for processing link-to items
• The application has no memory of previous decisions; there is no ‘knowledge base’
15. Issues for librarians working with academics
• The practice of some list authors suggests training opportunities: resource selection; copyright awareness; resource discovery
• Lists with large numbers of link-to items can take more time to validate than academics appreciate
• Large proportions of free-on-the-web materials on resource lists might not match fee-paying students’ expectations?
17. University of the Wild West
‘Link to’ examples
Chrisard Ross
Remorse Discovery and Litigation Team Manager
18. YouTube
• Not posted by content owner
• No indication of use rights
• Clear copyright infringement
Prefixing the duper with super
19. YouTube
YouTube can and do take action against copyright infringement and intellectual property theft
20. Don’t go there…
• Site hosting the full text of movie scripts, ownership of which is held by film studios
• Academic discovered the ‘repository’ through a Google search
Popping the ‘x’ into ecellent
21. …it’s not safe
• The domain itself is not web-safe (access is blocked by YouWooWoo’s WebSense monitor)
22. ‘Our VLE filestore is Google optimised. Who knew?’
• Article loaded into a VLE module at ‘an institution somewhere on the planet’
• Found and indexed by Google
• Found and bookmarked by academic
• Provenance, clearance, status: unknown
Putting you and you into fablos
23. ‘But if I point students to Scribd, it’s all free…’
• Unmediated content hosts provide access to full-text material
• Copyright is self-assigned by the uploader
• Indexed by Google, found by academic
Pouring the ‘we’ between ‘yes’ and ‘can’
24. Do you face challenges managing link-to content on your lists?
Q: At one end, ‘Never. Our academics only select from our discovery systems.’ At the other, ‘All the time. We can empathise with the experiences of YouWooWoo.’
26. Highlighting copyright awareness
• One-time copyright responsibility tick-box on bookmarking
• Include help and support links in cases of uncertainty
27. Highlighting copyright awareness
• Allied to a tenancy-level amber and red list of domains
• Additional text prompts and action depending on match
• Possible draft/publish options settable based on domain match
Example domains: allstolen.com, copiedright.com, inicked.it, scribd.com, youtube.com, academia.edu
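The amber/red idea can be sketched as a domain match at bookmarking time. This is a minimal illustration only: the assignment of the example domains to the red or amber list, the prompt wording and the `Triage` fields are all hypothetical, and subdomains (e.g. `en.scribd.com`) are treated as matching their parent domain.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Hypothetical tenancy-level lists. Which domain sits on which list is an
# illustrative assumption, not a recommendation.
RED = {"allstolen.com", "copiedright.com", "inicked.it"}
AMBER = {"scribd.com", "youtube.com", "academia.edu"}


class Triage(NamedTuple):
    rating: str
    prompt: str
    may_publish: bool


def matched_domain(url: str, domains: set) -> Optional[str]:
    """Return the listed domain that url's host equals or is a subdomain of."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in domains:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def triage(url: str) -> Triage:
    """Rate a bookmarked URL and decide whether it may be published."""
    if matched_domain(url, RED):
        # Red: keep the item in draft until the library has reviewed it.
        return Triage("red", "This site is not an acceptable source. "
                      "Please contact the library for an alternative.", False)
    if matched_domain(url, AMBER):
        return Triage("amber", "Please confirm the uploader holds the rights "
                      "to this material before publishing.", True)
    return Triage("green", "", True)
```

Matching on `"." + domain` rather than a plain suffix avoids false hits such as `notyoutube.com`.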
28. Managing copyright awareness
• Reports on domain popularity and on the addition of items from new domains (with drill-through to items)
• Potential for a Tenancy-level domain knowledge base (guidance on how to approach material found on, e.g., academia.edu, Scribd and YouTube)
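The proposed domain reports are essentially counts and set differences over item URLs. A minimal sketch, assuming items can be exported as `(item_id, url)` pairs (a hypothetical export shape; Aspire's actual report format is not described here), with illustrative helper names:

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_popularity(items):
    """items: (item_id, url) pairs. Returns [(domain, count), ...], most common first."""
    return Counter(domain_of(url) for _, url in items).most_common()


def new_domains(current_items, previous_items):
    """Domains present in the current export but absent from an earlier one."""
    seen = {domain_of(url) for _, url in previous_items}
    return sorted({domain_of(url) for _, url in current_items} - seen)


def drill_through(items, domain):
    """Item IDs linking to a given domain, for follow-up review."""
    return [item_id for item_id, url in items if domain_of(url) == domain]
```

Comparing two successive exports is one simple way to surface "addition of items from new domains" without the application needing to remember anything between runs.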
29. Promoting library resources
• Matching keyword searching with library discovery systems
• Proposing alternative resources from other lists in the Tenancy
• Matching to preferred search indexes (Amazon, Google Scholar, OpenLibrary)
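One lightweight way to match an item to preferred indexes is to generate keyword-search links from its title and author. A minimal sketch: the discovery-system address is a hypothetical placeholder, and the other URL patterns are assumptions that should be checked against each service before use.

```python
from urllib.parse import quote_plus

# Assumed URL patterns (verify before use); the discovery-system
# address is a hypothetical placeholder.
SEARCH_TEMPLATES = {
    "Library discovery": "https://discover.example.ac.uk/search?q={q}",
    "Google Scholar": "https://scholar.google.com/scholar?q={q}",
    "OpenLibrary": "https://openlibrary.org/search?q={q}",
    "Amazon": "https://www.amazon.co.uk/s?k={q}",
}


def alternative_searches(title: str, author: str = "") -> dict:
    """Build one keyword-search link per index from an item's title and author."""
    terms = " ".join(part.strip() for part in (title, author) if part.strip())
    query = quote_plus(terms)
    return {name: template.format(q=query) for name, template in SEARCH_TEMPLATES.items()}
```

Listing the library discovery system first reflects the aim of steering academics towards library-mediated copies before open-web ones.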
30. Managing links – reporting on 404s
• As a hosted solution, Aspire’s ability to check the availability of authentication-protected resources is limited
• Aspire could report on ‘Webpage’ or ‘Website’ type items (on the reasonable assumption that these materials are openly accessible)
• Then either Aspire or the customer could run checks to report on 404s (and other error responses)
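A customer-side check of this kind could look like the following minimal sketch (Python standard library). It assumes an export of `(item_type, title, url)` triples, which is a hypothetical shape, and uses `HEAD` requests with an illustrative User-Agent string. The `fetch` parameter lets the report logic be tested without network access.

```python
import urllib.error
import urllib.request

# Item types assumed to be openly accessible (per the slide's reasoning).
OPEN_TYPES = {"Webpage", "Website"}


def fetch_status(url: str, timeout: float = 10.0):
    """Return the final HTTP status for url, or None if it could not be reached."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "resource-list-link-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # redirects have already been followed
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403, 404, 410, 500
    except (OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def report_errors(items, fetch=fetch_status):
    """items: (item_type, title, url) triples. Returns (title, url, status) rows
    for open web items that answered with an error or not at all."""
    rows = []
    for item_type, title, url in items:
        if item_type not in OPEN_TYPES:
            continue  # likely authentication-protected; skip
        status = fetch(url)
        if status is None or status >= 400:
            rows.append((title, url, status))
    return rows
```

Some servers reject `HEAD` requests (for example with 405), so retrying those URLs with `GET` before reporting them would reduce false positives.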
31. Managing links – reporting on domains
• Aspire could report on links by domain (e.g. ‘Give me all details for all Tenancy items linking to sploo.com’)
• Extending the logic of the existing ‘All Journal Article Items’ report
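Until such a report exists, a domain query can be approximated over a CSV export. A minimal sketch: the column name `Online Resource` is a hypothetical assumption (the real export's headings may differ), and subdomains count as matches.

```python
import csv
import io
from urllib.parse import urlsplit


def items_linking_to(csv_text: str, domain: str, url_column: str = "Online Resource"):
    """Return every exported row whose URL points at domain or a subdomain of it."""
    domain = domain.lower()
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing cells come back as None from DictReader; treat them as empty.
        host = (urlsplit(row.get(url_column) or "").hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(row)
    return matches
```

Returning whole rows keeps "all details" for each matching item, in the spirit of the example query on the slide.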
32. Control over links displayed in the item record
• Extend the recently released option to select the preferred ‘Online resource’ link from the item record
• Retain URIs in item metadata in full view, but have the option to select ones not to display: OpenURL link, dx DOI link, deep-link URL based on best discovery link (default to show all)
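The proposed behaviour separates what is stored from what is shown. A minimal sketch, with hypothetical link-kind labels and function names: every URI stays in the record, a set of hidden kinds filters the display, the default shows everything, and a preferred kind is moved to the front.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ItemLink:
    kind: str  # hypothetical labels: "openurl", "doi", "deep_link"
    uri: str


def links_to_display(links: Iterable[ItemLink],
                     hidden_kinds: frozenset = frozenset(),
                     preferred: Optional[str] = None) -> List[ItemLink]:
    """Choose which stored links to show; the record itself keeps them all.

    With no hidden kinds every link is shown (the suggested default); a
    preferred kind, if still shown, is moved to the front.
    """
    shown = [link for link in links if link.kind not in hidden_kinds]
    if preferred is not None:
        shown.sort(key=lambda link: link.kind != preferred)  # stable: False sorts first
    return shown
```

Because the function returns a new list, hiding a link never removes it from the underlying metadata.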
34. Questions or comments?
NTU Resource Lists
http://resourcelists.ntu.ac.uk
Richard Cross
Resource Discovery and Innovation Team Manager
Nottingham Trent University Library
richard.cross@ntu.ac.uk