Presentation given at the International Digital Curation Conference in San Francisco, February 26 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue, designed to meet requirements that reflect the changing nature of web-based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
1. Persistent Identifiers for Scholarly Assets and the Web:
The Need for an Unambiguous Mapping
Herbert Van de Sompel
@hvdsomp
Robert Sanderson
@azaroth42
Harihar Shankar
@hariharshankar
Martin Klein
@mart1nkle1n
Los Alamos National Laboratory
2. Acknowledgments
• Sean Bechhofer – University of Manchester
• Geoff Bilder – CrossRef
• Maarten Hoogerwerf – DANS
• Pete Johnston – Cambridge University
• Carl Lagoze – University of Michigan
• Michael L. Nelson – Old Dominion University
• Andrew Treloar – ANDS
• Simeon Warner – Cornell University
Van de Sompel, Sanderson, Shankar, Klein
IDCC 2014, San Francisco, CA, February 26 2014
3. Motivation
• Persistent/Persist-able Identifiers (PIDs) play a crucial role in the
identification of scholarly assets
• Motivated by concerns of long-term persistence, PIDs are minted
outside of the dominant web information access protocol, HTTP
• Value-added services targeted at humans and machines
assume/require resources identified by means of HTTP URIs
• Hence, an unambiguous bridge is required between:
• PID-oriented paradigm of research communication
• HTTP-oriented web, semantic web, linked data environment
• Preferably, such a bridge should work across PID systems
• Interoperability between PID systems
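A minimal sketch of the PID-to-HTTP-URI-PID half of such a bridge, assuming PIDs arrive as informal prefixed strings (`doi:`, `hdl:`) and that the dx.doi.org and hdl.handle.net resolvers serve as their HTTP front ends. The prefix table and the function name `to_http_uri_pid` are hypothetical; supporting another PID system would mean adding another resolver entry.

```python
from urllib.parse import quote

# Hypothetical prefix table: informal labels mapped to HTTP resolvers in use
# around 2014. PIDs have no registered URI schemes, so these labels are a
# convention of this sketch, not a standard.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "hdl": "http://hdl.handle.net/",
}


def to_http_uri_pid(pid: str) -> str:
    """Map a prefixed PID such as 'doi:10.1000/182' to its HTTP-URI-PID."""
    label, sep, local_id = pid.partition(":")
    if not sep or label.lower() not in RESOLVERS:
        raise ValueError(f"no known resolver for PID {pid!r}")
    # Percent-encode characters that are unsafe in a URI path, keeping '/'.
    return RESOLVERS[label.lower()] + quote(local_id, safe="/")
```

For example, `to_http_uri_pid("doi:10.1000/182")` returns `"http://dx.doi.org/10.1000/182"`.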
4. Status Quo of the PID/HTTP Bridge
7. HTTP HEAD != HTTP GET
• The expectation is that an HTTP HEAD on HTTP-URI-PID will yield
the same response (without body) as an HTTP GET
• Martin Fenner finds this is not always the case
• Not a CrossRef resolver problem, a publisher problem
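One way to detect the HEAD/GET mismatch is to follow the redirect chain from an HTTP-URI-PID by hand with each method and compare the hops. The sketch below assumes Python's standard `http.client`; the helper names are hypothetical, it sends no cookies or custom headers, and it compares only status codes and URLs, not response headers. Following redirects manually keeps HEAD as HEAD on every hop rather than relying on a client library's redirect handling.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_chain(url, method, max_hops=10):
    """Follow redirects by hand using `method`; return [(status, url), ...]."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
        try:
            conn.request(method, target)
            resp = conn.getresponse()
            location = resp.getheader("Location")
            chain.append((resp.status, url))
        finally:
            conn.close()
        if resp.status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return chain


def first_divergence(head_chain, get_chain):
    """Index of the first hop where HEAD and GET differ, or None if identical."""
    for i, (h, g) in enumerate(zip(head_chain, get_chain)):
        if h != g:
            return i
    if len(head_chain) != len(get_chain):
        return min(len(head_chain), len(get_chain))
    return None
```

For an HTTP-URI-PID `u`, `first_divergence(redirect_chain(u, "HEAD"), redirect_chain(u, "GET"))` returns the index of the first hop where the two methods part ways, or `None` if they agree.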
9. Examples of Issues with the PID/HTTP Bridge
• Given an HTTP-URI-PID, how can a machine navigate towards the
actual content (i.e. not the landing page)?
• Given an HTTP-URI-LOC (of, say, an image), what is the PID of
the asset it belongs to?
• What is the URI of the Target of an Open Annotation that pertains to
a PID-identified asset (i.e. not to the landing page, not to the PDF,
the HTML, …)?
10. Requirements for the PID/HTTP Bridge
• Targeted at machines so richer applications (for humans and
machines) can emerge
• Follow your nose; typed links; RDF
• Support for bundling resources and describing those resources,
to reflect that assets increasingly consist of multiple resources
rather than a single one
• Multiple HTTP-URI-LOCs fall under a single PID
• Support for resource versioning, version discovery, and access to
versions, to reflect that resources used or created during the
research process are increasingly dynamic
11. Evidence for these Requirements: Data Citation Principles
(4) Unique Identification: A data citation should include a persistent
method for identification that is machine actionable, globally
unique, and widely used in the community.
(5) Access: Data citations should facilitate access to the data
themselves and to such associated metadata, documentation,
code, and other materials, as are necessary for both humans and
machines to make informed use of the referenced data.
(7) Specificity and Verifiability: … Citations or citation metadata should
include information about provenance and fixity sufficient to
facilitate verifying that the specific timeslice, version and/or granular
portion of data retrieved subsequently is the same as was originally
cited.
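Principle (7) asks for fixity information. A common way to record it, assumed here rather than prescribed by the principles, is a cryptographic digest of the exact bytes cited, stored with the citation and recomputed on retrieval. A minimal sketch using `hashlib`; the function names and the SHA-256 default are assumptions.

```python
import hashlib


def fixity(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest recording the fixity of the cited bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def verify(retrieved: bytes, cited_digest: str, algorithm: str = "sha256") -> bool:
    """True if the retrieved bytes match the digest recorded at citation time."""
    return fixity(retrieved, algorithm) == cited_digest
```

A mismatch signals that the version or granular portion retrieved later is not the one originally cited.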
12. A Proposed PID/HTTP Bridge
• A bridge goes in two directions:
• Uniform path from the PID of an asset to the asset’s constituent
resources, each identified by a distinct HTTP-URI-LOC
• Uniform path from the HTTP-URI-LOC of a constituent resource
of a scholarly asset to the PID of that asset
• In order to build the bridge, a rather basic question needs an answer
…
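For the second direction, one HTTP-native option is for the response on an HTTP-URI-LOC to carry a typed Link header (RFC 5988) pointing at the HTTP-URI-PID, so a machine can follow its nose back to the asset. The sketch below is a simplified Link header parser; which relation type to use is still open (rel="collection" is one candidate), so `rel` is a parameter. The function names are hypothetical, and the regular expressions are an assumption that does not cover every corner of the RFC 5988 grammar, e.g. a `<` inside a quoted parameter.

```python
import re

# One link-value: "<target>" followed by everything up to the next "<".
# Simplification: assumes no "<" appears inside a quoted parameter value.
_LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# One parameter: ;name=token or ;name="quoted string" (quotes may hold commas).
_PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value):
    """Parse a Link header value into a list of (target, {param: value})."""
    links = []
    for target, rest in _LINK_VALUE.findall(value):
        params = {}
        for name, quoted, token in _PARAM.findall(rest):
            params[name.lower()] = quoted or token
        links.append((target, params))
    return links


def targets_with_rel(value, rel):
    """Targets whose rel parameter (a space-separated list) includes `rel`."""
    wanted = rel.lower()
    return [target for target, params in parse_link_header(value)
            if wanted in params.get("rel", "").lower().split()]
```

Given the Link header returned with, say, a PDF, `targets_with_rel(header, "collection")` would yield the HTTP-URI-PID if the server used that relation type.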
13. What is the Nature of the Resource Identified by HTTP-URI-PID?
• HTTP-URI-PID identifies the landing page HTTP-URI-LAND
• Interpretation supported by typical “302 Found” redirection
• HTTP-URI-PID identifies the asset identified by PID for the purpose
of web interactions
• Interpretation supported by:
• CrossRef display guideline that recommends using HTTP-URI-PID in the online environment, replacing the prior practice
of using the PID
• CrossRef provides descriptive RDF metadata using “303
See Other” style content negotiation with HTTP-URI-PID
• The resource is conceptual, a so-called non-information
resource
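Under the W3C httpRange-14 resolution, which Cool URIs for the Semantic Web builds on, the first status code an HTTP-URI-PID answers with hints at which of these two interpretations a server takes. A sketch that issues one GET without following redirects and classifies the status; the `Accept` value a caller passes (e.g. `text/turtle`), the function names, and the category labels are assumptions for illustration.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, accept):
    """Send one GET with an Accept header; do not follow redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("GET", target, headers={"Accept": accept})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def interpret(status):
    """Coarse httpRange-14 style reading of the first response status."""
    if status == 303:
        # See Other: the URI may name a non-information resource, such as
        # the asset itself; Location points to a document describing it.
        return "non-information"
    if status in (301, 302, 307, 308):
        # Redirect: read as the same resource served elsewhere, e.g. the
        # landing page reached via "302 Found".
        return "same-resource-elsewhere"
    if 200 <= status < 300:
        return "information"
    return "undetermined"
```

For an HTTP-URI-PID `u`, `interpret(probe(u, "text/turtle")[0])` gives the coarse reading.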
14. A Proposed PID/HTTP Bridge
• A bridge goes in two directions:
• Uniform path from the PID of an asset to the asset’s constituent
resources, each identified by a distinct HTTP-URI-LOC
• Uniform path from the HTTP-URI-LOC of a constituent resource
of a scholarly asset to the PID of that asset
• HTTP-URI-PID identifies the asset identified by PID for the purpose
of web interactions
• The proposed bridge builds on: HTTP, Cool URIs for the
Semantic Web, HTTP Links and Link Relation Types, OAI-ORE,
Memento
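To make the first direction concrete, an OAI-ORE Resource Map (HTTP-URI-MACH) can describe an Aggregation identified by the HTTP-URI-PID that aggregates the asset's HTTP-URI-LOCs. A sketch emitting N-Triples; the serialization, function name, and example URIs are assumptions, and it omits the metadata about the Resource Map itself that a complete ORE document would carry.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(http_uri_pid, http_uri_mach, http_uri_locs):
    """N-Triples for an ORE Aggregation identified by the HTTP-URI-PID."""
    triples = [
        (http_uri_mach, RDF_TYPE, ORE + "ResourceMap"),
        (http_uri_mach, ORE + "describes", http_uri_pid),
        (http_uri_pid, RDF_TYPE, ORE + "Aggregation"),
        (http_uri_pid, ORE + "isDescribedBy", http_uri_mach),
    ]
    # Each constituent resource of the asset becomes an aggregated resource.
    for loc in http_uri_locs:
        triples.append((http_uri_pid, ORE + "aggregates", loc))
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples) + "\n"
```

A machine holding only the PID can then reach every HTTP-URI-LOC by following `ore:aggregates` links.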
17. Requirements for the PID/HTTP Bridge
• Targeted at machines
• Support for bundling resources
• Support for resource versioning
18. Common Resource Versioning Pattern
[Figure: a generic URI that always yields the most recent version, alongside version-specific URIs]
19. Resource Versioning
• This common resource versioning pattern can be used for
Aggregations (HTTP-URI-PID), Resource Maps (HTTP-URI-MACH),
and Aggregated Resources (HTTP-URI-LOC, HTTP-URI-LAND)
• The pattern aligns perfectly with Memento, which offers modular
functionality for discovering and accessing resource versions using
HTTP headers (see Resource Versioning and Memento):
• Express datetime of a resource version
• Interlink resource versions
• Interlink resource version and the associated generic resource
• Access an overview of all resource versions
• Access a resource version that was current at a given datetime
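The last capability corresponds to a Memento client sending an `Accept-Datetime` header to a TimeGate, which selects the version that was current at that datetime. A sketch of both halves; the version list and function names are hypothetical, and returning the earliest version when the requested datetime predates all versions is an assumed policy, not something the slide specifies.

```python
from bisect import bisect_right
from datetime import timezone
from email.utils import format_datetime


def accept_datetime(dt):
    """Format an Accept-Datetime header value (HTTP-date in GMT).

    `dt` should be timezone-aware; a naive value is read as local time.
    """
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def select_memento(versions, requested):
    """TimeGate-style choice of the version current at `requested`.

    `versions` is a list of (datetime, version_specific_uri) sorted by
    datetime. Falls back to the earliest version if `requested` predates
    all of them (an assumed policy).
    """
    times = [t for t, _ in versions]
    i = bisect_right(times, requested)
    return versions[max(i - 1, 0)][1]
```

Listing all entries of `versions` would correspond to the overview of all resource versions.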
20. Requirements for the PID/HTTP Bridge
• Targeted at machines
• Support for bundling resources
• Support for resource versioning
21. Open Issues
• Which ontologies for metadata, types, relationships? Cf. SURF
info-eu-repo, State of the LOD Cloud
• No URI schemes for PIDs
• PID/HTTP-URI-PID for each version; typically none that
always yields the current version
22. Open Issues
• Should it be owl:sameAs?
• Should it be rel=“collection”?
23. References
• Martin Fenner. Challenges in automated DOI resolution.
http://blog.martinfenner.org/2013/10/13/broken-dois/
• FORCE11 Data Citation Principles. http://force11.org/datacitation
• Cool URIs for the Semantic Web. http://www.w3.org/TR/cooluris/
• Web Linking. http://tools.ietf.org/search/rfc5988
• IANA Link Relation Types. http://www.iana.org/assignments/link-relations/link-relations.xhtml
• OAI-ORE. http://www.openarchives.org/ore/1.0/
• Memento, RFC 7089. http://tools.ietf.org/html/rfc7089
• Resource Versioning and Memento.
http://www.mementoweb.org/guide/howto/
• SURF info-eu-repo. http://purl.org/REP/standards/info-eu-repo
• State of the LOD Cloud. http://lod-cloud.net/state/
Speaker notes
Suggesting that the resource identified by HTTP-URI-PID is a non-information resource that corresponds to the scholarly asset as an intellectual object