Talk at the Semantic Web in Libraries Conference 2012 (SWIB2012), Cologne, 28/11/2012, during the session "TOWARDS AN INTERNATIONAL LOD LIBRARY ECOLOGY".
(http://swib.org/swib12/programme.php)
Status Quo and (current) Limitations of Library Linked Data
1. Status Quo and (current) limitations of Library Linked Data
Asunción Gómez-Pérez (1), Philipp Cimiano (2) and Daniel Vila-Suero (1)
(1) Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
asun,dvila@fi.upm.es
(2) AG Semantic Computing, CITEC, University of Bielefeld
cimianog@cit-ec.uni-bielefeld.de
Acknowledgments: BNE team, DNB team, Uldis Bojars (NLL), Jodi Schneider (NUIG) and others
SWIB12 – Cologne, 28-11-2012
2. Outline
• The Library Linked Data Cloud
• A (library) Linked Data Life-cycle
• A collection of current limitations
• Conclusions
5. Library Linked Data Cloud
[Figure: British Library Data Model - Book, V.1.4, August 2012. Diagram of classes and properties (e.g. bibo:Book, bibo:Series, blt:PublicationEvent, foaf:Person, blt:TopicLCSH, blt:TopicDDC) with links to external URIs (id.loc.gov, VIAF, GeoNames, Lexvo, Dewey Info). Contacts: Tim Hodson - tim.hodson@talis.com, Corine Deliot - Corine.Deliot@bl.uk, Alan Danskin - Alan.Danskin@bl.uk, Heather Rosie - Heather.Rosie@bl.uk, Jan Ashton - Jan.Ashton@bl.uk]
Notes in the figure:
- All properties with a range of blt:PublicationEvent can be used with blt:PublicationStartEvent and blt:PublicationEndEvent.
- Assume that most instance data will have an rdfs:label. These properties have been omitted for clarity.
Namespace prefixes used in the figure:
@prefix blt: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix isbd: <http://iflastandards.info/ns/isbd/elements/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix rda: <http://rdvocab.info/ElementsGr2/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
New version of datos.bne.es by beginning of 2013 including:
- Links to digital objects
- More links to external datasets
- APIs and improved documentation
6. Library Linked Data Cloud
And many others (see http://thedatahub.org/group/lld)
7. Library Linked Data Cloud
• Availability of Library Linked Data is already a reality
• Many serious efforts: id.loc.gov, VIAF, DNB, data.bnf.fr, Bibliographic Framework Initiative, etc.
• Still several challenges and limitations preventing LLD from reaching its full potential
9. A (Library) Linked Data Life-cycle
• A series of steps or phases: Specification, Modelling, RDF Generation, Linking and Enrichment, Publication, Exploitation
• Based on Villazón-Terrazas et al.*
• Others: LOD2, Datalift, etc. (see http://www.w3.org/2011/gld/wiki/GLD_Life_cycle)
* Boris Villazón-Terrazas, Luis M. Vilches-Blázquez, Oscar Corcho and Asunción Gómez-Pérez: "Methodological Guidelines for Publishing Government Linked Data", in the book Linking Government Data
10. A (Library) Linked Data Life-cycle
Specification: definition and analysis of source data and their format, structure, etc.
11. A (Library) Linked Data Life-cycle
Modelling:
- Selection and reuse of vocabularies to represent LD.
- Creation of new local terms and mapping to existing vocabularies
12. A (Library) Linked Data Life-cycle
RDF Generation: taking source data and vocabularies, mapping (cross-walk) source data to produce RDF
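The cross-walk step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source field names (`title`, `language_note`), the `example.org` base URI and the flat dictionary record shape are hypothetical, and real projects typically work from MARC 21 or similar formats with far richer rules. The two property URIs are taken from the dct and isbd namespaces listed in the British Library data model.

```python
# Hypothetical crosswalk: source field name -> RDF property URI.
# Field names and the example.org base URI are illustrative assumptions.
CROSSWALK = {
    "title": "http://purl.org/dc/terms/title",
    "language_note": "http://iflastandards.info/ns/isbd/elements/P1073",
}

BASE_URI = "http://example.org/resource/"  # placeholder namespace


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id: str, record: dict) -> list:
    """Turn one flat source record into a list of N-Triples lines."""
    subject = f"<{BASE_URI}{record_id}>"
    triples = []
    for field, value in record.items():
        prop = CROSSWALK.get(field)
        if prop is None:
            continue  # fields without a mapping rule are skipped
        triples.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    sample = {"title": 'Don "Quijote"', "shelf": "X1"}
    for line in record_to_ntriples("b1", sample):
        print(line)
```

Keeping the mapping rules in a plain table, separate from the generation code, is one way to let library experts review the cross-walk without reading the program.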
13. A (Library) Linked Data Life-cycle
Linking and Enrichment:
- Discover related resources (ideally in RDF form).
- Enrich RDF data with data from other sources (e.g. replace literals with URIs from other datasets)
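Replacing literals with URIs, as mentioned above, might look roughly like the following sketch. It assumes a pre-built lookup table from normalised labels to URIs; the table contents, the `example.org` URI and the tuple layout for triples are hypothetical, and exact string matching after normalisation is a deliberately naive stand-in for real entity matching.

```python
# Hypothetical authority lookup: normalised name -> URI in another dataset.
# The URI below is a placeholder, not a real identifier.
AUTHORITY_INDEX = {
    "cervantes saavedra, miguel de": "http://example.org/authority/123",
}


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(label.lower().split())


def link_literals(triples, prop, index):
    """Replace literal objects of `prop` by URIs when the index knows them.

    `triples` is a list of (subject, predicate, object, is_literal) tuples.
    Literals that are not found in the index are kept unchanged.
    """
    linked = []
    for s, p, o, is_literal in triples:
        if p == prop and is_literal:
            uri = index.get(normalise(o))
            if uri is not None:
                linked.append((s, p, uri, False))
                continue
        linked.append((s, p, o, is_literal))
    return linked
```

Unmatched literals are left in place rather than dropped, so the enrichment step never loses source information.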
14. A (Library) Linked Data Life-cycle
Publication:
- Set up the infrastructure to expose your data to the Web (SPARQL, APIs, dumps).
- Enable discovery of your data (sitemap, voID)
- Include data provenance, license, etc.
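As a rough illustration of the discovery and licensing points above, the sketch below builds a minimal voID dataset description as a Turtle string. The function name, its parameters and all `example.org` URLs are assumptions made for this example; the properties used (`void:sparqlEndpoint`, `void:dataDump`, `dct:license`, `dct:modified`) come from the voID and Dublin Core vocabularies. The title is assumed to contain no double quotes, since no escaping is done.

```python
def void_description(dataset_uri, title, license_uri, sparql_endpoint, dump_url, modified):
    """Return a minimal voID dataset description as a Turtle string.

    `modified` is expected as an ISO date string (YYYY-MM-DD).
    `title` must not contain double quotes (no escaping is performed here).
    """
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dct:title "{title}" ;',
        f"    dct:license <{license_uri}> ;",
        f'    dct:modified "{modified}"^^xsd:date ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:dataDump <{dump_url}> .",
    ])


if __name__ == "__main__":
    # All URLs below are placeholders for illustration only.
    print(void_description(
        "http://example.org/dataset",
        "Example catalogue",
        "http://creativecommons.org/publicdomain/zero/1.0/",
        "http://example.org/sparql",
        "http://example.org/dump.nt.gz",
        "2012-11-28",
    ))
```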
15. A (Library) Linked Data Life-cycle
Exploitation:
- Produce user interfaces that integrate your data (and other sources)
- Provide innovative services on top of the data
- Integrate with existing services (e.g. patron services), etc.
17. Specification
• MANY source formats, encodings and schemas: XML, CSV, PICA+, MAB
18. Specification
LIMITATIONS:
1. Lack of principled methods, techniques and tools to deal with heterogeneous source formats, schemas and encodings
  metastream, metamorph APIs from culturegraph
2. Need for analysis of the semantics of metadata schemas and the variation of their usage across libraries:
  MARC 21: "Marc21 as Data: A start", Karen Coyle's Code4Lib Journal article*
* http://journal.code4lib.org/articles/5468
19. Modelling
• Many different models and approaches
[Figure: British Library Data Model - Book, V.1.4, August 2012 (Tim Hodson, Corine Deliot, Alan Danskin, Heather Rosie, Jan Ashton)]
20. Modelling
LIMITATIONS:
1. Difficulties in using vocabularies that adapt to
  • Past and current cataloguing practices
  • Past and current library formats and schemas
2. Mapping and managing vocabularies
  1. Lack of mapping across vocabularies
  2. High manual effort and costly process
  GND ontology, Dunsire et al. "Linked Data vocabulary management"*
3. Lack of multilingual descriptions of vocabulary elements
  IFLA Namespaces multilingual efforts
* http://www.niso.org/publications/isq/2012/v24no2-3/
21. RDF Generation
• Several tools, APIs and systems
• Almost every new project follows its own approach; it is still a costly process
• Participation of library experts is crucial (those who know the formats, e.g. MARC 21)
• There is no generic mapping from source metadata schemas (due to differences in use across libraries and different cataloguing practices)
22. RDF Generation
LIMITATIONS:
1. Lack of tools and services that are easy to use by non-technical users (experts in library formats)
2. Lack of integrated tools and services that provide a full view of the mapping and RDF generation process
  • Analytics of the source data (e.g. how often a specific MARC subfield is used, how many records a specific transformation rule will affect)
  • Dashboard-like generation that helps to understand how the RDF data has been produced and allows refining the mappings, finding errors, etc.
http://marimba4lib.com
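The kind of source-data analytics mentioned above (how often a subfield is used, how many records a rule would affect) can be approximated with a simple counter. This is a sketch under assumptions: records are represented in a simplified, hypothetical in-memory shape (a list of `(tag, subfields)` pairs), not parsed from real MARC files, and it is not a description of how marimba4lib or any other tool works.

```python
from collections import Counter


def subfield_usage(records):
    """Count in how many records each (tag, subfield code) pair occurs.

    Each record is assumed to be a list of (tag, subfields) pairs, where
    `subfields` is a list of (code, value) pairs. A pair that repeats
    inside one record is counted once for that record.
    """
    usage = Counter()
    for record in records:
        seen = {(tag, code) for tag, subfields in record for code, _ in subfields}
        usage.update(seen)
    return usage


def affected_records(records, tag, code):
    """Number of records that a transformation rule on tag/code would touch."""
    return subfield_usage(records)[(tag, code)]


if __name__ == "__main__":
    sample = [
        [("245", [("a", "Title one"), ("c", "Author")]), ("100", [("a", "Name")])],
        [("245", [("a", "Title two")])],
    ]
    for (tag, code), count in sorted(subfield_usage(sample).items()):
        print(tag, code, count)
```

Counting per record rather than per occurrence answers the question a mapping author usually asks first: how many records will change if this rule is edited.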
23. Linking and enrichment
• Already a high number of links at the authority level (mostly Persons and Corporate Bodies)
  • VIAF
  • Culturegraph Authorities
  • DNB, BNF, BNE, BL
• Very positive efforts:
  • National library cataloguers adding links during the cataloguing process (e.g. DNB, BNE)
  • Cross-library collaboration for linking (British Library and DNB*)
* http://www.culturegraph.org/Subsites/culturegraph/DE/Home/news4.html
24. Linking and enrichment
LIMITATIONS:
1. Very limited linking at the bibliographic level
2. Lack of cross-lingual mechanisms to link resources across libraries
3. Lack of a solid infrastructure to enable link sharing and exchange across libraries
4. No semantic enrichment of the content (textual, sound and visual) and linkage of this content to the URIs that represent the real-world entities
25. Publication
LIMITATIONS:
1. No extensive use of mechanisms to indicate provenance, license, last update, etc. on a per-resource basis
2. Low usage of mechanisms to enable efficient discovery and usage, like voID descriptions
3. Need for scalable and generic infrastructure to facilitate consumption of linked data:
  1. Most of the APIs are not extensively documented
  2. Not every dataset provides search over the data
  3. The W3C Linked Data Platform WG proposal is promising (REST access to resources and containers: paging, etc.)
26. Exploitation
LIMITATIONS:
1. Lack of integration of library linked data in
  • Library curation and cataloguing workflows
  • Existing library systems (e.g. digital library systems, patron services)
2. Need for end-user interfaces providing enriched information spaces
  • that integrate several LD sources,
  • allowing for serendipitous discovery and enhanced navigation
  CultureSampo
http://uilld2013.linkeddata.es Call for participation!
28. Conclusions
• There still exist some barriers:
  • Organizational: infrastructure costs, integration of linked data into library processes, cross-library collaboration
  • Language: multilingual services, cross-lingual linking and data integration
  • Data formats and modalities: different digital objects and representations are rarely interlinked; need to cope with heterogeneity of formats and vocabularies
• But we are on the right path!