A presentation by Daniel Vila Suero of the Ontology Engineering Group at the Universidad Politécnica de Madrid.
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference, which took place on Friday 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
1. datos.bne.es: Publishing and consuming
Daniel Vila Suero
dvila@fi.upm.es
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG Members, BNE staff (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)
3. Background
datos.bne.es
• Initiative from Biblioteca Nacional de España together with OEG-UPM Madrid.
• Multidisciplinary effort: librarians, computer scientists, linguists, ...
• Close collaboration between library experts and computer scientists.
• Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, DC, RDA, ...)
4. Main goals
• Perform the transformation incrementally and
iteratively
• Develop a system where library experts can define and assess the mappings to RDF independently of the IT staff
• Be vocabulary agnostic (BNE uses FRBR as its core model, but the system would allow them to use RDA, for example)
• Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data)
5. Source MARC records
AUTHORITY
• Persons
• Corporate bodies
• Conferences
• Titles
• Subject
BIBLIOGRAPHIC
• 76576 Maps
• 320727 Sound recordings
• 166017 Engravings, drawings, pictures
• 35770 Manuscripts
• 143959 Ancient books
• 2696560 Modern books
• 178473 Scores
• 3021 Electronic resources
• 156634 Serials
• 96672 Videos
6. Some figures
• Total number of authority records: 4,100,000
• Total number of bibliographic records: 2,390,140
• Total number of RDF triples: 58,053,215
• Number of links (15% of authorities): 587,520
• Linked sources:
• VIAF
• SUDOC (French Collective University Catalogue), France
• GND (German National Library Authorities), Germany
• LIBRIS, Sweden
• DBpedia
• Soon: BNF, BNB, German Bibliographie
7. Some statistics
[Chart: number of entities by type. Labels: Manifestation, Work, Person, Expression, Thema, Corporate Body. Values shown: 282,879; 497,644; 2,390,103; 1,114,719; 1,163,764; 1,969,526.]
11. Transformation process
Publishing
• How can we make the mapping process easier for library experts?
1. Use a familiar and intuitive interface: spreadsheets
2. Work only on what's in the database: pre-process records to build the spreadsheets
• 3-step process, 3 different spreadsheets
1. Classification: is it a Person? a Work? a Manifestation?
2. Annotation: name, birth date, title, language of expression
3. Relation: find relationships between entities (e.g. a Person is creator of a certain Work)
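The three spreadsheet steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BNE/OEG implementation: the dictionaries stand in for the three spreadsheets, and the field tags, property names, record layout and the `links` input are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Step 1 sheet (classification): field tag -> entity class. Hypothetical rows.
CLASSIFICATION: Dict[str, str] = {"100": "Person", "240": "Work"}

# Step 2 sheet (annotation): (tag, subfield) -> property. Hypothetical rows.
ANNOTATION: Dict[Tuple[str, str], str] = {
    ("100", "a"): "name",
    ("100", "d"): "dates",
    ("240", "a"): "title",
}

# Step 3 sheet (relation): (subject class, object class) -> property. Hypothetical.
RELATION: Dict[Tuple[str, str], str] = {("Person", "Work"): "isCreatorOf"}


def transform(records: List[dict], links: List[Tuple[str, str]]) -> List[Triple]:
    """Apply the three sheets in order and return plain (s, p, o) tuples."""
    triples: List[Triple] = []
    classes: Dict[str, str] = {}
    for rec in records:
        cls = CLASSIFICATION.get(rec["tag"])
        if cls is None:
            continue  # unmapped tag: left for the library experts to classify
        classes[rec["id"]] = cls
        triples.append((rec["id"], "type", cls))
        for code, value in rec["subfields"].items():
            prop = ANNOTATION.get((rec["tag"], code))
            if prop is not None:
                triples.append((rec["id"], prop, value))
    for subj, obj in links:
        if subj in classes and obj in classes:
            prop = RELATION.get((classes[subj], classes[obj]))
            if prop is not None:
                triples.append((subj, prop, obj))
    return triples


# Made-up sample records in a simplified, hypothetical layout.
records = [
    {"id": "rec1", "tag": "100",
     "subfields": {"a": "Cervantes Saavedra, Miguel de", "d": "1547-1616"}},
    {"id": "rec2", "tag": "240",
     "subfields": {"a": "Don Quijote de la Mancha"}},
]
triples = transform(records, links=[("rec1", "rec2")])
```

Because the mappings live in data rather than in code, changing a row in one of the sheets changes the output without touching `transform`, which is the point of letting library experts own the spreadsheets.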
16. Still a lot of work to do
• We cover only the core relations of FRBR
• A significant number of manifestations are not linked to their expressions; we are currently looking at more sophisticated clustering techniques
• Manifestations are not linked to their corresponding digitized materials at the digital library (Biblioteca Digital Hispánica). The next version (to be published this year) will contain these links
• The classification step can be further automated
18. Perspectives
Consuming
• 2 different perspectives:
- Systems and applications:
• SPARQL endpoint,
• Linked Data API
- End-user interfaces
• + an interesting side-effect:
- By applying FRBR and RDF mappings we can (and did)
improve the catalogue
• Using standard web technologies and more intuitive
models we open the door to:
- Data analytics and cleansing, catalogue enrichment, reuse
by smaller institutions…
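For the systems-and-applications perspective, a client could query the SPARQL endpoint over plain HTTP. The sketch below only builds the request URL, using the `query` parameter defined by the SPARQL Protocol. The endpoint address is a hypothetical placeholder, since the slides do not give one, and the query is a generic, vocabulary-agnostic count of resources per class.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the slides do not state the endpoint address.
ENDPOINT = "http://example.org/sparql"

# Count resources per class, whatever vocabulary the dataset uses.
QUERY = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
```

The resulting URL can be fetched with any HTTP client; the response format depends on the endpoint and on the `Accept` header sent.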
19. Graph analysis example
http://bne.linkeddata.es/graphvis
Using open-source tools, for example Gephi
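One possible way to get linked data into a tool such as Gephi is to export resource-to-resource links as an edge list. The sketch below assumes triples are available as plain (subject, predicate, object) tuples and writes a CSV with `Source` and `Target` columns, which Gephi's spreadsheet import accepts for edge tables. The example IRIs and the property name are made up.

```python
import csv
import io
from typing import Iterable, Tuple


def triples_to_edge_csv(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Return an edge-list CSV (Source, Target, Label) for graph tools."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    for s, p, o in triples:
        # Keep only links between resources; literal values are not edges.
        if o.startswith("http://") or o.startswith("https://"):
            writer.writerow([s, o, p])
    return buf.getvalue()


# Made-up sample data.
sample = [
    ("http://example.org/person/1", "isCreatorOf", "http://example.org/work/1"),
    ("http://example.org/person/1", "name", "Cervantes Saavedra, Miguel de"),
]
edge_csv = triples_to_edge_csv(sample)
```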
20. Enabling access to systems and apps
Linked Data API: http://datos.bne.es/frontend/persons
21. Flexible access to data
Out of the box:
• Search by every field
• Access clusters of resources
• Filtering
• Paging
• Serve multiple formats: XML, Turtle, JSON
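As a rough illustration of filtering, paging and format selection, the helper below builds request URLs against the persons list shown on the previous slide. The parameter names `_page` and `_pageSize` and the format suffix follow conventions of the Linked Data API specification and are assumptions here, as is the `name` filter; the slides do not document the exact parameters the service accepts.

```python
from typing import Optional
from urllib.parse import urlencode

# List endpoint shown on the slides.
BASE = "http://datos.bne.es/frontend/persons"


def api_url(base: str, fmt: Optional[str] = None, page: Optional[int] = None,
            page_size: Optional[int] = None, **filters: str) -> str:
    """Build a list URL with an optional format suffix, filters and paging.

    `_page`, `_pageSize` and the `.fmt` suffix are assumed conventions,
    not confirmed behaviour of the service.
    """
    path = base if fmt is None else f"{base}.{fmt}"
    params = dict(filters)
    if page is not None:
        params["_page"] = page
    if page_size is not None:
        params["_pageSize"] = page_size
    return path + ("?" + urlencode(params) if params else "")


json_page = api_url(BASE, fmt="json", page=2, page_size=10)
filtered = api_url(BASE, name="Cervantes")  # `name` is a hypothetical field
```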
23. End-user interfaces
Current linked data opens the door to:
• Re-rank OPAC results
• Better clustering of results
• Recommendation
• Enhance data from other sources
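To make "better clustering of results" concrete, one simple option is to group the manifestations returned by a search under the FRBR work they belong to. The sketch below assumes a lookup from manifestation to work is already available (for example, derived from the published RDF); the identifiers are made up, and manifestations without a known work stay in their own cluster.

```python
from typing import Dict, List


def cluster_by_work(hits: List[str], work_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group ranked manifestation ids by work, keeping first-seen order."""
    clusters: Dict[str, List[str]] = {}
    for manifestation in hits:
        # Unlinked manifestations fall back to a cluster of their own.
        work = work_of.get(manifestation, manifestation)
        clusters.setdefault(work, []).append(manifestation)
    return clusters


# Made-up identifiers: m1 and m3 are editions of the same work.
hits = ["m1", "m2", "m3", "m4"]
work_of = {"m1": "w1", "m2": "w2", "m3": "w1"}
clusters = cluster_by_work(hits, work_of)
```

Since clusters are created in the order their first hit appears, the original ranking of the result list is preserved at the cluster level.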