This document discusses the National Széchényi Library's process of publishing its data as linked open data. It began by developing SRU and SKOS interfaces, then realized it had the components needed for linked data - SKOS thesauri, URL-based record access via LibriUrl, and SRU search of records. It focused on developing cool URIs, identifiers, content negotiation, the RDFDC vocabulary, and an RDF database. XSLT was used to convert MARCXML to RDFDC, and a FOAF file was generated from authority records. The OPAC was modified to support HTML link auto-discovery to the RDF. The library's data is now available as linked open data via S
Linked Data at the National Széchényi Library : road to the publication
1. Linked Data at the National Széchényi Library: road to the publication
SWIB10: Semantic Web in Bibliotheken
Cologne, 29–30 November 2010
Ádám Horváth
National Széchényi Library
2. Contents
Why I am here
Background information on NSZL
Road to the publication
Current developments and future plans
3. The news
The National Széchényi Library (NSZL) has recently published its entire OPAC and Digital Library, together with the corresponding authority data, as Linked Open Data (20 April 2010).
4. The news
The vocabularies used are
– RDFDC for bibliographic data,
– FOAF for names, and
– SKOS for subject terms and geographical names
5. The news
NSZL uses Cool URIs
Every resource has both an RDF and an HTML representation
Our RDFDC, FOAF and SKOS statements are linked together
Our name authority is matched with the DBpedia name files
URI aliases are handled as owl:sameAs statements
NSZL also supports HTML link auto-discovery
6. Information infrastructure of NSZL
Integrated library system
– Amicus
• Consortium system
• Views
• Oracle based
• Authority handling
• Does not handle all thesaurus relation types
• Products module
• Z39.50 server
7. Information infrastructure of NSZL
OPAC
– LibriVision
• HTML
• XML and XSLT based
• Z39.50 client
• Session based
8. Information infrastructure of NSZL
Thesaurus handling
– Relex
• Contains general terms and geographical names
• Uses all relation types available in ISO 2788
• Also contains UDC equivalents and coordinates
• Can produce MARC output
• Uses the descriptors themselves as identifiers
• Subject terms in Amicus are based on Relex
9. Information infrastructure of NSZL
MARC is HUNMARC
– MARC21 based
– No punctuation marks in records
– More subfields
– Punctuation is program generated
– MARC21 tools and utilities can't be used
10. Information infrastructure of NSZL
The IT department contains
– Programmers
– System librarians
– Maintenance staff
11. Information infrastructure of NSZL
The IT department is responsible for
– Integrated library system
– Developing the digital library
– Developing other utilities
– Maintaining the whole IT infrastructure
Not responsible for
– Digitisation
– Homepage
12. Road to the publication
13. Motivation of our semantic web developments
Personal interest
Interested colleagues
– Kornél Horváth (horvath.kornel@oszk.hu)
– Zsolt Zachár (zachar.zsolt@oszk.hu)
Semantic web is a cool thing
My friends are also publishing their data
Our role is to provide data
14. Carrying out the development
There was no specific project
Small developments pointing in the same direction
We developed it when time permitted
15. The very first steps
ONE2 project
– Interoperability project
– Z39.50
– SRU emerged by the end of the project
– Semantic web was also mentioned
16. SRU interface for Amicus
TEL-ME-MOR project
– To make NSZL searchable on the TEL portal via SRU
YAZ Proxy was used
– SRU/Z39.50 gateway
– User-defined XSLT
– The result set follows the TEL Application Profile
17. SRU interface for Amicus
The important results
– URL based search
– XML result set
• TELAP
• MARCXML
– RDFDC is provided via YAZ Proxy
18. Development of LibriUrl
The requirement was to
– make LibriVision OpenURL compatible
– provide access to OPAC records via URL
The problem was that LibriVision is
– Session based and requires login
LibriUrl
– URL based search interface for the OPAC
– Developed by NSZL on the basis of vendor software code
19. Development of LibriUrl
LibriUrl search:
http://link.oszk.hu/libriurl.php?LN=en&DB=any&SRY=an&SRE=2616972
LibriUrl side effects
– A search by Amicus number works as a link to a specific record
– Our records became bookmarkable, linkable and OpenSearchable
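As a rough illustration of how such a record link can be assembled, here is a minimal Python sketch. The base URL and the parameter values are copied from the example on the slide; the function name and the reading of `SRY=an` as "search by Amicus number" are assumptions, not documented LibriUrl behaviour.

```python
from urllib.parse import urlencode

LIBRIURL_BASE = "http://link.oszk.hu/libriurl.php"

def libriurl_record_link(amicus_number, language="en"):
    """Build a LibriUrl link that searches for one Amicus number."""
    params = {
        "LN": language,             # interface language, as in the slide
        "DB": "any",                # value copied from the slide's example
        "SRY": "an",                # assumed to select the Amicus-number search
        "SRE": str(amicus_number),  # the search term, here the record number
    }
    return LIBRIURL_BASE + "?" + urlencode(params)

print(libriurl_record_link(2616972))
```

Because the whole query lives in the URL, the resulting link can be bookmarked or embedded in other pages without a LibriVision session.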
20. How does it work?
21. How does it work?
24. Demonstration
25. Demonstration
26. Development of LibriUrl
The importance of LibriUrl
– It was behind our Cool URI
http://link.oszk.hu/libriurl.php?LN=en&DB=any&SRY=an&SRE=2616972
27. SKOSifying the thesaurus
TelPlus project
– Thesaurus in SKOS for search refinement
SKOS conversion
– SKOS is converted from the MARC output of Relex
– Not every thesaurus relation type is converted
– UDC and coordinates are also not included
28. SKOSifying the thesaurus
The concept identifier is an IRI-compatible descriptor:
– 150 ** a abszurd dráma
– <skos:Concept rdf:about="http://nektar.oszk.hu/auth/abszurd_dráma">
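The step from descriptor to concept IRI can be sketched in Python as follows. The slides do not spell out the conversion rules; the only one visible in the example is that the space becomes an underscore, so that single rule is an assumption here, and `descriptor_to_iri` is a hypothetical helper name.

```python
AUTH_BASE = "http://nektar.oszk.hu/auth/"

def descriptor_to_iri(descriptor, base=AUTH_BASE):
    """Turn a thesaurus descriptor into a concept IRI.

    Assumed rule: collapse runs of whitespace and join the words with
    underscores. Non-ASCII letters are kept, since an IRI allows them.
    """
    return base + "_".join(descriptor.split())

print(descriptor_to_iri("abszurd dráma"))
```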
29. SKOSifying the thesaurus
Serving SKOS to TelPlus
– The SKOS XML file is indexed by Zebra and served via SRU
– http://193.6.201.195:9996/skos?version=1.1&operation=searchRetrieve&query=%22Gravenhage%22&startRecord=1&maximumRecords=10
– It is not RDF/XML
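The SRU request above is an ordinary URL with a handful of query parameters. A small Python sketch that rebuilds it is given below; the parameter names and values come from the slide, while the function name and its defaults are assumptions made for illustration.

```python
from urllib.parse import urlencode, quote

def sru_search_url(base, term, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a quoted search term."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": f'"{term}"',        # the quotes are percent-encoded as %22
        "startRecord": start,
        "maximumRecords": maximum,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + "?" + urlencode(params, quote_via=quote)

print(sru_search_url("http://193.6.201.195:9996/skos", "Gravenhage"))
```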
31. SKOSifying the thesaurus
Importance of SKOS development
– Having a conversion tool for creating SKOS records
32. LIBRIS as an example
ELAG, 2008, Wageningen
– LIBRIS' linked open data was presented
– Content negotiation
– Cool URI
– Link rel
33. Focusing on the publication
We realised that we had almost everything needed to publish our data as LOD
We had
– SKOS
– LibriUrl (accessing OPAC records via URL)
– YAZ Proxy - SRU (URL based search in Amicus)
– LIBRIS as an example
34. Focusing on the publication
What was missing
– Naming convention for resources
– Identifiers
– Content negotiation
– RDFDC
– RDF database
– FOAF
– Link rel tags in the OPAC header
– Creating links
35. Naming convention
Resource name for documents
– /resource/manifestation/2645471
Name for RDF representation
– /data/manifestation/2645471
Name for HTML representation
– /hu/manifestation/2645471
– /en/manifestation/2645471
36. Naming convention
Resource name for authority
– /resource/auth/33589
Name for RDF representation
– /data/auth/33589
Name for HTML representation
– /auth/33589
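The naming convention for documents and authority records can be summarised as a small mapping from a resource path to its representation paths. The Python sketch below only restates the patterns shown on the slides; the function name and the returned dictionary shape are assumptions.

```python
import re

# Only the two resource kinds shown on the slides are recognised.
RESOURCE_RE = re.compile(r"^/resource/(manifestation|auth)/(.+)$")

def representation_paths(resource_path, language="en"):
    """Map a /resource/... path to its RDF and HTML representation paths."""
    match = RESOURCE_RE.match(resource_path)
    if match is None:
        raise ValueError(f"not a resource path: {resource_path!r}")
    kind, ident = match.groups()
    rdf = f"/data/{kind}/{ident}"
    if kind == "manifestation":
        html = f"/{language}/manifestation/{ident}"  # /hu/... or /en/...
    else:
        html = f"/auth/{ident}"  # the slides show no language prefix here
    return {"rdf": rdf, "html": html}

print(representation_paths("/resource/manifestation/2645471", "hu"))
print(representation_paths("/resource/auth/33589"))
```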
37. Identifiers
Documents
• Amicus number (MARC 001)
Subject authority (thesaurus)
• The descriptor itself with some conversion rules
Names
• Special number stored in the Amicus database
38. Content negotiation
Implementation of content negotiation
– 303 redirection was chosen
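A 303-based negotiation step might look like the Python sketch below. It is not NSZL's implementation: the Accept-header parsing is deliberately simplified, the fallback to the English HTML page is an assumption, and the path rewriting covers only the manifestation pattern from the naming convention.

```python
def prefers_rdf(accept_header):
    """True when application/rdf+xml outranks text/html in an Accept header."""
    weights = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality value when no q= parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        weights[media_type] = q
    return weights.get("application/rdf+xml", 0.0) > weights.get("text/html", 0.0)

def see_other(resource_path, accept_header):
    """Return (status, headers) for a 303 redirect to the chosen representation."""
    target = "data" if prefers_rdf(accept_header) else "en"
    location = resource_path.replace("/resource/", f"/{target}/", 1)
    return 303, {"Location": location, "Vary": "Accept"}

print(see_other("/resource/manifestation/2645471", "application/rdf+xml"))
print(see_other("/resource/manifestation/2645471", "text/html"))
```

With 303 See Other the resource URI itself never returns a document; the client is always sent on to a separate representation URL.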
41. Creating RDF for catalogue records
Creating "RDFDC"
XSLT does the job
– It is a MARCXML to RDF/XML conversion
• A modification of the MARC to TEL Application Profile conversion
– Creates links to subjects and names
Vocabularies used
– Dublin Core
– BIBO
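To give a feel for the kind of mapping involved, here is a toy MARCXML to RDF/XML conversion. NSZL does this with XSLT; the sketch swaps in Python's standard ElementTree so that it runs without extra libraries. The sample record, the two-field mapping (245$a to dc:title, 100$a to dc:creator) and all function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

# Invented toy record; only the record number is taken from the slides.
SAMPLE = f"""<record xmlns="{MARC}">
  <controlfield tag="001">2645471</controlfield>
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Example Author</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Example title</subfield></datafield>
</record>"""

def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None."""
    path = f"{{{MARC}}}datafield[@tag='{tag}']/{{{MARC}}}subfield[@code='{code}']"
    node = record.find(path)
    return None if node is None else node.text

def marc_to_rdfdc(marcxml):
    """Convert one MARCXML record into a small rdf:RDF element tree."""
    record = ET.fromstring(marcxml)
    amicus = record.find(f"{{{MARC}}}controlfield[@tag='001']").text
    root = ET.Element(f"{{{RDF}}}RDF")
    about = f"http://nektar.oszk.hu/resource/manifestation/{amicus}"
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    for tag, code, prop in (("245", "a", "title"), ("100", "a", "creator")):
        value = subfield(record, tag, code)
        if value:
            ET.SubElement(desc, f"{{{DC}}}{prop}").text = value
    return root

print(ET.tostring(marc_to_rdfdc(SAMPLE), encoding="unicode"))
```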
42. Installation of RDF database
Jena
Joseki SPARQL endpoint
43. Creating FOAF for names
Batch process
– The name index of Amicus is used
FOAF is stored in and served from Jena
During an update the entire FOAF dataset is always rebuilt
44. Contents of Jena
Names (FOAF)
Subject authority (SKOS)
– It is still available from Zebra via SRU
Catalogue records (RDFDC)
All of our linked data can be searched via SPARQL
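A SPARQL endpoint such as Joseki accepts queries over HTTP, with the query text passed in the `query` parameter as defined by the SPARQL protocol. The Python sketch below only builds such a request URL and does not send it. The endpoint address is a placeholder, and the choice of a DESCRIBE query is an assumption, since the slides do not show the queries NSZL actually runs.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real Joseki address is not given in the slides.
ENDPOINT = "http://example.org/sparql"

def describe_query(resource_uri):
    """Build a SPARQL DESCRIBE query for one resource."""
    return f"DESCRIBE <{resource_uri}>"

def sparql_get_url(endpoint, query):
    """Encode a query as a SPARQL-protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})

query = describe_query("http://nektar.oszk.hu/resource/auth/33589")
print(query)
print(sparql_get_url(ENDPOINT, query))
```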
45. Creating HTML link auto-discovery
In the head of our OPAC extended view pages
<link rel="meta"
      type="application/rdf+xml"
      title="RDF Version"
      href="http://nektar.oszk.hu/data/manifestation/2645471" />
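From the client side, auto-discovery means scanning the page head for such a link element. A minimal Python sketch using the standard library's HTML parser is shown below; the sample page is invented around the link tag from the slide, and the class and function names are assumptions.

```python
from html.parser import HTMLParser

class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        attributes = dict(attrs)
        if (tag == "link"
                and attributes.get("type") == "application/rdf+xml"
                and attributes.get("href")):
            self.links.append(attributes["href"])

def find_rdf_links(html):
    finder = RdfLinkFinder()
    finder.feed(html)
    return finder.links

PAGE = """<html><head><title>Record</title>
<link rel="meta" type="application/rdf+xml" title="RDF Version"
      href="http://nektar.oszk.hu/data/manifestation/2645471" />
</head><body></body></html>"""

print(find_rdf_links(PAGE))
```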
46. Creating links
Links to NSZL resources
– Link from RDFDC to names and subjects
<dcterms:creator rdf:resource="http://nektar.oszk.hu/resource/auth/33589"/>
<dc:creator>Jókai Mór (1825-1904)</dc:creator>
47. Creating links
Links to external resources
– Link from the name authority records to DBpedia
<foaf:Person rdf:about="http://nektar.oszk.hu/resource/auth/33589">
  <foaf:name>Jókai Mór (1825-1904)</foaf:name>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/M%C3%B3r_J%C3%B3kai"/>
</foaf:Person>
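A consumer can follow these links by reading the owl:sameAs targets out of the FOAF record. The Python sketch below wraps the fragment from the slide in an rdf:RDF element with namespace declarations so that it parses as a standalone document; that wrapper and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"

# The slide's fragment, wrapped so that it is a complete RDF/XML document.
FOAF_RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:owl="{OWL}">
  <foaf:Person rdf:about="http://nektar.oszk.hu/resource/auth/33589">
    <foaf:name>Jókai Mór (1825-1904)</foaf:name>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/M%C3%B3r_J%C3%B3kai"/>
  </foaf:Person>
</rdf:RDF>"""

def same_as_targets(rdfxml):
    """Map each foaf:Person URI to the list of its owl:sameAs targets."""
    root = ET.fromstring(rdfxml)
    result = {}
    for person in root.findall(f"{{{FOAF}}}Person"):
        about = person.get(f"{{{RDF}}}about")
        targets = [link.get(f"{{{RDF}}}resource")
                   for link in person.findall(f"{{{OWL}}}sameAs")]
        result[about] = targets
    return result

print(same_as_targets(FOAF_RECORD))
```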
48. Resolving the URL of documents
The URL of the document
– /resource/manifestation/2645471
RDF is requested
– Redirection to /data/manifestation/2645471
– PHP program gathers
• MARCXML
• Name ids
– XSLT creates RDF/XML with links
49. Resolving the URL of documents
Link generation to SKOS
– Automatic conversion
• from the literal
– <dcterms:subject>abszurd dráma</dcterms:subject>
• to the Concept
– <dcterms:subject rdf:resource="http://nektar.oszk.hu/resource/auth/abszurd_dráma"/>
50. Resolving the URL of documents
Link generation to FOAF
– The XSLT gets the name ids as parameters and creates the links
<dc:creator>Jókai Mór (1825-1904)</dc:creator>
<dcterms:creator rdf:resource="http://nektar.oszk.hu/resource/auth/33589"/>
53. Resolving the URL of documents
Sample URL of a document
/resource/manifestation/2645471
HTML is requested
– Redirection to /hu/manifestation/2645471 or /en/manifestation/2645471
– The Nektar proxy is called
54. Resolving the URL of documents
Nektar proxy
– Java servlet (replaces LibriUrl)
– What it does
• Logs into LibriVision and gets a session id
• Searches for the manifestation id and gets the query id, etc.
• Sends a present request and gets the search result page
• Creates the final HTML page
– Replacing URLs, filling in the header, etc.
57. Resolving the URL of authority data
Sample URL of an authority record
/resource/auth/33589
RDF is requested
– Jena is searched via the Joseki SPARQL endpoint
– RDF/XML is returned
60. Resolving the URL of authority data
Sample URL of an authority record
/resource/auth/33589
HTML is requested
– Jena is searched via the Joseki SPARQL endpoint
– RDF/XML is returned
– XSLT creates the HTML page
63. Current activities
Improving RDFDC
– Making RDFDC 100% compliant with the DCMI Metadata Terms
• Proper usage of dc:creator and dcterms:creator
• Better conversion of title and publisher
64. Current activities
Improving SKOS
– Inserting the UDC equivalents of Concepts
– Inserting USE AND and other relationships
– Inserting the coordinates of geographical names
65. Current activities
Improving FOAF
– Including organisations
66. Current activities
VIAF
– Creating links from our FOAF to VIAF
Studying Virtuoso
– Jena might be replaced by Virtuoso
FRBRising the OPAC
67. FRBRising the OPAC
Method
– Collecting works, expressions and manifestations from the "MARC dump" of our catalogue
– Creating RDF representations based on Ian Davis: Expression of Core FRBR Concepts in RDF
– Loading the RDF representations into Jena
– Showing the work tree based on manifestation id
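The slides do not say how manifestations are collected into works. One common starting point, shown here purely as an assumption, is to cluster records on a normalised author and title key. All identifiers and records in this Python sketch are invented.

```python
from collections import defaultdict

def work_key(author, title):
    """Normalise author and title into a clustering key (assumed heuristic)."""
    def normalise(text):
        return " ".join(text.lower().split())
    return (normalise(author), normalise(title))

def group_into_works(manifestations):
    """Group (id, author, title) tuples into works keyed by work_key."""
    works = defaultdict(list)
    for ident, author, title in manifestations:
        works[work_key(author, title)].append(ident)
    return dict(works)

# Invented sample data standing in for records read from a MARC dump.
RECORDS = [
    ("m1", "Example Author", "Example title"),
    ("m2", "EXAMPLE  Author", "example title"),
    ("m3", "Example Author", "Another title"),
]

for key, ids in group_into_works(RECORDS).items():
    print(key, ids)
```

A real clustering step would need more than this, for example uniform titles and translations, which is where the expression level of FRBR comes in.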
68. FRBRising the OPAC
Scope (at first)
– Printed monographs
• Excluding multivolume
– This is a "post" FRBRisation
Usage
– OPAC/Extended view/Link to the work tree
First results
71. Current activities
Digital Library: OSZKDK
– Mainly contains deposit copies
– It is also published as LOD
OAI-ORE is being implemented as a container of structural metadata
– Parts used: resource map, aggregation
– Vocabularies used: DC, BIBO
– It is in the test phase now
73. Planned discovery of OAI-ORE
"Link rel" in the head of the OPAC of OSZKDK
– <link rel="meta"
        type="application/rdf+xml"
        title="OAI-ORE resource map"
        href="http://oszkdk.oszk.hu/rem/drj/869" />
74. One resource map of test OSZKDK
<rdf:Description rdf:about="http://oszkdk.oszk.hu/rem/drj/876">
  <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
  <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2010-04-07</dcterms:modified>
  <dcterms:creator rdf:resource="http://oszkdk.oszk.hu/"/>
  <ore:describes rdf:resource="http://oszkdk.oszk.hu/aggr/drj/876"/>
</rdf:Description>
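A client that has fetched a resource map would typically look up which aggregation it describes. The Python sketch below reads that from the fragment above. The enclosing rdf:RDF element and its namespace declarations are added here so the fragment parses on its own; they and the function name are assumptions, not part of the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

RESOURCE_MAP = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ore="{ORE}" xmlns:dcterms="{DCTERMS}">
  <rdf:Description rdf:about="http://oszkdk.oszk.hu/rem/drj/876">
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
    <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2010-04-07</dcterms:modified>
    <dcterms:creator rdf:resource="http://oszkdk.oszk.hu/"/>
    <ore:describes rdf:resource="http://oszkdk.oszk.hu/aggr/drj/876"/>
  </rdf:Description>
</rdf:RDF>"""

def described_aggregations(rdfxml):
    """Map each resource map URI to the aggregation it describes."""
    root = ET.fromstring(rdfxml)
    result = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        link = desc.find(f"{{{ORE}}}describes")
        if link is not None:
            result[desc.get(f"{{{RDF}}}about")] = link.get(f"{{{RDF}}}resource")
    return result

print(described_aggregations(RESOURCE_MAP))
```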