This document discusses lessons learned from using linked open data in applications. It describes converting metadata from a language resource catalogue into RDF triples, including resolving complex instances like people and organizations. Issues addressed include data enrichment, linking to external datasets, and making implicit relations explicit. The goals of displaying comprehensive data to users and aggregating external data sensitively are discussed.
1. Publishing and Consuming Linked Data.
(Lessons learnt when using LOD in an application)
Marta Villegas
Universitat Pompeu Fabra
Cercedilla, June 2015
4. IULA-UPF moving to LOD
Objectives:
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensitive manner
- Making hidden implicit relations explicit.
Triple store (Virtuoso) http://lodserver.iula.upf.edu
SPARQL server (Virtuoso) http://lodserver.iula.upf.edu/sparql
Web Browser (RoR + SPARQL) http://lod.iula.upf.edu/
RDF
5. When the focus shifts from growing the cloud to
deploying applications
• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment
• Approach: an incremental process, with a first batch conversion followed by a curation process
RDFying – index
10.
<fundingProject>
  <projectName>Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Languages Technologies</projectName>
  <projectShortName>PANACEA</projectShortName>
  <url>http://panacea-lr.eu/</url>
  <fundingType>euFunds</fundingType>
  <funder>European Union</funder>
</fundingProject>
<organizationInfo>
  <organizationName>Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale “Antonio Zampolli”</organizationName>
  <organizationShortName>CNR</organizationShortName>
  …
11. For each embedded Project/Person/Organization:
1. Generate a "subject property URI" triple for the backwards relation, building the URI as follows:
– If Person, use “name_givenName”
– If a “short name” exists, use the “shortname”
– Else use the first 20 characters of the “long name”
2. Generate "URI property object" triples as the union of all local declarations (where union removes duplicate triples).
– This requires a final curation task to agree on node values where the declarations differ.
– The preliminary version needs further curation (we used SPARQL SELECT DISTINCT to identify oddities)
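The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used at IULA-UPF: the base URI, the record field names (`name`, `givenName`, `shortName`, `longName`), the plain property names and the second `name` value are all hypothetical.

```python
from collections import defaultdict

BASE = "http://example.org/resources/"  # hypothetical base URI, not from the slides


def local_id(kind, record):
    """Pick the identifier of an embedded entity using the three rules above."""
    if kind == "Person":
        # Rule 1: persons are identified as name_givenName.
        return f"{record['name']}_{record['givenName']}"
    short_name = record.get("shortName", "").strip()
    if short_name:
        # Rule 2: prefer the short name when one is declared.
        return short_name
    # Rule 3: fall back to the first 20 characters of the long name.
    return record["longName"].strip()[:20]


def union_of_declarations(declarations):
    """Merge the triples of every local declaration; the set drops duplicates."""
    merged = set()
    for triples in declarations:
        merged.update(triples)
    return merged


def conflicting_values(triples):
    """Return the (subject, property) pairs that ended up with several values."""
    values = defaultdict(set)
    for subject, prop, obj in triples:
        values[(subject, prop)].add(obj)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


# Two declarations of the same organisation found in different records.
cnr = BASE + local_id("Organization", {"shortName": "CNR", "longName": "Consiglio Nazionale delle Ricerche"})
declaration_a = [(cnr, "name", "Consiglio Nazionale delle Ricerche"), (cnr, "shortName", "CNR")]
declaration_b = [(cnr, "name", "CNR-ILC"), (cnr, "shortName", "CNR")]  # made-up variant
merged = union_of_declarations([declaration_a, declaration_b])
to_curate = conflicting_values(merged)  # only the "name" property disagrees
```

Pairs reported by `conflicting_values` are only candidates for curation: a property may legitimately hold several values, so a person still has to decide which ones to keep.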
12. RDFying Documents:
- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this manual lookup is manageable. For big datasets it takes a lot of work (some tasks may be automated)
<document>Quochi V, Frontini F, Rubino F. A MWE
Acquisition and Lexicon Builder Web Service. COLING 2012,
24th International Conference on Computational
Linguistics, Proceedings of the Conference: Technical
Papers,8-15 December 2012, Mumbai, India</document>
13. RDFying - Where to stop?
BIBTEX:
@inproceedings{quochi2012mwe,
  title={A MWE Acquisition and Lexicon Builder Web Service.},
  author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
  booktitle={COLING},
  year={2012}}
DBLP:
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
  owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
  dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
  dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
  dblp:yearOfPublication "2012" .
14. Article
title
creator Mikel Forcada
subject discourse analysis, question answering
keywords NER, LMF, ...
references FreeLing, TreeBank, PANACEA ...
language English
RDFying- simple types
16. RDFying - simple types as instances
Language codes in MS central node:
Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es      77              71
ES      10              10
17. Enumerations:
object property + Class + instances + checking existing vocabularies
‘Free strings’:
1) Generate a datatype property + string value.
2) Run a curation process that:
a) identifies ‘enumeration-like’ candidates (e.g. language) and chooses an appropriate vocabulary
b) matches value strings to relevant URIs (DBpedia)
RDFying - simple types as instances
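Identifying ‘enumeration-like’ candidates can be partly automated by counting how often each string value occurs per property. The sketch below is a hedged illustration: the two thresholds and the sample triples are assumptions made for the example, and the final decision still belongs to a curator.

```python
from collections import Counter, defaultdict


def enumeration_candidates(triples, max_distinct=20, min_uses=50):
    """Flag properties whose string values repeat the way an enumeration would.

    A property qualifies when it is used at least `min_uses` times but shows
    at most `max_distinct` different values. Both thresholds are assumptions.
    """
    values_by_property = defaultdict(Counter)
    for _subject, prop, value in triples:
        values_by_property[prop][value] += 1

    candidates = {}
    for prop, counts in values_by_property.items():
        total_uses = sum(counts.values())
        if total_uses >= min_uses and len(counts) <= max_distinct:
            # Most frequent values first, ready for a curator to review.
            candidates[prop] = counts.most_common()
    return candidates


# Made-up sample: a language property with few values, a description with many.
sample = (
    [(f"resource_{i}", "language", "eng") for i in range(5)]
    + [(f"resource_{i}", "language", "en") for i in range(5, 8)]
    + [(f"resource_{i}", "description", f"free text {i}") for i in range(8)]
)
found = enumeration_candidates(sample, max_distinct=3, min_uses=5)
```

With these sample values only `language` is reported; `description` is used just as often but has a different value for every resource.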
18. SELECT DISTINCT ?language
WHERE { ?s ms:languageId ?language }
(eng, en, EN …)
DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }
Curation using SPARQL
RDFying - simple types as instances
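Since every spelling variant needs its own update, the statements can be generated from a mapping table instead of being typed by hand. This is a minimal sketch under assumptions: the target URI is a placeholder (the slide elides the real one), the `ms:` prefix must be declared in the actual request, and sending the updates to the endpoint is left out.

```python
# Placeholder target URI; the real one is elided on the slide.
ENGLISH = "http://example.org/vocab/English"

# Raw values reported by SELECT DISTINCT, mapped to the URI that replaces them.
VARIANTS = {"eng": ENGLISH, "en": ENGLISH, "EN": ENGLISH}


def curation_update(prop, literal, target_uri):
    """Build one SPARQL 1.1 DELETE/INSERT that swaps a plain literal for a URI."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'DELETE {{ ?s {prop} "{escaped}" . }}\n'
        f"INSERT {{ ?s {prop} <{target_uri}> . }}\n"
        f'WHERE  {{ ?s {prop} "{escaped}" . }}'
    )


updates = [curation_update("ms:language", raw, uri) for raw, uri in VARIANTS.items()]
```

Keeping DELETE and INSERT in a single operation means a subject is never left without a language value between two separate requests.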
20. Linking data !! – linking vs reusing
[Diagram: documentation, sameAs, documentation]
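One way to read the linking vs. reusing distinction is in terms of the triples each option produces. The sketch below is an interpretation rather than the author's implementation: the `documentation` property URI and all resource URIs are hypothetical; only `owl:sameAs` is a real vocabulary term.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DOCUMENTATION = "http://example.org/ns#documentation"  # hypothetical property


def link(resource, local_uri, external_uri):
    """Linking: point to a locally minted URI and tie it to the external one."""
    return [
        (resource, DOCUMENTATION, local_uri),
        (local_uri, OWL_SAME_AS, external_uri),
    ]


def reuse(resource, external_uri):
    """Reusing: point straight at the external URI, minting nothing locally."""
    return [(resource, DOCUMENTATION, external_uri)]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs, used only to compare the two outputs.
resource = "http://example.org/resources/res_1"
local_doc = "http://example.org/resources/doc_1"
external_doc = "http://example.org/external/paper_1"

linked = link(resource, local_doc, external_doc)
reused = reuse(resource, external_doc)
```

Linking costs an extra triple and an extra URI per entity, but the application keeps a node it controls; reusing is leaner, but every description of the entity then depends on the third-party server.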
21. Linking data !! – linking vs reusing
http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37
local URIs
external URIs
Core concepts which belong to
some ‘local’ Class.
Instances which belong to some
‘external’ Class:
• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
•….
But, some functional reasons:
24. Why all this? Is it worth it?
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensitive manner
- Making hidden implicit relations explicit.
29. [Figure: relational schema with the tables PERSON, ORGANISATION, RESOURCE, LICENSE, DOCUMENT and PROJECT, each with ID, name, description, ...; the tables are connected through "Has_" join tables that hold pairs of IDs]
Everything about IULA?
SELECT * FROM WHERE { … ...} HELP!!
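Over a relational schema, "everything about IULA" means knowing every table and join in advance. Over triples the same question reduces to two patterns: the node as subject and the node as object. The sketch below shows this on an in-memory list of triples and builds the equivalent SPARQL query text; the IULA URI, the property names and the sample data are made up for the example.

```python
def everything_about(triples, node):
    """Collect every statement in which `node` is the subject or the object."""
    outgoing = [(p, o) for s, p, o in triples if s == node]
    incoming = [(s, p) for s, p, o in triples if o == node]
    return outgoing, incoming


def sparql_everything_about(uri):
    """Build the same question as a SPARQL query string for an endpoint."""
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  {{ <{uri}> ?p ?o }}\n"
        "  UNION\n"
        f"  {{ ?s ?p <{uri}> }}\n"
        "}"
    )


# Hypothetical node and statements.
IULA = "http://example.org/organizations/IULA"
TRIPLES = [
    (IULA, "name", "IULA"),
    ("http://example.org/people/person_1", "memberOf", IULA),
    ("http://example.org/resources/res_1", "createdBy", IULA),
    ("http://example.org/resources/res_2", "createdBy", "http://example.org/organizations/other"),
]

outgoing, incoming = everything_about(TRIPLES, IULA)
query = sparql_everything_about(IULA)
```

Neither function needs to know which classes or properties exist, which is what makes the question easy to ask of a graph.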
36. • LOD opens new possibilities and SPARQL is a powerful tool
BUT
• The curation task is crucial and costly in time and effort. You can address it as an incremental process.
Publishing LOD vs. deploying LOD applications
• Until now, the LOD community seems to focus on “growing
the cloud”
• In this scenario, creating new URIs and mapping them to existing URIs is OK, but
• when the focus shifts from growing the cloud to developing applications, new problems arise: massive redundancy of URIs, trust in third-party servers/data, …
Conclusions & reflections