Publishing Linked Data Lessons for Applications

Publishing and Consuming Linked Data.
(Lessons learnt when using LOD in an application)
Marta Villegas
Universitat Pompeu Fabra
Cercedillas, June 2015

[Diagram: OLAC Language Resource Catalogue; OAI-PMH server; metadata formats (Dublin Core, Metashare, OLAC); metadata harvesting]

IULA-UPF moving to LOD
Objectives:
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit.
Triple store (Virtuoso) http://lodserver.iula.upf.edu
SPARQL server (Virtuoso) http://lodserver.iula.upf.edu/sparql
Web Browser (RoR + SPARQL) http://lod.iula.upf.edu/
RDF

When the focus shifts from growing the cloud to
deploying applications
• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment
• Approach: an incremental process (a first batch, followed by a
curation process)
RDFying – index

RDFying – complex instances
<Document>
<Person>
<Organisation>
<Project>
<LangResourceInfo>
<identificationInfo>
<distributionInfo>
<contactPerson>
<metadataInfo>
<validationInfo>
<resourceDocumentationInfo>
<resourceCreationInfo>
<resourceComponentType>
</LangResourceInfo>

RDFying – complex instances
<langResource-URI-1>
<langResource-URI-n>
<person-URI-1>
<person-URI-2>
<person-URI-3>
=?
Identity
resolution

<contactPerson>
<surname>Monachini</surname>
<givenName>Monica</givenName>
<communicationInfo>
<email>monica.monachini@ilc.cnr.it</email>
<email>risorse@ilc.cnr.it</email>
<url>http://www.ilc.cnr.it/</url>
<address>Via Giuseppe Moruzzi</address>
<zipCode>56124</zipCode>
<city>Pisa</city>
<country>Italy</country>
</communicationInfo>
<affiliation>
<organizationName>………</organizationName>
<departmentName>Istituto …</departmentName>
<communicationInfo>
</affiliation>
</contactPerson>
http://…/Monica_Monachini

<fundingProject>
<projectName> Platform for Automatic, Normalised
Annotation and Cost-Effective Acquisition of
Language Resources for Human Languages
Technologies </projectName>
<projectShortName> PANACEA </projectShortName>
<url> http://panacea-lr.eu/ </url>
<fundingType> euFunds </fundingType>
<funder> European Union </funder>
</fundingProject>
<organizationInfo>
<organizationName> Consiglio Nazionale delle
Ricerche. Istituto di Linguistica Computazionale
“Antonio Zampolli” </organizationName>
<organizationShortName>CNR</organizationShortName>
…

For each embedded Project/Person/Organization:
1. Generate a "subject property URI" triple for the
backwards relation. The URI is built as follows:
– If Person, then use “name_givenName”
– If a “short name” exists, use “shortname”
– Else use the first 20 characters of the “long name”
2. Generate "URI property object" triples as the
union of all local declarations (where union
removes duplicate triples).
– This requires a final curation task that agrees on node values
when they differ.
– The preliminary version needs further curation (we used
SPARQL SELECT DISTINCT to identify oddities)
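
The two steps above can be sketched in Python. This is only an illustrative sketch, not the code used at IULA-UPF: the function names, the record keys (`givenName`, `surname`, `shortName`, `longName`) and the whitespace-to-underscore rule are assumptions, and the Person order follows the example URI shown earlier (Monica_Monachini).

```python
import re
from collections import defaultdict


def local_name(kind, record):
    """Build the local part of a URI for an embedded entity (hypothetical helper)."""
    if kind == "Person":
        # Assumption: order taken from the example URI .../Monica_Monachini
        raw = f"{record['givenName']}_{record['surname']}"
    elif record.get("shortName"):
        raw = record["shortName"]
    else:
        # Fall back to the first 20 characters of the long name
        raw = record["longName"][:20]
    # Assumption: trim and replace whitespace so the name is URI-friendly
    return re.sub(r"\s+", "_", raw.strip())


def merge_declarations(declarations):
    """Union all (uri, property, value) declarations and flag disagreements.

    Returns the deduplicated triples plus the (uri, property) pairs that
    carry more than one value; those are only candidates for curation,
    since some properties (e.g. email) are legitimately multi-valued.
    """
    triples = set(declarations)  # set union removes duplicate triples
    values = defaultdict(set)
    for subject, prop, obj in triples:
        values[(subject, prop)].add(obj)
    conflicts = {key: objs for key, objs in values.items() if len(objs) > 1}
    return triples, conflicts
```

Calling `merge_declarations` over the declarations collected from every metadata record gives one description per entity, and the `conflicts` dictionary plays the role of the SELECT DISTINCT check mentioned above.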

RDFying Documents:
- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this manual effort is affordable. For big
datasets it needs a lot of work (some automatic
tasks may be defined)
<document>Quochi V, Frontini F, Rubino F. A MWE
Acquisition and Lexicon Builder Web Service. COLING 2012,
24th International Conference on Computational
Linguistics, Proceedings of the Conference: Technical
Papers,8-15 December 2012, Mumbai, India</document>

RDFying - Where to stop?
BIBTEX:
@inproceedings {quochi2012mwe,
title={A MWE Acquisition and Lexicon Builder Web Service.},
author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
booktitle={COLING},
year={2012}}
DBLP
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
dblp:yearOfPublication "2012" .

Article
title
creator Mikel Forcada
subject discourse analysis, question answering
keywords NER, LMF, ...
references FreeLing, TreeBank, PANACEA ...
language English
RDFying- simple types

<subject>Gender Studies</subject>
<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>
Not only enumerations but also string elements!
RDFying - simple types as instances

Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es      77              71
ES      10              10
Language codes in MS central node

Enumerations:
object property + Class + instances +
checking existing vocabularies
‘Free strings’:
1) generate a datatype property + string value.
2) run a curation process that:
a) identifies ‘enumeration-like’ candidates (e.g.
language) and chooses an appropriate vocabulary
b) matches value strings to relevant URIs (DBpedia)
SELECT DISTINCT ?language
WHERE { ?s ms:languageId ?language }
(eng, en, EN, …)
DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }
Curation using SPARQL
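
When many string variants have to be replaced, the update requests can be generated rather than typed by hand. The following Python sketch builds one SPARQL 1.1 DELETE/INSERT request per variant; the function name and the example.org URI are hypothetical, and the `ms:` prefix is assumed to be declared wherever the request is executed.

```python
def replace_literal_update(prop, literal, uri):
    """Build a SPARQL 1.1 Update that swaps a string literal for a URI."""
    # Escape backslashes and double quotes so the literal stays valid SPARQL
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'DELETE {{ ?s {prop} "{escaped}" . }}\n'
        f'INSERT {{ ?s {prop} <{uri}> . }}\n'
        f'WHERE  {{ ?s {prop} "{escaped}" . }}'
    )


# Hypothetical target URI, used only for illustration
for variant in ["eng", "en", "EN"]:
    print(replace_literal_update("ms:language", variant, "http://example.org/English"))
```

Putting DELETE and INSERT in a single request means both changes come from the same WHERE match, so a subject never ends up with the string removed but no URI added.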

Linking data !!
Person
Organization
Document
Project
Enumerations
String valued
VIAF
ORCID
DBLP
Vocabularies
DBpedia

Linking data !! – linking vs reusing
documentation sameAs
documentation

Linking data !! – linking vs reusing
http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37
local
URIs
external
URIs
Core concepts which belong to
some ‘local’ Class.
Instances which belong to some
‘external’ Class:
• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
•….
But, some functional reasons:

Why all this? Is it worth it?
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit.

<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>
Any good article or tool ?

NER
Projects
Services Articles
Reports
Named
Entity
SELECT * WHERE { ?s ?p ms:NER }

IULA?
10!
Why all this ? – IULA at MS central node

[Diagram: relational schema. Entity tables PERSON, ORGANISATION, RESOURCE, LICENSE, DOCUMENT and PROJECT (each with ID, name, description, ...) connected through Has_ link tables (ID, ID).]
Everything about IULA?
SELECT * FROM WHERE { … ...} HELP!!

SELECT * WHERE { ?s ?p "IULA" }
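
To illustrate why a single triple pattern can answer "everything about IULA", here is a small in-memory Python sketch. The triples, the `ex:` names and the `match` helper are invented for the example and do not come from the IULA dataset; a real application would send the pattern to the SPARQL endpoint instead.

```python
# Hypothetical triples (subject, property, object), for illustration only
TRIPLES = [
    ("ex:resource_1", "ex:createdBy", "ex:IULA"),
    ("ex:person_1", "ex:affiliation", "ex:IULA"),
    ("ex:project_1", "ex:partner", "ex:IULA"),
    ("ex:resource_1", "ex:license", "ex:CC-BY"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Equivalent in spirit to: SELECT * WHERE { ?s ?p ex:IULA }
about_iula = match(TRIPLES, o="ex:IULA")
for subject, prop, _ in about_iula:
    print(subject, prop)
```

The pattern does not need to know which kind of entity points at IULA, whereas the relational layout needs one join per link table.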

Why all this ? – data Mashups

• LOD opens new possibilities and SPARQL is a powerful tool
BUT
• The curation task is crucial and costly in effort and time. You can
address it as an incremental process.
Publishing LOD vs. deploying LOD applications
• Until now, the LOD community seems to have focused on “growing
the cloud”
• In this scenario, creating new URIs and mapping them to existing
URIs is OK, but
• when the focus shifts from growing the cloud to developing
applications, new problems arise: massive redundancy of
URIs, trust in third-party servers/data, …
Conclusions & reflections
