LODifier: Generating Linked Data from
Unstructured Text
Isabelle Augenstein1 Sebastian Padó1 Sebastian Rudolph2
1Department of Computational Linguistics, Universität Heidelberg, Germany
2Institute AIFB, Karlsruhe Institute of Technology, Germany
{augenste,pado}@cl.uni-heidelberg.de, rudolph@kit.edu
Extended Semantic Web Conference, 2012
Augenstein, Padó, Rudolph LODifier ESWC 2012 1 / 28
Outline
1 Motivation
2 The LODifier Architecture and Workflow
3 Evaluation in a Document Similarity Task
4 Conclusion
Introduction
What?
Represent information from NL text as Linked Data
Creation of graph-based representation for unstructured plain text
Why?
Task is easy for structured text, but quite challenging for unstructured
text
Most other approaches are very selective, e.g., extract only
pre-defined relations
Our approach translates the text in its entirety
Possible use cases: Domain-independent scenarios in which no
pre-defined schema is available
How?
Use noise-tolerant NLP pipeline to analyze text
Translate result into RDF representation
Embed output in the LOD cloud by using vocabulary from DBpedia
and WordNet
System Overview Pipeline
Figure: LODifier pipeline, from unstructured text to LOD-linked RDF.
Processing steps: Tokenization; Named Entity Recognition (Wikifier); Parsing (C&C); Deep Semantic Analysis (Boxer); Lemmatization; Word Sense Disambiguation (UKB); RDF Generation and Integration.
Intermediate data: tokenized text; NE-marked text; DBpedia URIs; parsed text; DRS; lemmatized text; word sense URIs.
NLP analysis steps (NER)
Named Entity Recognition (Wikifier 1):
Named entities are recognized using Wikifier
Wikifier finds Wikipedia links for named entities
Wikipedia URLs are converted to DBpedia URIs
The New York Times reported that John McCarthy died.
He invented the programming language LISP.
Figure: Test sentences
[[The New York Times]] reported that [[John McCarthy (computer
scientist)|John McCarthy]] died. He invented the [[Programming
language|programming language]] [[Lisp (programming language)|
Lisp]].
Figure: Wikifier output for the test sentences.
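The conversion from Wikifier's wiki-link markup to DBpedia URIs can be sketched as follows. This is a minimal illustration, not LODifier's actual code: the function name `wikilinks_to_dbpedia` is hypothetical, and the sketch assumes that a DBpedia resource URI is the Wikipedia page title with spaces replaced by underscores (real URIs may need additional percent-encoding).

```python
import re

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikilinks_to_dbpedia(text):
    """Return (surface form, DBpedia URI) pairs for every wiki link in text.

    Hypothetical helper: assumes the URI is the page title with
    spaces replaced by underscores.
    """
    pairs = []
    for match in WIKILINK.finditer(text):
        # Collapse whitespace so that line-wrapped titles are handled.
        title = " ".join(match.group(1).split())
        surface = " ".join((match.group(2) or title).split())
        pairs.append((surface, DBPEDIA_RESOURCE + title.replace(" ", "_")))
    return pairs


wikified = (
    "[[The New York Times]] reported that [[John McCarthy (computer "
    "scientist)|John McCarthy]] died. He invented the [[Programming "
    "language|programming language]] [[Lisp (programming language)|Lisp]]."
)

for surface, uri in wikilinks_to_dbpedia(wikified):
    print(surface, "->", uri)
```

Each link yields the anchor text as it appears in the sentence together with the URI that is later attached to the corresponding discourse referent.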
NLP analysis steps (WSD)
Word Sense Disambiguation (UKB 2):
Words are disambiguated using UKB, an unsupervised graph-based
WSD tool
UKB finds WordNet links for word senses
UKB output is converted to RDF WordNet URIs
ctx_0 w1 00965687-v !! report
ctx_0 w4 00358431-v !! die
ctx_1 w7 01634424-v !! invent
ctx_1 w9 06898352-n !! programming_language
ctx_1 w10 06901936-n !! lisp
Figure: UKB output for the test sentences.
2 http://ixa2.si.ehu.es/ukb/
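A small parser for result lines in the format shown in the UKB figure might look like this. It is a sketch under assumptions: the `UkbSense` type and `parse_ukb_line` function are hypothetical names, and the field interpretation (context id, word id, synset offset with part-of-speech letter, lemma) is read off the example lines. Mapping a synset offset to an RDF WordNet URI requires a WordNet lookup that is not shown here.

```python
from typing import NamedTuple


class UkbSense(NamedTuple):
    context: str  # context (sentence) id, e.g. "ctx_0"
    word_id: str  # token id within the input, e.g. "w1"
    offset: str   # WordNet synset offset, e.g. "00965687"
    pos: str      # part-of-speech letter, e.g. "v" or "n"
    lemma: str    # disambiguated lemma


def parse_ukb_line(line):
    """Parse one line of the form 'ctx word synset-pos !! lemma'."""
    left, lemma = line.split("!!")
    context, word_id, synset = left.split()
    offset, pos = synset.rsplit("-", 1)
    return UkbSense(context, word_id, offset, pos, lemma.strip())


ukb_output = """\
ctx_0 w1 00965687-v !! report
ctx_0 w4 00358431-v !! die
ctx_1 w7 01634424-v !! invent
ctx_1 w9 06898352-n !! programming_language
ctx_1 w10 06901936-n !! lisp
"""

senses = [parse_ukb_line(line) for line in ukb_output.splitlines() if line.strip()]
for sense in senses:
    print(sense)
```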
NLP analysis steps (Parsing and Semantic Analysis)
Parsing (C&C 3):
Sentences are parsed with the C&C parser
The C&C parser is a statistical parser based on Combinatory
Categorial Grammar (CCG)
CCG is a semantically motivated grammar, making it suitable for
further semantic analysis
Deep Semantic Analysis (Boxer 4):
Boxer builds on the output of C&C and produces discourse
representation structures (DRSs)
DRSs model the meaning of texts in terms of the relevant entities
(discourse referents) and relations between them (conditions)
3 http://svn.ask.it.usyd.edu.au/trac/candc/
4 http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
  ______________________________   _______________________
 | x0 x1 x2 x3                  | | x4 x5 x6              |
 |..............................| |.......................|
(| male(x0)                     |+| event(x4)             |)
 | named(x0,john_mccarthy,per)  | | invent(x4)            |
 | programming_language(x1)     | | agent(x4,x0)          |
 | nn(x1,x2)                    | | patient(x4,x2)        |
 | named(x2,lisp,nam)           | | event(x5)             |
 | named(x3,new_york_times,org) | | report(x5)            |
 |______________________________| | agent(x5,x3)          |
                                  | theme(x5,x6)          |
                                  | proposition(x6)       |
                                  |      ______________   |
                                  |     | x7           |  |
                                  |  x6:|..............|  |
                                  |     | event(x7)    |  |
                                  |     | die(x7)      |  |
                                  |     | agent(x7,x0) |  |
                                  |     |______________|  |
                                  |_______________________|
Figure: Boxer output for the example sentences
_:var0x0 drsclass:named ne:john_mccarthy ;
    rdf:type drsclass:male , foaf:Person ;
    owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
_:var0x1 rdf:type class:programming_language ;
    owl:sameAs dbpedia:Programming_language .
_:var0x2 drsrel:nn _:var0x1 .
_:var0x2 drsclass:named ne:lisp ;
    owl:sameAs dbpedia:Lisp_(programming_language) .
_:var0x3 drsclass:named ne:the_new_york_times ;
    owl:sameAs dbpedia:The_New_York_Times .
_:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 ;
    drsrel:agent _:var0x0 ; drsrel:patient _:var0x2 .
_:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
    drsrel:agent _:var0x3 ; drsrel:theme _:var0x6 .
_:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
    reify:conjunct [ rdf:subject _:var0x7 ;
                     rdf:predicate rdf:type ;
                     rdf:object drsclass:event ] ;
    reify:conjunct [ rdf:subject _:var0x7 ;
                     rdf:predicate rdf:type ;
                     rdf:object wn30:wordsense-die-verb-1 ] ;
    reify:conjunct [ rdf:subject _:var0x7 ;
                     rdf:predicate drsrel:agent ;
                     rdf:object _:var0x0 ] .
Figure: LODifier output for the test sentences.
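How flat DRS conditions could be turned into triples can be sketched as follows. This is an illustration under assumptions, not the LODifier implementation: `node` and `condition_to_triple` are hypothetical names, the set `DRS_CLASSES` is read off the example output rather than taken from Boxer's full inventory, and nested DRSs (the reified proposition), `owl:sameAs` links and WordNet sense URIs are left out.

```python
# Predicates that the example output places in the drsclass: namespace.
# Assumption: read off the example, not a complete list.
DRS_CLASSES = {"event", "male", "proposition"}


def node(ref):
    """Blank node label for a discourse referent, named as in the example."""
    return "_:var0" + ref


def condition_to_triple(cond):
    """Map one flat DRS condition (a tuple) to an (s, p, o) triple of strings."""
    name, args = cond[0], cond[1:]
    if name == "named":
        # named(x, name, type): attach the entity name to the referent
        return (node(args[0]), "drsclass:named", "ne:" + args[1])
    if len(args) == 1:
        # unary condition: the referent is an instance of a class
        prefix = "drsclass:" if name in DRS_CLASSES else "class:"
        return (node(args[0]), "rdf:type", prefix + name)
    if len(args) == 2:
        # binary condition: a relation between two referents
        return (node(args[0]), "drsrel:" + name, node(args[1]))
    raise ValueError(f"unsupported condition: {cond!r}")


conditions = [
    ("male", "x0"),
    ("named", "x0", "john_mccarthy", "per"),
    ("programming_language", "x1"),
    ("event", "x4"),
    ("agent", "x4", "x0"),
    ("patient", "x4", "x2"),
]

triples = [condition_to_triple(c) for c in conditions]
for s, p, o in triples:
    print(s, p, o, ".")
```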
Evaluation overview
Task:
Assess document similarity
Story link detection task, part of topic detection and tracking (TDT)
family of tasks
Data:
TDT-2 benchmark dataset 5
Consists of newspaper, TV and radio news in English, Arabic and
Mandarin
Subset: English language, only newspaper articles
183 positive and negative document pairs
5 http://projects.ldc.upenn.edu/TDT2/
Method:
Define various document similarity measures
Measures without structural information
Measures with structural information
Documents are predicted to have the same topic if they have a
similarity of θ or more
θ is determined with supervised learning
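One simple way to learn such a threshold is sketched below. The slides only state that θ is determined with supervised learning, so the exhaustive search over observed scores, the function name `best_threshold` and the toy data are all assumptions made for illustration.

```python
def best_threshold(scores, labels):
    """Pick the threshold with the highest accuracy on training pairs.

    scores: similarity values of the training document pairs
    labels: True if the pair shares a topic, False otherwise
    A pair is predicted positive if its score is >= the threshold.
    """
    best_theta, best_acc = None, -1.0
    for theta in sorted(set(scores)):
        predictions = [score >= theta for score in scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_acc:
            best_theta, best_acc = theta, accuracy
    return best_theta, best_acc


# Toy training data (invented for this sketch).
train_scores = [0.10, 0.35, 0.40, 0.62, 0.80]
train_labels = [False, False, True, True, True]

theta, accuracy = best_threshold(train_scores, train_labels)
print(theta, accuracy)
```

The learned θ is then applied unchanged to held-out document pairs.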
Evaluation method
Similarity measures without structure:
Random baseline
Bag-of-words baseline: Word overlap between the two documents in
a document pair
Bag-of-URI baseline: URI overlap between two RDF documents
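Both overlap baselines can be sketched with one set-based function. The slides do not name the exact overlap measure, so the Jaccard coefficient used here is an assumption, and the function name and example inputs are hypothetical.

```python
def jaccard(items_a, items_b):
    """Overlap of two item collections: |A ∩ B| / |A ∪ B|.

    Assumption: Jaccard stands in for the unspecified overlap measure.
    """
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Bag of words: items are lower-cased tokens.
doc1 = "John McCarthy invented the programming language Lisp".lower().split()
doc2 = "McCarthy created Lisp a programming language".lower().split()
print(jaccard(doc1, doc2))

# Bag of URIs: items are the URIs occurring in each generated RDF graph.
uris1 = {"dbpedia:Lisp_(programming_language)", "wn30:wordsense-invent-verb-2"}
uris2 = {"dbpedia:Lisp_(programming_language)", "wn30:wordsense-die-verb-1"}
print(jaccard(uris1, uris2))
```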
URI class variants:
Three variants for URI baseline: Take the relative importance of
various URI classes into account
Variant 1: All NEs identified by Wikifier, all words successfully
disambiguated by UKB (namespaces dbpedia:, wn30:)
Variant 2: Adds all NEs recognized by Boxer (namespace ne:)
Variant 3: Further adds the URIs for all words that were not
recognized by Wikifier, UKB or Boxer (namespace class:)
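Selecting the URIs for each variant amounts to a namespace filter, sketched below. The prefixes follow the namespaces named above; the function name `relevant_uris` and the use of prefixed names instead of full URIs are simplifications made for this sketch.

```python
# Namespaces counted as relevant in each variant (cumulative).
VARIANT_NAMESPACES = {
    1: ("dbpedia:", "wn30:"),
    2: ("dbpedia:", "wn30:", "ne:"),
    3: ("dbpedia:", "wn30:", "ne:", "class:"),
}


def relevant_uris(uris, variant):
    """Keep only the URIs whose namespace belongs to the given variant."""
    prefixes = VARIANT_NAMESPACES[variant]
    # str.startswith accepts a tuple of alternatives.
    return {uri for uri in uris if uri.startswith(prefixes)}


graph_uris = {
    "dbpedia:The_New_York_Times",
    "wn30:wordsense-die-verb-1",
    "ne:lisp",
    "class:programming_language",
    "drsclass:event",
}

for variant in (1, 2, 3):
    print(variant, sorted(relevant_uris(graph_uris, variant)))
```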
Extended setting:
Aims at drawing more information from LOD cloud into the similarity
computation
The generated RDF graph is enriched by:
DBpedia categories (namespace dbpedia:)
WordNet synsets and WordNet senses related to the URIs in the
generated graph (namespace wn30:)
Similarity measures with structure:
Full-fledged graph similarity measures, e.g. based on isomorphic
subgraphs, are infeasible due to the size of the RDF graphs
Our similarity measures are based on the shortest paths between
relevant nodes
Our intuition is that short paths between URI pairs denote relevant
semantic information and we expect they are related by a short path
in other documents as well
\[
\mathrm{proSim}_{k,\mathrm{Rel},f}(G_1, G_2) =
\frac{\displaystyle\sum_{\substack{a,b \in \mathrm{Rel}(G_1) \\ \langle a,b \rangle \in C_k(G_1) \cap C_k(G_2)}} f(\ell(a,b))}
     {\displaystyle\sum_{\substack{a,b \in \mathrm{Rel}(G_1) \\ \langle a,b \rangle \in C_k(G_1)}} f(\ell(a,b))}
\]
k: path length, Rel: semantic relations, f: similarity function
proSimcnt: Uses $f(\ell) = 1$, i.e., just counts the number of paths
irrespective of their length
proSimlen: Uses $f(\ell) = 1/\ell$, giving less weight to longer paths
proSimsqlen: Uses $f(\ell) = 1/\sqrt{\ell}$, discounting long paths less
aggressively than proSimlen
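The path-based measure can be sketched in a few lines. Several points are assumptions made for this sketch and are not stated on the slide: graphs are plain undirected edge lists, C_k(G) is taken to be the set of node pairs connected by a path of length at most k, the path length passed to f is the shortest-path length in G1, and all function names are hypothetical.

```python
import math
from collections import deque


def shortest_lengths(edges, source, k):
    """Breadth-first search on an undirected edge list, cut off at depth k."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if dist[current] == k:
            continue  # do not expand beyond path length k
        for neighbour in adjacency.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def connected_pairs(edges, relevant, k):
    """Map each relevant pair (a, b), a < b, joined by a path <= k to its length."""
    pairs = {}
    for a in relevant:
        dist = shortest_lengths(edges, a, k)
        for b in relevant:
            if a < b and b in dist:
                pairs[(a, b)] = dist[b]
    return pairs


def pro_sim(edges1, edges2, relevant1, k, f):
    """Weighted share of G1's short paths that also exist in G2."""
    c1 = connected_pairs(edges1, relevant1, k)
    c2 = connected_pairs(edges2, relevant1, k)
    denominator = sum(f(length) for length in c1.values())
    if denominator == 0:
        return 0.0
    numerator = sum(f(length) for pair, length in c1.items() if pair in c2)
    return numerator / denominator


# Toy graphs: A, B, C are relevant URIs; x and y are other nodes.
g1 = [("A", "B"), ("B", "x"), ("x", "C")]
g2 = [("A", "y"), ("y", "B")]
rel = {"A", "B", "C"}

print(pro_sim(g1, g2, rel, 2, lambda length: 1.0))                      # cnt variant
print(pro_sim(g1, g2, rel, 2, lambda length: 1.0 / length))             # len variant
print(pro_sim(g1, g2, rel, 2, lambda length: 1.0 / math.sqrt(length)))  # sqlen variant
```

In the toy example G1 has two short paths between relevant nodes (A to B with length 1, B to C with length 2) and only the first pair is also connected in G2, so the three weightings give different scores.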
Results
Model                           normal   extended

Similarity measures without structure
Random Baseline                 50.0     –
Bag of Words                    63.0     –
Bag of URIs (Variant 3)         73.4     76.4

Similarity measures with structure
proSimcnt (k=6, Variant 1)      77.7     77.6
proSimcnt (k=6, Variant 2)      79.2     79.0
proSimcnt (k=8, Variant 3)      82.1     81.9
proSimlen (k=6, Variant 3)      81.5     81.4
proSimsqlen (k=8, Variant 3)    81.1     80.9
Summary
Main contributions:
Service for the LOD community that can be used in different
scenarios that can profit from structural representation of text
Combination of well-established concepts and systems in a deep
semantic pipeline
Linking existing NLP technologies to the LOD cloud
Evaluation of LODifier in a story link detection task gives clear
evidence of its potential
Possible future work directions:
Instead of a pipeline approach, simultaneously consider information
from the NLP components to allow interaction between them
Test LODifier in different scenarios
Thank you!
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
 
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
 
Learning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyondLearning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyond
 
Learning to read for automated fact checking
Learning to read for automated fact checkingLearning to read for automated fact checking
Learning to read for automated fact checking
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

LODifier: Generating Linked Data from Unstructured Text

  • 1. LODifier: Generating Linked Data from Unstructured Text
      Isabelle Augenstein (1), Sebastian Padó (1), Sebastian Rudolph (2)
      (1) Department of Computational Linguistics, Universität Heidelberg, Germany
      (2) Institute AIFB, Karlsruhe Institute of Technology, Germany
      {augenste,pado}@cl.uni-heidelberg.de, rudolph@kit.edu
      Extended Semantic Web Conference, 2012
      Augenstein, Padó, Rudolph LODifier ESWC 2012 1 / 28
  • 2. Outline
      1. Motivation
      2. The LODifier Architecture and Workflow
      3. Evaluation in a Document Similarity Task
      4. Conclusion
  • 3. Introduction
      What?
        - Represent information from NL text as Linked Data
        - Creation of a graph-based representation for unstructured plain text
      Why?
        - The task is easy for structured text, but quite challenging for unstructured text
        - Most other approaches are very selective, e.g., they extract only pre-defined relations
        - Our approach translates the text in its entirety
        - Possible use cases: domain-independent scenarios in which no pre-defined schema is available
  • 4. Introduction
      How?
        - Use a noise-tolerant NLP pipeline to analyze the text
        - Translate the result into an RDF representation
        - Embed the output in the LOD cloud by using vocabulary from DBpedia and WordNet
  • 5. System: Overview of the pipeline
      Figure (pipeline diagram): unstructured text is processed by Tokenization, Named Entity Recognition (Wikifier), Lemmatization, Word Sense Disambiguation (UKB), Parsing (C&C) and Deep Semantic Analysis (Boxer). The intermediate results are tokenized text, NE-marked text, DBpedia URIs, lemmatized text, word sense URIs, parsed text and the DRS. The final step, RDF Generation and Integration, produces LOD-linked RDF.
  • 6. System: NLP analysis steps (NER)
      Named Entity Recognition (Wikifier):
        - Named entities are recognized using Wikifier
        - Wikifier finds Wikipedia links for named entities
        - Wikipedia URLs are converted to DBpedia URIs
      Figure: Test sentences
          The New York Times reported that John McCarthy died.
          He invented the programming language LISP.
      Figure: Wikifier output for the test sentences
          [[The New York Times]] reported that [[John McCarthy (computer scientist)|John McCarthy]] died.
          He invented the [[Programming language|programming language]] [[Lisp (programming language)|Lisp]].
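The conversion from Wikifier markup to DBpedia URIs can be sketched as follows. This is a minimal illustration, not LODifier's actual code: the function name `wikilinks_to_dbpedia` and the regular expression are assumptions. It relies only on the `[[Target|anchor]]` markup shown in the Wikifier figure and on the DBpedia convention that a resource URI is the Wikipedia page title, with spaces replaced by underscores, under `http://dbpedia.org/resource/`.

```python
import re

# DBpedia resource namespace; page titles are appended with spaces -> underscores.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

# Matches [[Target]] and [[Target|anchor text]] (hypothetical pattern for this sketch).
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def wikilinks_to_dbpedia(text):
    """Return (anchor text, DBpedia URI) pairs for every wiki link in `text`."""
    links = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        # Without an explicit anchor, the link text is the page title itself.
        anchor = (match.group(2) or target).strip()
        links.append((anchor, DBPEDIA_RESOURCE + target.replace(" ", "_")))
    return links


wikified = ("[[The New York Times]] reported that "
            "[[John McCarthy (computer scientist)|John McCarthy]] died.")
for anchor, uri in wikilinks_to_dbpedia(wikified):
    print(anchor, "->", uri)
```

The anchor text is kept next to the URI so that a later step could attach the link to the matching token in the parsed text.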
  • 7. System: NLP analysis steps (WSD)
      Word Sense Disambiguation (UKB (2)):
        - Words are disambiguated using UKB, an unsupervised graph-based WSD tool
        - UKB finds WordNet links for word senses
        - UKB output is converted to RDF WordNet URIs
      Figure: UKB output for the test sentences
          ctx_0 w1 00965687-v !! report
          ctx_0 w4 00358431-v !! die
          ctx_1 w7 01634424-v !! invent
          ctx_1 w9 06898352-n !! programming_language
          ctx_1 w10 06901936-n !! lisp
      (2) http://ixa2.si.ehu.es/ukb/
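As a first step towards WordNet URIs, the UKB lines in the figure can be split into their fields. The sketch below assumes exactly the layout shown there (context id, word id, one `offset-pos` synset, `!!`, lemma); real UKB output may carry more fields. The names `UkbSense` and `parse_ukb_line` are hypothetical. Mapping a synset offset to a sense URI such as `wn30:wordsense-invent-verb-2` needs a lookup in a WordNet 3.0 resource, which is not shown here.

```python
from typing import NamedTuple


class UkbSense(NamedTuple):
    context: str   # sentence/context id, e.g. "ctx_0"
    word_id: str   # token id within the document, e.g. "w1"
    offset: str    # WordNet synset offset, e.g. "00965687"
    pos: str       # part of speech tag: "n", "v", ...
    lemma: str     # lemma that was disambiguated


def parse_ukb_line(line):
    """Parse one line of the form 'ctx_0 w1 00965687-v !! report'."""
    left, lemma = line.split("!!")
    context, word_id, synset = left.split()
    offset, pos = synset.split("-")
    return UkbSense(context, word_id, offset, pos, lemma.strip())


print(parse_ukb_line("ctx_0 w1 00965687-v !! report"))
```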
  • 8. System: NLP analysis steps (Parsing and Semantic Analysis)
      Parsing (C&C (3)):
        - Sentences are parsed with the C&C parser
        - The C&C parser is a statistical parser based on Combinatory Categorial Grammar (CCG)
        - CCG is a semantically motivated grammar formalism, which makes it suitable for further semantic analysis
      Deep Semantic Analysis (Boxer (4)):
        - Boxer builds on the output of C&C and produces discourse representation structures (DRSs)
        - DRSs model the meaning of texts in terms of the relevant entities (discourse referents) and the relations between them (conditions)
      (3) http://svn.ask.it.usyd.edu.au/trac/candc/
      (4) http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
  • 9. System: NLP analysis steps
        ______________________________   _______________________
       | x0 x1 x2 x3                  | | x4 x5 x6              |
       |..............................| |.......................|
      (| male(x0)                     |+| event(x4)             |)
       | named(x0,john_mccarthy,per)  | | invent(x4)            |
       | programming_language(x1)     | | agent(x4,x0)          |
       | nn(x1,x2)                    | | patient(x4,x2)        |
       | named(x2,lisp,nam)           | | event(x5)             |
       | named(x3,new_york_times,org) | | report(x5)            |
       |______________________________| | agent(x5,x3)          |
                                        | theme(x5,x6)          |
                                        | proposition(x6)       |
                                        |      ______________   |
                                        |     | x7           |  |
                                        |  x6:|..............|  |
                                        |     | event(x7)    |  |
                                        |     | die(x7)      |  |
                                        |     | agent(x7,x0) |  |
                                        |     |______________|  |
                                        |_______________________|
      Figure: Boxer output for the example sentences
  • 10. System: NLP analysis steps
      _:var0x0 drsclass:named ne:john_mccarthy ;
               rdf:type drsclass:male , foaf:Person ;
               owl:sameAs dbpedia:John_McCarthy_(computer_scientist) .
      _:var0x1 rdf:type class:programming_language ;
               owl:sameAs dbpedia:Programming_language .
      _:var0x2 drsrel:nn _:var0x1 .
      _:var0x2 drsclass:named ne:lisp ;
               owl:sameAs dbpedia:Lisp_(programming_language) .
      _:var0x3 drsclass:named ne:the_new_york_times ;
               owl:sameAs dbpedia:The_New_York_Times .
      _:var0x4 rdf:type drsclass:event , wn30:wordsense-invent-verb-2 ;
               drsrel:agent _:var0x0 ;
               drsrel:patient _:var0x2 .
      _:var0x5 rdf:type drsclass:event , wn30:wordsense-report-verb-3 ;
               drsrel:agent _:var0x3 ;
               drsrel:theme _:var0x6 .
      _:var0x6 rdf:type drsclass:proposition , reify:proposition , reify:conjunction ;
               reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object drsclass:event ] ;
               reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate rdf:type ; rdf:object wn30:wordsense-die-verb-1 ] ;
               reify:conjunct [ rdf:subject _:var0x7 ; rdf:predicate drsrel:agent ; rdf:object _:var0x0 ] .
      Figure: LODifier output for the test sentences.
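Comparing the Boxer output with the LODifier output suggests a simple mapping from DRS conditions to triples: `named` conditions become `drsclass:named` statements, other unary conditions become `rdf:type` statements, and binary conditions become `drsrel:` relations between blank nodes. The sketch below illustrates only that reading of the two figures. It is an assumption, not the authors' implementation: the function name, the `DRS_CLASSES` set and the `doc_id` parameter are hypothetical, and linking to DBpedia or WordNet and the reification of embedded propositions are left out.

```python
# Assumed subset of Boxer's built-in predicates that the figure places in drsclass:.
DRS_CLASSES = {"male", "event", "proposition"}


def condition_to_triple(predicate, args, doc_id=0):
    """Translate one DRS condition, e.g. agent(x4,x0), into an RDF triple of strings."""

    def node(var):
        # Blank node naming as in the figure: _:var<doc><referent>, e.g. _:var0x4.
        return f"_:var{doc_id}{var}"

    if predicate == "named":
        var, name, _entity_type = args
        return (node(var), "drsclass:named", f"ne:{name}")
    if len(args) == 1:
        prefix = "drsclass" if predicate in DRS_CLASSES else "class"
        return (node(args[0]), "rdf:type", f"{prefix}:{predicate}")
    if len(args) == 2:
        subject, obj = args
        return (node(subject), f"drsrel:{predicate}", node(obj))
    raise ValueError(f"unsupported condition: {predicate}{args}")


conditions = [
    ("named", ("x0", "john_mccarthy", "per")),
    ("event", ("x4",)),
    ("agent", ("x4", "x0")),
    ("patient", ("x4", "x2")),
]
for predicate, args in conditions:
    print(" ".join(condition_to_triple(predicate, args)), ".")
```

In the LODifier figure, open-class predicates such as `invent` appear as WordNet sense URIs rather than `class:` URIs whenever UKB disambiguated them, so a fuller version would consult the WSD output before falling back to `class:`.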
  • 19. Evaluation: Evaluation overview
      Task: Assess document similarity
        - Story link detection task, part of the topic detection and tracking (TDT) family of tasks
      Data: TDT-2 benchmark dataset (5)
        - Consists of newspaper, TV and radio news in English, Arabic and Mandarin
        - Subset: English language, only newspaper articles
        - 183 positive and negative document pairs
      (5) http://projects.ldc.upenn.edu/TDT2/
  • 20. Evaluation: Evaluation overview
      Method:
        - Define various document similarity measures
            - Measures without structural information
            - Measures with structural information
        - Documents are predicted to have the same topic if they have a similarity of θ or more
        - θ is determined with supervised learning
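The slides do not say which learner is used to determine θ. One simple possibility, shown here purely as an assumption, is to pick the threshold that maximizes accuracy on labelled training pairs. The function name `fit_threshold` and the toy data are hypothetical.

```python
def fit_threshold(similarities, labels):
    """Return (theta, accuracy) for the threshold that best separates the training pairs.

    A pair is predicted 'same topic' when its similarity is >= theta.
    Candidate thresholds are the observed similarity values themselves.
    """
    best_theta, best_accuracy = None, -1.0
    for theta in sorted(set(similarities)):
        correct = sum((s >= theta) == y for s, y in zip(similarities, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_theta, best_accuracy = theta, accuracy
    return best_theta, best_accuracy


# Toy training data: similarity score and whether the pair shares a topic.
train_sims = [0.10, 0.40, 0.35, 0.80]
train_labels = [False, True, False, True]
theta, accuracy = fit_threshold(train_sims, train_labels)
print(theta, accuracy)
```

On ties the smallest threshold wins, because later candidates replace the current best only when they are strictly better.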
  • 21. Evaluation: Evaluation method
      Similarity measures without structure:
        - Random baseline
        - Bag-of-words baseline: word overlap between the two documents in a document pair
        - Bag-of-URI baseline: URI overlap between two RDF documents
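Both baselines reduce to an overlap between two sets, of words in one case and of URIs in the other. The slides do not define how the overlap is normalized. The sketch below uses the Jaccard coefficient as one plausible choice, which is an assumption; the function names are hypothetical as well.

```python
def overlap(items_a, items_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two collections of items."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bag_of_words(text):
    """Very simple tokenization: lowercase and split on whitespace."""
    return text.lower().split()


doc1 = "John McCarthy died"
doc2 = "John McCarthy invented Lisp"
print(overlap(bag_of_words(doc1), bag_of_words(doc2)))

# The bag-of-URI baseline applies the same function to the URIs of two RDF documents.
uris1 = {"dbpedia:The_New_York_Times", "wn30:wordsense-die-verb-1"}
uris2 = {"dbpedia:The_New_York_Times", "wn30:wordsense-invent-verb-2"}
print(overlap(uris1, uris2))
```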
  • 22. Evaluation: Evaluation method
      URI class variants:
        - Three variants of the URI baseline take the relative importance of the various URI classes into account
        - Variant 1: all NEs identified by Wikifier and all words successfully disambiguated by UKB (namespaces dbpedia:, wn30:)
        - Variant 2: adds all NEs recognized by Boxer (namespace ne:)
        - Variant 3: further adds the URIs for all words that were not recognized by Wikifier, UKB or Boxer (namespace class:)
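The three variants can be read as increasingly permissive namespace filters over the URIs of a generated graph. A minimal sketch under that reading follows. The prefixed-name strings stand in for full URIs, and `VARIANT_NAMESPACES` and `uris_for_variant` are hypothetical names.

```python
# Namespaces admitted by each variant, as listed on the slide.
VARIANT_NAMESPACES = {
    1: ("dbpedia:", "wn30:"),
    2: ("dbpedia:", "wn30:", "ne:"),
    3: ("dbpedia:", "wn30:", "ne:", "class:"),
}


def uris_for_variant(uris, variant):
    """Keep only the URIs whose namespace prefix belongs to the given variant."""
    prefixes = VARIANT_NAMESPACES[variant]
    return {uri for uri in uris if uri.startswith(prefixes)}


graph_uris = {
    "dbpedia:The_New_York_Times",
    "wn30:wordsense-die-verb-1",
    "ne:lisp",
    "class:programming_language",
    "drsclass:event",
}
for variant in (1, 2, 3):
    print(variant, sorted(uris_for_variant(graph_uris, variant)))
```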
  • 23. Evaluation: Evaluation method
      Extended setting:
        - Aims at drawing more information from the LOD cloud into the similarity computation
        - The generated RDF graph is enriched by:
            - DBpedia categories (namespace dbpedia:)
            - WordNet synsets and WordNet senses related to the URIs in the generated graph (namespace wn30:)
  • 24. Evaluation: Evaluation method
      Similarity measures with structure:
        - Full-fledged graph similarity measures, e.g. those based on isomorphic subgraphs, are infeasible due to the size of the RDF graphs
        - Our similarity measures are based on the shortest paths between relevant nodes
        - Intuition: a short path between a pair of URIs denotes relevant semantic information, and we expect the same pair to be related by a short path in other documents as well
  • 25. Evaluation: Evaluation method
      Similarity measures with structure:
      $$\mathrm{proSim}_{k,\mathrm{Rel},f}(G_1, G_2) = \frac{\sum_{a,b \in \mathrm{Rel}(G_1),\ \langle a,b \rangle \in C_k(G_1) \cap C_k(G_2)} f(\ell(a,b))}{\sum_{a,b \in \mathrm{Rel}(G_1),\ \langle a,b \rangle \in C_k(G_1)} f(\ell(a,b))}$$
      k: path length, Rel: semantic relations, f: similarity function, ℓ(a,b): length of the path between a and b
        - proSimcnt: uses f(ℓ) = 1, i.e., just counts the number of paths irrespective of their length
        - proSimlen: uses f(ℓ) = 1/ℓ, giving less weight to longer paths
        - proSimsqlen: uses f(ℓ) = 1/√ℓ, discounting long paths less aggressively than proSimlen
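The proSim family can be sketched with a breadth-first search. The slide leaves several details open, so the following choices are assumptions of this sketch rather than the authors' definitions: graphs are undirected adjacency dictionaries, C_k(G) is the set of pairs of relevant nodes joined by a shortest path of at most k edges, and the path length passed to f is measured in G1. All function names are hypothetical.

```python
from collections import deque
from math import sqrt


def distances_within(graph, source, max_len):
    """BFS from `source`; returns {node: shortest distance} for nodes within max_len edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_len:
            continue  # do not expand beyond the path length bound k
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def connected_pairs(graph, relevant, k):
    """Assumed C_k(G): {(a, b): shortest path length} for relevant pairs with length <= k."""
    pairs = {}
    for a in relevant:
        for b, length in distances_within(graph, a, k).items():
            if b in relevant and a < b:  # store each unordered pair once
                pairs[(a, b)] = length
    return pairs


def pro_sim(g1, g2, relevant, k, f):
    """Weighted share of G1's short-path pairs that are also short-path pairs in G2."""
    c1 = connected_pairs(g1, relevant, k)
    c2 = connected_pairs(g2, relevant, k)
    denominator = sum(f(length) for length in c1.values())
    if denominator == 0:
        return 0.0
    numerator = sum(f(length) for pair, length in c1.items() if pair in c2)
    return numerator / denominator


# The three weighting functions named on the slide.
f_cnt = lambda length: 1.0
f_len = lambda length: 1.0 / length
f_sqlen = lambda length: 1.0 / sqrt(length)

# Toy graphs: in g1, a-b are 2 edges apart and b-c are 1 edge apart; g2 only links a-b.
g1 = {"a": ["x"], "x": ["a", "b"], "b": ["x", "c"], "c": ["b"]}
g2 = {"a": ["b"], "b": ["a"]}
relevant = {"a", "b", "c"}
for name, f in (("cnt", f_cnt), ("len", f_len), ("sqlen", f_sqlen)):
    print(name, pro_sim(g1, g2, relevant, k=2, f=f))
```

In the toy example the shared pair (a, b) is the longer of G1's two short paths, so the length-sensitive variants give it less weight than the plain count does.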
  • 26. Evaluation: Results
      Model                              normal   extended
      Similarity measures without structure
        Random Baseline                   50.0      –
        Bag of Words                      63.0      –
        Bag of URIs (Variant 3)           73.4     76.4
      Similarity measures with structure
        proSimcnt (k=6, Variant 1)        77.7     77.6
        proSimcnt (k=6, Variant 2)        79.2     79.0
        proSimcnt (k=8, Variant 3)        82.1     81.9
        proSimlen (k=6, Variant 3)        81.5     81.4
        proSimsqlen (k=8, Variant 3)      81.1     80.9
  • 27. Summary
      Main contributions:
        - A service for the LOD community that can be used in different scenarios that profit from a structural representation of text
        - Combination of well-established concepts and systems in a deep semantic pipeline
        - Linking existing NLP technologies to the LOD cloud
        - Evaluation of LODifier in a story link detection task gives clear evidence of its potential
      Possible future work directions:
        - Instead of a pipeline approach, simultaneously consider information from the NLP components to allow interaction between them
        - Test LODifier in different scenarios
  • 28. Thank you!