Keynote talk given at the DanTermBank workshop on January 9th, 2015.
http://dantermbank.cbs.dk/dtb_uk/the_dantermbank_project_launches_a_new_website/dantermbank_workshop_revealing_hidden_knowledge_9_january_2015
2. Outline of the talk
• Semantic relations
– what do we mean?
– what do we need them for?
• Finding semantic relations
– what are the issues?
– what can large corpora and machine learning do
for you?
– how can terminology contribute to Linked Data?
09/01/2015 Semantic relations - Aussenac
3. Semantic relations,
what do we mean?
Research field
• Linguistics: semantic
relations, semantic roles,
discourse relations
• Terminology
– Weak structure
– Stored in DB or SKOS
models
• Information extraction
What is a relation
A tree comprises at least a trunk,
roots and branches.
A tree [Plants] comprises [meronymy] at
least a trunk, roots and branches.
tree has_parts trunk, roots, branches
(tree, has parts, trunk) …
in a gardening terminology
looks for relations between instances
tree   Plantation year  Species  Branches
Tree1  1990             Oak      > 20
Tree2  1995             Oak      15
[Figure labels: whole, parts]
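The step from one has_parts statement with several parts to individual (subject, relation, object) triples can be sketched in Python. This is only an illustration: the function name expand_relation and the tuple layout are assumptions, not something shown in the talk.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def expand_relation(subject: str, relation: str, objects: List[str]) -> List[Triple]:
    """Turn one statement with several objects into one triple per object."""
    return [(subject, relation, obj) for obj in objects]


# "tree has_parts trunk, roots, branches" from the slide
triples = expand_relation("tree", "has_parts", ["trunk", "roots", "branches"])
for triple in triples:
    print(triple)
```

Each triple can then be stored or validated on its own, which is what makes the enumeration case convenient for terminology work.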
4. Semantic relations,
what do we mean?
Research field
• Domain Ontology engineering
– Formal (logic, RDF, OWL …) and
may lead to infer new
knowledge
– The relation is part of a network
– May be shared or not
• Semantic web
– Independent triples
– Publicly available in data
repositories in W3C standard
formats
– Connect triples with existing
ones, with web ontologies
What is a relation
bot:Tree bot:has_part bot:Branch
09/01/2015 Semantic relations - Aussenac 4
[Figure: concept network with nodes Plant, Fungus, Cereals, Tree, Trunk, Root and Branch, connected by is_a and Has-part links; RDF graph with bot:myTree bot:has-part bot:MyTreeRoots, bot:Tree bot:has-part bot:Branch, and an rdf:type link]
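As a rough sketch of how prefixed triples such as bot:Tree bot:has_part bot:Branch become web-publishable statements, the Python below expands prefixes into full IRIs and prints N-Triples-style lines. The bot namespace IRI is hypothetical, and reading the rdf:type link as connecting bot:myTree to bot:Tree is an assumption about the figure.

```python
PREFIXES = {
    "bot": "http://example.org/botany#",  # hypothetical namespace for the bot: prefix
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name like 'bot:Tree' into a full IRI in angle brackets."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{expand(s)} {expand(p)} {expand(o)} ." for s, p, o in triples)


triples = [
    ("bot:Tree", "bot:has_part", "bot:Branch"),         # class-level relation
    ("bot:myTree", "rdf:type", "bot:Tree"),             # assumed reading of the figure
    ("bot:myTree", "bot:has_part", "bot:MyTreeRoots"),  # instance-level relation
]
print(to_ntriples(triples))
```

Because every name is a full IRI, each line stands alone and can be linked to triples published elsewhere.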
5. Example: tree in DBPedia
[Figure: DBpedia resources dbpedia-owl:tree, dbpedia-owl:Species, dbpedia-owl:Place]
6. Example: Plants in DBpedia
[Figure: dbpedia-owl:PhysicalEntity, dbpedia-owl:Organism and dbpedia-owl:Plant linked by rdfs:subClassOf; dbpedia-owl:Plant linked by owl:sameAs to yago:WordNet_Plant_100017222; dbpedia-owl:Acer_Stonebergae, dbpedia-owl:Alopecurus_carolinianus, dbpedia-owl:Alsmithia_longipes and dbpedia-owl:… typed with rdf:type]
Semantic web specificities
- New types of relations: mappings
- Focus 1: resources, either classes or instances
- Focus 2: relations between resources
- Focus 3: give a type to resources
9. Semantic relations:
what do we need them for?
Research field
• Linguistics: to understand how
language produces meaning
• Terminology: to capture domain
specific terms and meanings
• Information extraction: to
collect structured data – the
schema (classes and relations) is
known
Relations contribute to
• semantics and language
interpretation
• formal semantics and
discourse semantics
• Structure terminologies and
make browsing easier
• Connect terms
• Give meaning to “concepts”
• Find related entities (values) to
build DB and then mine these
data (statistics)
10. Semantic relations:
what do we need them for?
Research field
• Formal ontologies: to define
axioms and inferences
associated with relations
• Domain ontologies: human
and machine “understanding”
• Linked data: interoperability,
data connection from various
sites or applications
Relations contribute to
• Inferences and reasoning
• Ex: sub-classes inherit some class
properties
• Ex : transitivity (cf. some parthood rel.),
symmetry
• Ex: cardinality …
• Relations define concepts by
differences and similarities
• Relations have labels and are human
readable ; each one can be processed in
a specific way
• Relations connect resources (data)
• Their semantics is defined in ontologies
or produces behaviors in inference
engines (rdfs:subClassOf)
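The inference examples above (inheritance through sub-classes, transitivity) can be illustrated with a small Python sketch. The class names and the two relation sets are toy data chosen for illustration; treating Oak as a subclass of Tree is an assumption, and a real system would delegate this to an ontology reasoner.

```python
def transitive_closure(pairs):
    """Return every (a, c) obtainable by chaining (a, b) and (b, c) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Toy knowledge, assumed for illustration only
subclass_of = {("Oak", "Tree"), ("Tree", "Plant")}
has_part = {("Tree", "Branch"), ("Plant", "Root")}


def inherited_parts(cls):
    """Parts declared on cls or on any of its (transitive) superclasses."""
    ancestors = {cls} | {sup for sub, sup in transitive_closure(subclass_of) if sub == cls}
    return {part for whole, part in has_part if whole in ancestors}


print(sorted(inherited_parts("Oak")))  # ['Branch', 'Root']
```

Nothing states directly that an oak has roots; the fact follows from the subclass chain, which is the kind of new knowledge a formal relation makes available.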
11. Semantic web specificities
(Figure: Erhard Rahm, http://dbs.uni-leipzig.de/file/paris-Octob2014.pdf)
- Relations connect web data called resources
- Relations connect data with ontology classes: importance of hypernymy
- Relations may map ontology classes
12. Finding semantic relations,
what are the issues?
• Knowledge sources:
– where can we find relations?
• Extraction techniques
– How can we identify them?
• Representation
– Which way do I represent this information?
• Validation
– What makes a relation representation valid? Relevant?
13. Finding semantic relations,
what are the issues?
• Knowledge sources
– text, human experts, existing “semantic” resources (lexicon,
terminologies, ontologies, Linked Data vocabularies)
– Domain specific vs general knowledge
• Extraction techniques
– “obvious” language regularities, known relations and classes (or
entities) -> Patterns
• Issues : domain dependence, domain coverage, variation and
flexibility, rigidity (need to be regularly updated)
• Research issues: automatic building by machine learning
– “more implicit” language regularities, medium size corpora,
open list of classes/entities -> supervised learning
– Very large corpora, unexpected relations -> unsupervised
learning
14. Pattern based relation extraction,
an issue: variation
• A tree comprises at least a trunk,
roots and branches.
• With branches reaching the ground,
the willow is an ornamental tree.
• The tree of the neighbor has been
delimbed.
• He climbs on the branches of the tree.
• This tree is wonderful. Its branches
reach the ground.
• Comprises: very systematic pattern; the
parts may be difficult to spot;
enumeration > various parts
• With: meronymy pattern only in some
genres (such as catalogs, biology
documents)
• Delimbed: term and pattern are in the
same word; requires background
knowledge: delimbed -> has_part
branches (and branches are cut)
• Of : Very ambiguous pattern; polysemy
reduced in [verb N1 of N2]
• Its : very ambiguous pattern; necessity
to take into account two sentences
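To make the variation issue concrete, here is a minimal Python sketch of a single hand-written pattern for the "comprises" case. The regular expression and the function name extract_parts are illustrative assumptions; they cover only this one surface form.

```python
import re

# One rigid lexical pattern: "<Det> <whole> comprises [at least] <enumeration>."
PATTERN = re.compile(
    r"^(?:An?|The)\s+(\w+)\s+comprises\s+(?:at least\s+)?(.+?)\.?$",
    re.IGNORECASE,
)


def extract_parts(sentence):
    """Return (whole, 'has_part', part) triples if the sentence fits the pattern."""
    match = PATTERN.match(sentence.strip())
    if not match:
        return []
    whole, enumeration = match.groups()
    items = re.split(r",\s*|\s+and\s+", enumeration)
    parts = [
        re.sub(r"^(?:an?|the)\s+", "", item.strip(), flags=re.IGNORECASE)
        for item in items
        if item.strip()
    ]
    return [(whole.lower(), "has_part", part) for part in parts]


print(extract_parts("A tree comprises at least a trunk, roots and branches."))
print(extract_parts("He climbs on the branches of the tree."))  # [] : pattern misses it
```

The second sentence expresses the same meronymy but returns nothing, which is the coverage problem that motivates learning patterns or relations instead of writing them by hand.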
15. Pattern based relation extraction,
learning patterns (1)
• Patterns are seen (and stored) as lexicalizations of ontology properties
• Patterns are “extracted” from syntactic dependencies between related
entities (in triples)
• Assumes that patterns are structured around ONE lexical entry
• Lemon format for lexical ontologies
• Entries can be frames
ATOLL—A framework for the automatic induction of ontology lexica
S. Walter, C. Unger, P. Cimiano, DKE (94), 148-162 (2014)
17. Pattern based relation extraction,
learning patterns (2)
Michelle Obama is the wife of Barack Obama, the current president.
Michelle Obama allegedly told her husband, Barack Obama, to ..
Michelle Obama, the 44th first lady and wife of President Barack
Dbpedia:spouse
Find all lexicalizations of the entities: Michelle Obama, Mrs. Obama, Michelle
Robinson …
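The sentence selection step suggested by this slide can be sketched as follows in Python: keep the sentences that mention some lexicalization of both entities of a known triple. The alias lists, the third sentence and the function names are assumptions made for the example, and plain substring matching is a simplification.

```python
def mentions(sentence, aliases):
    """True if the sentence contains at least one of the aliases (case-insensitive)."""
    lowered = sentence.lower()
    return any(alias.lower() in lowered for alias in aliases)


def candidate_sentences(sentences, subject_aliases, object_aliases):
    """Sentences mentioning both entities, i.e. candidates for pattern learning."""
    return [
        s for s in sentences
        if mentions(s, subject_aliases) and mentions(s, object_aliases)
    ]


sentences = [
    "Michelle Obama is the wife of Barack Obama, the current president.",
    "Michelle Obama allegedly told her husband, Barack Obama, to ..",
    "Barack Obama gave a speech.",  # hypothetical sentence without the subject entity
]
subject_aliases = ["Michelle Obama", "Mrs. Obama", "Michelle Robinson"]
object_aliases = ["Barack Obama"]

for sentence in candidate_sentences(sentences, subject_aliases, object_aliases):
    print(sentence)
```

The retained sentences are then assumed to express the known relation (here Dbpedia:spouse) and can be parsed to extract a pattern.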
18. Pattern based relation extraction,
learning patterns (3)
• Pattern = shortest path between the 2 entities in the dependency graph
[MichelleObama (subject), wife (root), of (preposition), BarackObama (object)]
• Lexical entry in the ontology
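The shortest-path idea can be sketched in Python with a breadth-first search over an undirected view of the dependency graph. The edge list below is written by hand for the sentence "Michelle Obama is the wife of Barack Obama" and is an assumption; a real pipeline would obtain it from a dependency parser.

```python
from collections import deque

# Hand-written (head, dependent) edges, assumed for illustration
edges = [
    ("wife", "MichelleObama"),
    ("wife", "is"),
    ("wife", "the"),
    ("wife", "of"),
    ("of", "BarackObama"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search for the shortest path, ignoring edge direction."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(edges, "MichelleObama", "BarackObama"))
# ['MichelleObama', 'wife', 'of', 'BarackObama']
```

Words off the path ("is", "the") are dropped, so the pattern keeps only the material that links the two entities.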
19. Relation extraction:
learning relations from enumerative structures
[Figure: enumerative structure annotated with IS_A relations]
Learning relations from a parallel enumerative structure =
- classification task to identify the relation (IS_A, part_Of, other)
- Term extraction to identify the primer and the items
20. Relation extraction:
learning relations from enumerative structures
• Corpus
– 745 enumerative structures from
Wikipedia pages
– 3 relation types: taxonomic,
ontological_non_taxonomic,
non_ontological
• Classification task
– Feature definition
– Automatic evaluation of features
– 3 algorithms are compared : SVM,
MaxEntropy and baseline (majority)
– Training of the 2 algorithms
• Results
– 82% F-measure for SVM
– Best result with a 2 step process
(ontological yes/no -> feature and
then taxonomic yes/no)
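One possible reading of the 2 step process is a cascade of two binary decisions. The Python sketch below shows only that control flow; the two stand-in classifiers and the feature names are hypothetical placeholders, not the features or the trained SVM and MaxEntropy models of the study.

```python
def two_step_classify(features, is_ontological, is_taxonomic):
    """Step 1: ontological yes/no. Step 2 (only if yes): taxonomic yes/no."""
    if not is_ontological(features):
        return "non_ontological"
    if is_taxonomic(features):
        return "taxonomic"
    return "ontological_non_taxonomic"


# Hypothetical stand-ins for trained binary classifiers
def toy_is_ontological(features):
    return features.get("items_are_noun_phrases", False)


def toy_is_taxonomic(features):
    return features.get("primer_ends_with_plural_noun", False)


example = {"items_are_noun_phrases": True, "primer_ends_with_plural_noun": True}
print(two_step_classify(example, toy_is_ontological, toy_is_taxonomic))  # taxonomic
```

Splitting the 3-way decision in two lets each classifier use the features that matter for its own question.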
21. From interpretation to representation
• A tree comprises at least a trunk,
roots and branches.
• With branches reaching the
ground, the willow is an
ornamental tree.
• The tree of the neighbor has been
delimbed.
[Figure: graph with nodes Tree, Trunk, Roots, Branches, Ornamental Tree and Willow Tree, connected by Has-part links]
22. From interpretation to representation
• A tree comprises at least a trunk,
roots and branches.
• With branches reaching the
ground, the willow is an
ornamental tree.
• The tree of the neighbor has been
delimbed.
• He’s climbing on the branches of
the tree.
• This tree is wonderful. Its
branches reach the ground.
[Figure: graph with nodes Tree, Trunk, Roots, Branches and Neighbor Tree, connected by Has-part and Instance_of links]
23. Finding semantic relations:
what can large corpora and machine
learning do for you?
• Learning patterns
– Poor results
– Requires very large data sets
– Reasonable for general
knowledge
• Learning relations
– Much more relevant
– Large variety of approaches in
the state of the art
– Key step = feature selection
– The more features, the better the results
24. Finding semantic relations,
what can terminology do for the semantic web?
• Terminology can play a key role
– change its practices
– contribute to enrich the LOD with QUALITY data
• Contribute to define/evaluate tools
– evaluate machine learning tools
– Test and experiment with them on various textual genres, in
various domains
– Propose more features
• Propose terminology tools
– … to improve these approaches with the studies carried
out during the last 20 years to build terminologies and
terminological knowledge bases.
– term extractors, relation extractors, patterns
25. Finding semantic relations,
what can terminology do for the semantic web?
• Contribute to enrich the LOD with terminologies
– Terminologies as structuring networked vocabularies
– Publish your ontologies as public web resources
– Conform to linked data requirements
– Use standard formats like SKOS (Simple Knowledge
Organization System), the W3C’s OWL ontology for creating
thesauri, taxonomies, and controlled vocabularies
• Contribute to the multilingual LOD
– Add lexicon to ontologies
– Multilingual efforts: LEMON, MONNET projects DID NOT
include terminologists
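As a hedged illustration of what publishing a term in SKOS could look like, the Python sketch below builds a small Turtle description with multilingual labels. The ex: namespace, the concept identifiers and the function name skos_concept are assumptions; skos:Concept, skos:prefLabel and skos:broader are standard SKOS terms. The function assumes at least one label is given.

```python
HEADER = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/garden/> .\n"  # hypothetical namespace
)


def skos_concept(concept_id, labels, broader=None):
    """Build a Turtle description of one concept with language-tagged labels."""
    statements = [
        f'    skos:prefLabel "{text}"@{lang}' for lang, text in sorted(labels.items())
    ]
    if broader:
        statements.append(f"    skos:broader ex:{broader}")
    return f"ex:{concept_id} a skos:Concept ;\n" + " ;\n".join(statements) + " ."


turtle = HEADER + "\n" + skos_concept("oak", {"en": "oak", "da": "eg"}, broader="tree")
print(turtle)
```

One preferred label per language is what lets the same concept serve a multilingual vocabulary.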
26. The LOD and the
linguistic LOD
need you!
http://linguistics.okfn.org/resources/llod/
27. Some landmark KOS LD
implementations
• Many Libraries
– Swedish National Library’s Libris catalogue and thesaurus http://libris.kb.se/
– Library of Congress’ vocabularies, including LCSH http://id.loc.gov/
– DNB’s Gemeinsame Normdatei (incl. SWD subject headings) http://dnb.info/gnd/
• Documentation at https://wiki.dnb.de/display/LDS
– BnF’s RAMEAU subject headings http://stitch.cs.vu.nl/
– OCLC’s DDC classification http://dewey.info/ and VIAF http://viaf.org/
– STW economy thesaurus http://zbw.eu/stw
– National Library of Hungary’s catalogue and thesauri http://oszkdk.oszk.hu/resource/DRJ/404 (example)
• Other fields
– Wikipedia categories through DBpedia http://dbpedia.org/
– New York Times subject headings http://data.nytimes.com/
– IVOA astronomy vocabularies http://www.ivoa.net/Documents/latest/Vocabularies.html
– GEMET environmental thesaurus http://eionet.europa.eu/gemet
– Agrovoc http://aims.fao.org/
– Linked Life Data http://linkedlifedata.com/
– Taxonconcept http://www.taxonconcept.org/
– UK Public sector vocabularies http://standards.esd.org.uk/ (e.g., http://id.esd.org.uk/lifeEvent/7)