2. My background
Involved with cross-domain ontology engineering since my Master's thesis
1. at CNIPA, Italy – SPCoop (Public Co-operation system)
2. at Consiglio Nazionale delle Ricerche, Italy
3. at The Open University, UK
4. at Insight Centre, NUIG, Ireland
Fields: eGovernment, Content Management, Education, Digital Humanities,
Smart Cities, Social Learning
3. Outline
1. Knowledge Organisation
2. Features of ontologies and the OWL language
3. Methodologies and patterns
4. Caveats in ontology development
4. Knowledge Organisation Systems
Knowledge Organisation System (KOS)
• Umbrella term used for grouping
• Authority files
• Classification schemes (taxonomies, meronomies…)
• Thesauri
• Topic maps
• Ontologies
Garshol, L.M. Metadata? Thesauri? Taxonomies? Topic maps! Making sense of it all.
Journal of Information Science, 30 (4). 378-391 (2004).
https://web.archive.org/web/20081017174807/http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
5. KOS and Lexical Semantics in Data
[Figure: a small lexicon (Play, Drama, Tragedy, Comedy) whose terms are linked by synonym, antonym and hypernym (hyponym) relations]
KOS heavily employed in representing the population of a world (e.g. a lexicon)
6. KOS and the Lexical Meta-level
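[Figure: the relation types Antonym, Hypernym, Hyponym and Synonym shown as nodes, with question marks on the links between them (e.g. "Inverse?")]
A lexical ontology will model the terminology itself (of the lexicon and lexica in general)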
7. Motivation
Two primary approaches that warrant the usage of a KOS to
model the terminologies that describe a part of the world
• Intensional: the intent to formally characterise a problem
scope or context, often dictated by sheer philosophical
curiosity (e.g. DOLCE, SUMO)
• Extensional: driven by the need to create / re-engineer /
refactor / annotate etc. a data set, coupled with the
informed dissatisfaction with existing (shared) data
schemas (e.g. Schema.org, DBpedia ontology).
Hybrid approaches are not to be ruled out (e.g. distilling
classifications from data clusters)
8. Hierarchical structures
Typically assume one property whereupon hierarchies are defined
Taxonomical: Place > Settlement > City
Mereological: Country > Region > City
Instance-level: Country > Italy
9. From taxonomies to thesauri
Very useful if one needs nothing more than lax semantics for
controlled vocabularies whose terms are loosely aligned.
[Figure: thesaurus example relating Province, County, Region, State and Country through the properties broader, narrowMatch and related]
Restricted number of flexible semantic properties
Necessarily limited expressivity
Typically not used to define new properties
SKOS is a state-of-the-art schema for the construction of thesauri
10. Ontologies and ontology languages
A formal, explicit specification of a shared conceptualization
(Studer et al., 1998, after Gruber, 1993)
A vocabulary used to describe (a particular view on) some
domain.
Sort of what you could do with a thesaurus, but with
1. Organisation into type and property hierarchies
2. Class-instance relationships
3. Custom constraints upon the above
4. Support for complex inferencing (within limits)
11. Some Applications
Linked Data representation schemas
Annotation vocabularies (e.g. for RDFa, Microdata in HTML)
Global schemas for mediated data integration
Ontology-based data access (OBDA)
Reasoning for Bioinformatics
Service interoperability
…
12. Ontology Languages
OWL – Web Ontology Language
• Standardised in Version 2 as W3C Recommendation
• Mapped to Description Logics
• Evolved from DAML+OIL
• Accommodates the expressivity of RDF-Schema (RDFS)
• Not a serialisation format!
14. Ontology implementation
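[Figure: example ontology. Classes: Region, Country, AdministrativeDenomination, Convention. Restrictions: partOf exactly 1 (between Region and Country) and adopts some (towards AdministrativeDenomination). Individuals: Netherlands, Italy, UK, US and denom.Province, denom.County, connected by type and adopts assertions. One subClassOf link involves Convention.]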
16. Restrictions
Strengthen the semantics of class-property relations, in
addition to RDFS domain and range assertions.
They create new, unnamed classes that subsume (or are equivalent to)
named classes of your ontology.
A linguistic unit is forceful if it has at least one force property (e.g. an imperative
mood, hortative or volitive property)
Existential restriction: hasProperty some ForceProperty
ForcefulLinguisticUnit ⊑ ∃hasProperty.ForceProperty
17. Restrictions
A lost linguistic unit is a linguistic unit whose language
can only be an extinct one.
Universal restriction: inLanguage only ExtinctVariety
LostLinguisticUnit ⊑ LinguisticUnit ⊓ ∀inLanguage.ExtinctVariety
18. Restrictions
An orthographic word is made of at least two orthographic parts
Qualified cardinality: hasPart min 2 OrthographicPart
OrthographicWord ⊑ ≥2 hasPart.OrthographicPart
20. Property features
OWL allows an algebraic characterisation of properties in
ways that allow further property assertions to be inferred.
(ir)reflexivity: it always [never] holds from A to A
symmetry: if it holds from A to B, it also holds from B to A.
transitivity: if it holds from A to B and from B to C, also from
A to C
inversion: if it holds from A to B, then there is another
property that holds from B to A.
functionality: if it holds from A to B and from A to C, then B
and C are the same.
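The way such characteristics let new assertions be inferred can be sketched in a few lines of plain Python. This is only an illustration, not how an OWL reasoner is implemented: the triples, the property names (partOf, hasPart, borders) and the function name infer are assumptions made up for the example, and only symmetry, transitivity and inversion are covered.

```python
from itertools import product


def infer(triples, symmetric=(), transitive=(), inverses=None):
    """Saturate (subject, property, object) triples under the declared
    property characteristics until no new triple can be derived."""
    inv = dict(inverses or {})
    # An inverse declaration works in both directions.
    inv.update({q: p for p, q in (inverses or {}).items()})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:      # A p B  =>  B p A
                new.add((o, p, s))
            if p in inv:            # A p B  =>  B inv(p) A
                new.add((o, inv[p], s))
        for (s1, p1, o1), (s2, p2, o2) in product(facts, repeat=2):
            # A p B and B p C  =>  A p C, for transitive p
            if p1 == p2 and p1 in transitive and o1 == s2:
                new.add((s1, p1, o2))
        if new <= facts:            # fixpoint reached
            return facts
        facts |= new


# Hypothetical sample data, for illustration only.
facts = infer(
    {("Rome", "partOf", "Lazio"),
     ("Lazio", "partOf", "Italy"),
     ("Italy", "borders", "France")},
    symmetric={"borders"},
    transitive={"partOf"},
    inverses={"partOf": "hasPart"},
)
print(sorted(facts))
```

The loop repeats until a pass adds nothing, so derived facts can themselves trigger further ones: the inverse of the transitively derived Rome partOf Italy only appears on a second pass.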
21. Relation to RDF
RDF (Resource Description Framework) is merely a data
representation paradigm
• Not even a format: formats include RDF/XML, RDF/JSON, N-Triples,
Turtle, TriG, JSON-LD
RDF provides a convenient linguistic metaphor, but no
guidelines as to how triples should be organised into a model.
RDFS does some of that.
To treat RDF data as ontology axioms, they need to be
interpreted in OWL first.
There is a full specification that maps OWL Axioms to RDF
triples.
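As a rough illustration of that mapping, the sketch below writes out the triples that a single existential-restriction axiom turns into. The terms rdfs:subClassOf, rdf:type, owl:Restriction, owl:onProperty and owl:someValuesFrom are the standard vocabulary; the function name, the blank-node labels and the use of plain tuples instead of an RDF library are assumptions made for the example.

```python
from itertools import count

_blank_ids = count(1)


def some_values_from_axiom(sub_class, prop, filler):
    """Triples for the single axiom `sub_class SubClassOf (prop some filler)`."""
    # The unnamed restriction class becomes a blank node in RDF.
    node = f"_:r{next(_blank_ids)}"
    return [
        (sub_class, "rdfs:subClassOf", node),
        (node, "rdf:type", "owl:Restriction"),
        (node, "owl:onProperty", prop),
        (node, "owl:someValuesFrom", filler),
    ]


triples = some_values_from_axiom(
    "ForcefulLinguisticUnit", "hasProperty", "ForceProperty")
for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```

One axiom yields four triples here, which is why RDF data has to be read back as a whole pattern before it can be treated as an OWL axiom.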
24. Manchester OWL
Class: gold-ext:OrthographicWord
    SubClassOf:
        dc:hasPart min 2 gold:OrthographicPart

Class: gold-ext:ForcefulLinguisticUnit
    SubClassOf:
        gold:LinguisticUnit,
        gold:hasProperty some gold:ForceProperty

Class: gold-ext:LostLinguisticUnit
    SubClassOf:
        gold:LinguisticUnit,
        gold:inLanguage only gold:ExtinctVariety
25. Ontologies and Reasoning
Two main forms of reasoning based on RDFS and OWL
1. Inferencing, both at class level and at instance level.
2. Consistency checking: detection of contradictions.
Mostly useful for testing the ontology itself, or a sample
dataset.
Distributed reasoning: still very difficult to achieve at scale
and not standardised even with current commoditised ML
techniques (e.g. [PauStu16])
[PauStu16] H. Paulheim, H. Stuckenschmidt: Fast Approximate A-Box Consistency Checking
Using Machine Learning. ESWC 2016: 135-150
26. Ontologies and Reasoning
Inferencing
1. Find non-asserted subclasses
2. Find non-asserted types of class instances
3. Detect properties that implicitly hold between instances
due to e.g. symmetry, inverses, transitivity, property
chaining
In OWL 2 DL all of these are decidable, but tractable only under
certain conditions.
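The first two kinds of inference can be sketched for the simplest case, asserted subclass links only. This is a toy under stated assumptions: the dictionaries and function names are invented for the example, and a DL reasoner would additionally derive subsumptions from restrictions and other class expressions, which this sketch does not attempt.

```python
def superclasses(cls, sub_class_of):
    """All superclasses of `cls` reachable through asserted subclass links."""
    found, todo = set(), [cls]
    while todo:
        for parent in sub_class_of.get(todo.pop(), ()):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def inferred_types(individual, types, sub_class_of):
    """Asserted types of an individual plus every superclass of those types."""
    result = set(types.get(individual, ()))
    for cls in list(result):
        result |= superclasses(cls, sub_class_of)
    return result


# Hypothetical TBox and ABox, for illustration only.
sub_class_of = {"City": {"Settlement"}, "Settlement": {"Place"}}
types = {"Rome": {"City"}}

print(superclasses("City", sub_class_of))
print(inferred_types("Rome", types, sub_class_of))
```

Neither "City is a Place" nor "Rome is a Settlement" is asserted anywhere in the input; both follow from the subclass chain.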
27. Ontologies and Reasoning
Consistency Checking
Can also be on class-level or instance-level
The open world assumption is key here: in the absence of
class disjointness, any two classes could be one and the
same.
An ontology without disjointness (or other negative) axioms is almost
always trivially consistent, but that is not very meaningful.
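A minimal sketch of instance-level consistency checking shows why disjointness matters. It assumes types have already been collected per individual and only looks at class disjointness; the data and the function name find_clashes are hypothetical, and a real reasoner would check inferred types and many more kinds of contradiction.

```python
def find_clashes(types, disjoint_pairs):
    """Return (individual, class_a, class_b) for every individual that is
    typed with two classes declared disjoint."""
    clashes = []
    for individual, classes in sorted(types.items()):
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                clashes.append((individual, a, b))
    return clashes


# Hypothetical sample data: Rome is (wrongly) typed as both City and Country.
types = {"Rome": {"City", "Country"}, "Italy": {"Country"}}

no_axioms = find_clashes(types, disjoint_pairs=[])
with_axiom = find_clashes(types, disjoint_pairs=[("City", "Country")])
print(no_axioms)
print(with_axiom)
```

Without the disjointness axiom nothing can be reported, since nothing forbids an individual from being both a City and a Country.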
28. Fragments of OWL 2
OWL 2 QL
Description: designed for OBDA. Data stored in a standard relational database system can be queried through an ontology via a simple rewriting mechanism.
Constraints:
• No functional or transitive properties
• No existential, universal or cardinality restrictions
• Only class disjointness
OWL 2 EL
Description: reasoning in polynomial time for ontologies with a very large number of classes and properties.
Constraints:
• No class negation
• No cardinality restrictions
• Only class disjointness
OWL 2 RL
Description: amenable to implementation using rule-based technologies.
Constraints:
• Cardinality restrictions limited to at most 1, on data ranges and class expressions
29. Methodologies
Proliferation of Methodologies in the Golden Age of ontology
engineering:
• Methontology
• DILIGENT
• On-To-Knowledge
• NeOn Methodology
Each has a focus on one or some aspects of OE e.g. reuse,
competency questions, modularisation, evolution,
collaborative development…
Some expect tool support, but even without it, best practices can
be distilled that eventually come naturally.
30. Staples in ontology engineering
Staples in common to most methodologies
1. Unit-test your ontology, even through non-DL queries
2. Reuse existing ontologies, or parts thereof
3. Modularise your ontologies
4. Develop iteratively
31. Staples: Competency questions
• Formulate precise questions that your model, in the presence
of adequate data, must be able to answer.
“Give me the part-of-speech sequence of this sentence”
“Which features of the lemma ‘Apple’ are morphosemantic?”
• Both unit tests and design justifications for your ontology
• Write them as free text and then formalise them (e.g. as
SPARQL queries)
• Make a small, separate dataset to test them and evaluate
your ontology.
• Typically much finer-grained than the research questions that
drive the project.
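A competency question used as a unit test might look like the sketch below, which answers "Give me the part-of-speech sequence of this sentence" over a tiny hand-made dataset. Plain Python tuples stand in for an RDF store and a SPARQL query here; the property names (hasToken, position, partOfSpeech), the identifiers and the helper functions are all assumptions made for the example.

```python
# Small, separate test dataset (hypothetical vocabulary and identifiers).
DATA = {
    ("s1", "hasToken", "t1"), ("s1", "hasToken", "t2"), ("s1", "hasToken", "t3"),
    ("t1", "position", 1), ("t2", "position", 2), ("t3", "position", 3),
    ("t1", "partOfSpeech", "Determiner"),
    ("t2", "partOfSpeech", "Noun"),
    ("t3", "partOfSpeech", "Verb"),
}


def objects(data, subject, prop):
    """All objects o such that (subject, prop, o) is in the data."""
    return [o for s, p, o in data if s == subject and p == prop]


def pos_sequence(data, sentence):
    """Competency question: the part-of-speech sequence of a sentence."""
    tokens = objects(data, sentence, "hasToken")
    ordered = sorted(tokens, key=lambda t: objects(data, t, "position")[0])
    return [objects(data, t, "partOfSpeech")[0] for t in ordered]


def test_pos_sequence():
    # The question doubles as a unit test for the model.
    assert pos_sequence(DATA, "s1") == ["Determiner", "Noun", "Verb"]


test_pos_sequence()
print(pos_sequence(DATA, "s1"))
```

If the model had no way to order tokens within a sentence, the question could not be answered at all, which is the kind of design gap a competency question is meant to expose.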
32. Staples: Reuse
• We can no longer develop in silos.
• Existing ontologies can be incorporated in early
stages of development.
• Open vocabulary services: BARTOC.org, LOV
Caveat: do not reinvent the wheel
• Ontology reuse is not all-or-nothing
• Terms can be adopted piecemeal
• Even if a reused ontology is implicitly accepted
as a whole, it is still a monotonic environment
and terms can be added at will.
• Ontology alignment is nowadays more
appreciated than standalone novel ontologies
https://lov.linkeddata.es
Linked Open Vocabularies (LOV)
33. Historical linguistic ontologies
GOLD - General Ontology for Linguistic Description
– Farrar and Langendoen, 2003
– http://www.linguistics-ontology.org/gold.html
LIR - Linguistic Information Repository
– NeOn Project, 2006-09
– http://mayor2.dia.fi.upm.es/oeg-upm/index.php/en/technologies/63-lir/
LMM - Linguistic MetaModel
– Picca, Gliozzo and Gangemi, 2008
– http://ontologydesignpatterns.org/wiki/Ontology:LMM
34. GOLD
General Ontology for Linguistic Description
- All-encompassing model
- Very lightly axiomatised (RDFS)
- Not evolved since 2010
35. Staples: Modularity
Split your ontology into connected
components that reflect the partitioning of
the domain at hand.
Pattern-based methodologies take this to
an extreme: every application of a pattern
(e.g. event participation, agent role)
becomes a module in its own right.
Caveat: Know when to separate:
• Axioms on specific domains from core
• Terminological axioms (TBox) from
assertions on domain instances (ABox)
http://ontologydesignpatterns.org
Ontology Design Patterns (ODP)
36. Staples: iterative development
Develop your ontology (network) as you would a software project
• Maintain a development branch, test release candidates, make
releases
• Publish and identify new and old versions using the OWL
(ontologyIRI, versionIRI) scheme
• Mark stable, transitional and deprecated terms
• Don’t be shy about splitting into modules
Caveat: do not rename terms across versions if their
semantics are unchanged
F. Zablith, G. Antoniou, M. d'Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou:
Ontology evolution: a process-centric survey. Knowledge Eng. Review 30(1): 45-75 (2015)
38. Caveats in ontology engineering
Using the OWL taxonomy for anything
[Figure: a single hierarchy that mixes relation types, chaining Country, Region, City (with Town), Rome and Piazza Navona]
[Figure: the taxonomy proper, with the classes Location, Country, Settlement, City, Town and Region]
Do not “demote” ontological relations to those of a mere thesaurus
40. Caveats in ontology engineering
OWL Axioms are not RDF triples
• The arity of OWL axioms is variable
• You still need multiple axioms to represent n-ary relations
• RDF is merely a funnel towards the serialisation of OWL to
machine-readable formats
• However, most object/data property assertions and annotations are a
good match with single triples.
41. Caveats in ontology engineering
You most likely won’t need Boolean literals
If you sense the need to attach a Boolean value to an entity,
chances are you could refactor it into a new class or
property…
… So you probably should
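One possible shape of such a refactoring is sketched below: a Boolean-valued property is replaced by membership in one of two classes. ExtinctVariety comes from the earlier slides; the flag isExtinct, the class LivingVariety, the sample individuals and the function name are assumptions made for the example.

```python
def refactor_boolean_flags(triples, flag_classes):
    """Replace (s, flag, True/False) with (s, rdf:type, Class).

    `flag_classes` maps each Boolean property to a pair
    (class_if_true, class_if_false). Other triples pass through unchanged.
    """
    refactored = set()
    for s, p, o in triples:
        if p in flag_classes and isinstance(o, bool):
            true_class, false_class = flag_classes[p]
            refactored.add((s, "rdf:type", true_class if o else false_class))
        else:
            refactored.add((s, p, o))
    return refactored


# Hypothetical data using a Boolean flag.
before = {
    ("Etruscan", "isExtinct", True),
    ("Italian", "isExtinct", False),
    ("Etruscan", "label", "Etruscan"),
}
after = refactor_boolean_flags(
    before, {"isExtinct": ("ExtinctVariety", "LivingVariety")})
print(sorted(after))
```

Once the flag is a class, it can take part in restrictions such as inLanguage only ExtinctVariety, which a Boolean literal cannot.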
42. Tool support
Ontology editors had their own “1983 video game crash” in
the late 2000s
• Swoop
• KAON
• OntoEdit
• Protégé
• NeOn Toolkit
• TopBraid (proprietary)
• PoolParty (proprietary)