Senso Comune (www.sensocomune.it) is an open, machine-readable knowledge base of the Italian language. This talk was given at Clic-it 2014, the first Italian Conference on Computational Linguistics.
1. Senso Comune as a Knowledge Base of Italian language
The Resource and its Development
Tommaso Caselli (1), Isabella Chiari (2), Aldo Gangemi (3), Elisabetta Jezek (4), Alessandro Oltramari (5), Guido Vetere (6), Laure Vieu (7), Fabio Massimo Zanzotto (8)
(1) VU Amsterdam
(2) Università di Roma 'Sapienza'
(3) CNR ISTC
(4) Università di Pavia
(5) Carnegie Mellon University
(6) IBM Italia
(7) CNRS IRIT
(8) Università di Roma 'Tor Vergata'
Tommaso Caselli, Isabella Chiari, Aldo Gangemi, Elisabetta Jezek, Alessandro Oltramari, Guido Vetere, Laure Vieu, Fabio Massimo Zanzotto. Senso Comune as a Knowledge Base of Italian language. December 10, 2014
2. Introduction
- Senso Comune (www.sensocomune.it) is an open, machine-readable knowledge base of the Italian language.
- Its lexical content has been extracted from a monolingual Italian dictionary (De Mauro's GRADIT) and is continuously enriched through a collaborative online platform.
- Linguistic knowledge is represented by a semasiological model in which each sense can be qualified with respect to a small set of ontological categories.
- Senses can be further enriched in many ways and mapped to other dictionaries, such as the Italian version of MultiWordNet, which qualifies Senso Comune as a linguistic Linked Open Data resource.
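A minimal sketch of how such a semasiological entry could be modelled: a lemma owns its list of senses, and each sense carries a set of ontological categories and a set of links to other resources. The class names, field names, and the sample entry (including its identifiers) are hypothetical and do not reproduce the actual Senso Comune schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    """One dictionary sense (hypothetical schema, not the real SC data model)."""
    sense_id: str
    gloss: str
    # Ontological categories qualifying the sense (drawn from a small closed set).
    categories: set[str] = field(default_factory=set)
    # Links to other resources, e.g. MultiWordNet synset ids (hypothetical format).
    mappings: set[str] = field(default_factory=set)


@dataclass
class Lemma:
    """Semasiological entry: from the word form to its list of senses."""
    form: str
    pos: str
    senses: list[Sense] = field(default_factory=list)

    def senses_with_category(self, category: str) -> list[Sense]:
        return [s for s in self.senses if category in s.categories]


# Hypothetical entry: one sense qualified by two categories, with one external link.
libro = Lemma("libro", "noun", [
    Sense("libro#1", "(gloss omitted)", {"ARTIFACT", "INFORMATION"}, {"mwn:HYPOTHETICAL-ID"}),
])
print([s.sense_id for s in libro.senses_with_category("INFORMATION")])  # ['libro#1']
```

Keeping categories and mappings as sets on the sense, rather than on the lemma, mirrors the idea that qualification and alignment apply sense by sense.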
3. General principles
- (Computational) lexicography should be able to build on the direct testimony of native speakers, not only on textual sources.
- The way linguistic meanings relate to ontological categories is tangential (indirect).
- Linguistic knowledge belongs to the entire community of speakers, so we are committed to keeping the resource as open as possible.
4. Lexicon and ontology
- To map lexical senses to concepts, Senso Comune adopts a notion of ontological commitment: if the sense S commits to the concept C (written $S \mapsto C$), then there are entities of type C to which occurrences of S may refer.
Ontological Commitment
$(S \mapsto C) \Leftrightarrow \exists s, c \mid S(s) \land C(c) \land \mathit{refers\_to}(s, c)$
- A sense may commit to several different ontological categories (e.g. ARTIFACT, INFORMATION).
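Read extensionally, the commitment condition says that $S \mapsto C$ holds exactly when at least one occurrence of S refers to at least one instance of C. The sketch below checks this over a small finite interpretation. Representing S, C, and refers_to as Python sets is an assumption of the sketch, and the occurrences and entities (including the use of a sense of "libro") are invented for illustration.

```python
from itertools import product


def commits_to(occurrences: set, instances: set, refers_to: set) -> bool:
    """S |-> C iff there exist s, c with S(s), C(c) and refers_to(s, c)."""
    return any((s, c) in refers_to for s, c in product(occurrences, instances))


# Toy interpretation (all names hypothetical): two occurrences of one sense of "libro".
occurrences = {"occ1", "occ2"}
artifact = {"copy_on_the_desk"}          # instances of ARTIFACT
information = {"text_of_the_novel"}      # instances of INFORMATION
event = {"a_reading_session"}            # instances of EVENT
refers_to = {("occ1", "copy_on_the_desk"), ("occ2", "text_of_the_novel")}

# The same sense commits to two categories at once, but not to EVENT.
print(commits_to(occurrences, artifact, refers_to))     # True
print(commits_to(occurrences, information, refers_to))  # True
print(commits_to(occurrences, event, refers_to))        # False
```

Because the condition is existential, a single referring occurrence is enough for a commitment, which is what allows one sense to commit to several categories.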
5. Lexicon and ontology, a semiotic approach
- Senses are semiotic objects whose relationship with real-world entities is mediated by cognitive structures, emotional polarity, and social interactions.
- Lexical relations that hold among senses, such as synonymy, do not bear direct ontological import.
- Conversely, ontological axioms, such as equivalence, do not have immediate linguistic side effects.
- If equivalence between linguistic senses and ontological concepts is desired (e.g. for technical portions of the dictionary), this condition has to be explicitly formalized and managed.
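Synonymy $\nLeftrightarrow$ Equivalence
$(S \mapsto C) \land (S' \mapsto C') \land (S \approx S') \nRightarrow (C \equiv C')$
$(S \mapsto C) \land (S' \mapsto C') \land (C \equiv C') \nRightarrow (S \approx S')$
where $\approx$ denotes synonymy between senses and $\equiv$ equivalence between concepts.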
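The two non-entailments can be checked on a toy model that keeps the lexical layer (a stipulated synonymy relation between senses) separate from the ontological layer (concept extensions). All sense and concept names are hypothetical. Approximating equivalence by equal extensions in a single model is a simplification: it suffices to witness that C1 and C2 are not equivalent, but it only stands in for an actual equivalence axiom between C3 and C4.

```python
# Lexical layer: synonymy is asserted between senses, independently of the ontology.
synonymy = {frozenset({"S1", "S2"})}

# Sense -> concept commitments (S |-> C).
commits = {"S1": "C1", "S2": "C2", "S3": "C3", "S4": "C4"}

# Ontological layer: concept extensions in one toy model.
extension = {
    "C1": {"a", "b"},
    "C2": {"a"},   # differs from C1, so C1 and C2 are not equivalent
    "C3": {"c"},
    "C4": {"c"},   # same extension as C3 (stands in for an equivalence axiom)
}


def synonymous(s: str, t: str) -> bool:
    return frozenset({s, t}) in synonymy


def equivalent(c: str, d: str) -> bool:
    # Simplification: equal extensions in this single model.
    return extension[c] == extension[d]


# Synonymy does not entail equivalence of the committed concepts...
syn_without_equiv = synonymous("S1", "S2") and not equivalent(commits["S1"], commits["S2"])
# ...and equivalence of concepts does not entail synonymy of the senses.
equiv_without_syn = equivalent(commits["S3"], commits["S4"]) and not synonymous("S3", "S4")
print(syn_without_equiv, equiv_without_syn)  # True True
```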
6. Sense classification
- Senso Comune meanings are classified with respect to a small set of categories inspired by DOLCE.
- A tutoring methodology (TMEO) supports the classification process.
7. Annotation of lexicographic examples and definitions
Ongoing work in Senso Comune focuses on the manual annotation of the usage examples associated with the sense definitions of the most common verbs in the resource, with the goal of providing Senso Comune with corpus-derived verbal frames. The annotation task, which is performed through a Web-based tool, is organized into two main subtasks:
1. The first subtask consists in identifying the constituents that hold a relation with the target verb in the example and annotating them with their phrase type and grammatical relation.
2. In the second subtask, users are asked to attach to each frame participant in the instances a semantic role, an ontological category, and the sense definition associated with its argument filler.
Figure: Annotation of andare a cavallo (riding)
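A minimal sketch of the record the two subtasks could produce for one usage example: subtask 1 fills the syntactic fields of each participant, and subtask 2 adds the semantic ones. The field names, the tag values (NP, PP, subj, obl, Agent, Means, PERSON, ANIMAL), the sample sentence, and the sense identifiers are all hypothetical; they are not taken from the Senso Comune annotation scheme or from the tool's output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Participant:
    # Subtask 1: constituent related to the target verb, with syntactic information.
    text: str
    phrase_type: str                      # e.g. "NP", "PP" (hypothetical tag set)
    grammatical_relation: str             # e.g. "subj", "obl" (hypothetical tag set)
    # Subtask 2: semantic information attached to the participant.
    semantic_role: Optional[str] = None
    ontological_category: Optional[str] = None
    filler_sense: Optional[str] = None    # sense definition of the argument filler


@dataclass
class FrameInstance:
    example: str
    target_verb: str
    participants: list[Participant] = field(default_factory=list)

    def semantically_complete(self) -> bool:
        """True once subtask 2 has been carried out for every participant."""
        return all(p.semantic_role and p.ontological_category and p.filler_sense
                   for p in self.participants)


# Hypothetical usage example for "andare a cavallo", after subtask 1.
frame = FrameInstance("Luca va a cavallo", "andare", [
    Participant("Luca", "NP", "subj"),
    Participant("a cavallo", "PP", "obl"),
])
print(frame.semantically_complete())  # False: only subtask 1 is done

# Subtask 2 (hypothetical labels and sense ids).
labels = [("Agent", "PERSON", "sense:HYPOTHETICAL-1"),
          ("Means", "ANIMAL", "sense:HYPOTHETICAL-2")]
for p, (role, category, sense) in zip(frame.participants, labels):
    p.semantic_role, p.ontological_category, p.filler_sense = role, category, sense
print(frame.semantically_complete())  # True
```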
8. Word Sense Alignment
To enrich Senso Comune (SC) and make it interoperable with other lexical-semantic resources, we conducted Word Sense Alignment (WSA) experiments with MultiWordNet (MWN), both manually and automatically.
Figure: Alignment of appartamento
9. Manual Alignment
At the time of writing:
- 584 SC lemmas (nouns) have been processed for manual alignment, for a total of 6,730 word senses, with 3.64 average word senses for each lemma
- 2,131 SC senses could be aligned with at least one MWN synset (31.7%)
- 2,187 MWN synsets could be aligned with at least one SC sense
- 1,093 alignments are biunique (one-to-one)
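SC senses | MWN synsets per sense | %
1,622 | 1 | 76.1
367 | 2 | 17.2
108 | 3 | 5
25 | 4 | 1.1
11 | 5, 7 | 0.6
Table: SC to MWN (number of SC senses aligned with a given number of MWN synsets)

MWN synsets | SC senses per synset | %
1,681 | 1 | 76.8
400 | 2 | 18.2
85 | 3 | 3.8
17 | 4 | 0.9
4 | 5, 6 | 0.3
Table: MWN to SC (number of MWN synsets aligned with a given number of SC senses)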
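⇒ Similar granularity (about 76% of aligned items on either side have exactly one counterpart), but relatively little overlap (only 31.7% of SC senses are aligned)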
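The alignment figures can be derived from a plain list of manual links between SC senses and MWN synsets. The sketch below, on invented identifiers, counts aligned items on each side, builds the two cardinality distributions shown in the tables, and counts biunique links. Reading "biunique" as "the SC sense has exactly one MWN synset and that synset has exactly one SC sense" is an assumption of this sketch.

```python
from collections import Counter


def alignment_stats(links):
    """links: iterable of (sc_sense, mwn_synset) pairs from manual alignment."""
    links = set(links)
    sc_to_mwn, mwn_to_sc = {}, {}
    for sc, mwn in links:
        sc_to_mwn.setdefault(sc, set()).add(mwn)
        mwn_to_sc.setdefault(mwn, set()).add(sc)
    # Assumed definition: a link is biunique if neither end has any other link.
    biunique = sum(1 for sc, mwn in links
                   if len(sc_to_mwn[sc]) == 1 and len(mwn_to_sc[mwn]) == 1)
    return {
        "aligned_sc": len(sc_to_mwn),
        "aligned_mwn": len(mwn_to_sc),
        "biunique": biunique,
        # {number of counterparts: number of items}, as in the two tables
        "sc_to_mwn": Counter(len(v) for v in sc_to_mwn.values()),
        "mwn_to_sc": Counter(len(v) for v in mwn_to_sc.values()),
    }


# Hypothetical links: sc2 maps to two synsets, and m3 is shared by sc2 and sc3.
stats = alignment_stats([("sc1", "m1"), ("sc2", "m2"), ("sc2", "m3"), ("sc3", "m3")])
print(stats["aligned_sc"], stats["aligned_mwn"], stats["biunique"])  # 3 3 1
print(sorted(stats["sc_to_mwn"].items()))  # [(1, 2), (2, 1)]

# Coverage with the reported counts: 2,131 aligned out of 6,730 SC senses.
print(round(100 * 2131 / 6730, 1))  # 31.7
```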
10. Automatic Alignment
- Lexical Match (overlapping tokens between two sense descriptions), using:
1. lemmatized versions of the original Senso Comune glosses;
2. bags of words built from synset words, direct hypernyms, nearest synsets, the corresponding Italian synset words from the “Princeton Annotated Gloss Corpus”, and Wikipedia glosses from BabelNet.
- Sense Similarity (cosine score between the vector representations of sense descriptions):
1. vector representations were obtained with Personalized PageRank (PPR), using WordNet 3.0 (WN30) and the “Princeton Annotated Gloss Corpus” as knowledge base.
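The sense-similarity idea can be sketched with a personalized PageRank computed by power iteration: the teleport mass is concentrated on the graph nodes that occur in a sense description, and two descriptions are compared by the cosine of their PPR vectors. The tiny word graph below stands in for the WordNet 3.0 graph; the graph, the damping factor of 0.85, and the iteration count are assumptions of the sketch, not the configuration used in the experiments.

```python
import math


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """graph: {node: [neighbours]}; seeds: nodes found in a sense description."""
    seeds = set(seeds) & set(graph)
    if not seeds:
        return {n: 0.0 for n in graph}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1 - damping) * teleport[n] for n in graph}
        for n, neighbours in graph.items():
            if neighbours:
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    new[m] += share
            else:  # dangling node: its mass jumps back to the seeds
                for m in graph:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical undirected graph (each edge listed in both directions).
graph = {
    "casa": ["abitazione", "edificio"],
    "abitazione": ["casa", "appartamento"],
    "appartamento": ["abitazione", "stanza"],
    "stanza": ["appartamento"],
    "edificio": ["casa"],
    "cavallo": ["animale"],
    "animale": ["cavallo"],
}
v1 = personalized_pagerank(graph, ["appartamento", "abitazione"])
v2 = personalized_pagerank(graph, ["casa", "stanza"])
v3 = personalized_pagerank(graph, ["cavallo"])
print(cosine(v1, v2) > cosine(v1, v3))  # True
```

Two descriptions with no words in common (v1 and v2 here) can still score high, because PPR spreads their mass over shared neighbourhoods of the graph; this is what distinguishes it from the lexical match.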
Evaluation
- Two gold standards, one for verbs (350 sense pairs) and one for nouns (166 sense pairs), scored with Precision (P), Recall (R), and F1.
- The best F1 is obtained by merging the outputs of the two methods: 0.47 for verbs (P=0.61, R=0.38) and 0.64 for nouns (P=0.67, R=0.61).
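Precision, recall, and F1 over alignment pairs, and the effect of merging two systems' outputs, can be sketched as follows. Taking the merge to be the set union of the predicted pairs is an assumption (the text only says that the outputs were merged), and the toy pairs are invented. The last line only checks that the reported F1 values follow from the reported P and R.

```python
def prf(predicted, gold):
    """Precision, recall and F1 of predicted (sc_sense, mwn_synset) pairs."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def f1_from(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Toy gold standard and system outputs (hypothetical pairs).
gold = {("sc1", "m1"), ("sc2", "m2"), ("sc3", "m3"), ("sc4", "m4")}
lexical = {("sc1", "m1"), ("sc5", "m9")}
similarity = {("sc1", "m1"), ("sc2", "m2")}
merged = lexical | similarity  # assumed merge strategy: union of the two outputs

print([round(x, 3) for x in prf(lexical, gold)])  # [0.5, 0.25, 0.333]
print([round(x, 3) for x in prf(merged, gold)])   # [0.667, 0.5, 0.571]

# Consistency check on the reported figures (verbs, nouns).
print(round(f1_from(0.61, 0.38), 2), round(f1_from(0.67, 0.61), 2))  # 0.47 0.64
```

A union can only add pairs, so recall never drops; whether F1 improves depends on how many of the added pairs are correct.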
11. Conclusion
- The gap between a “native” Italian dictionary and an English-derived WordNet may be significant.
- This should be carefully taken into account when devising techniques and methodologies for building multilingual resources.
- Our results suggest that more attention should be paid to the semantic peculiarities of each language, i.e. the specific way in which each language constructs a conceptual view of the world.