Sense enumeration in WordNet is one of the main reasons for its highly polysemous nature. Sense enumeration refers to a misconstruction that results in the wrong assignment of a synset to a term. In this paper, we propose a novel approach to discover and solve the problem of sense enumerations in compound noun polysemy in WordNet. The proposed solution reduces the number of sense enumerations in WordNet, and thus its high polysemy, without affecting its efficiency as a lexical resource for natural language processing.
Compound Noun Polysemy and Sense Enumeration in WordNet
1. Compound Noun Polysemy and Sense
Enumeration in WordNet
Abed Alhakim Freihat (1), Biswanath Dutta (2) and Fausto Giunchiglia (1)
(1) DISI, University of Trento, Trento, Italy
(2) Indian Statistical Institute (ISI), Bangalore, India
eKNOW-2015, 22-27 February 2015, Lisbon, Portugal.
2. Outlines
Problem
WordNet
Compound Nouns
Polysemy
Compound Noun Polysemy
Sense Enumerations in Compound Nouns
Solution
Detecting Sense Enumerations in WordNet
Results
Conclusion and Future Work
3. WordNet (Princeton WordNet)
A lexical database for English
Words are grouped into sets of one or more synonyms (similar words);
each such set is called a synset
#1 pizza, pizza pie: Italian open pie made of thin bread dough spread with a
spiced mixture of e.g. tomato sauce and cheese.
Organized through semantic and lexical relations
Semantic Relations between synsets
hypernym, hyponym, meronym, …
Lexical Relations between words
Antonym, derivationally related form, ...
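The structure described on this slide can be pictured with a small data model. This is only a sketch under our own assumptions: the class name `Synset`, its fields, and the sample relation entries are illustrative and are not WordNet's actual storage format or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: synonymous terms, a gloss, and links to other synsets."""
    sense_number: int
    terms: list[str]
    gloss: str
    # Semantic relations hold between synsets (hypernym, hyponym, meronym, ...).
    semantic_relations: dict[str, list[Synset]] = field(default_factory=dict)


# The example synset from the slide.
pizza = Synset(
    sense_number=1,
    terms=["pizza", "pizza pie"],
    gloss="Italian open pie made of thin bread dough spread with a "
          "spiced mixture of e.g. tomato sauce and cheese",
)

# Illustrative hypernym link; the target synset and its gloss are placeholders.
dish = Synset(sense_number=1, terms=["dish"], gloss="(placeholder gloss)")
pizza.semantic_relations["hypernym"] = [dish]

# Lexical relations hold between words, not synsets (illustrative entry).
lexical_relations = {("hot", "antonym"): ["cold"]}

print(pizza.terms)  # ['pizza', 'pizza pie']
```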
4. Compound Nouns
Multi-word expressions or collocations that consist of a noun
modifier and a modified noun.
Nerve center
Nerve is the noun modifier
Center is the modified noun
Red Coral
Red is the noun modifier
Coral is the modified noun
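The decomposition above can be sketched in a few lines. The function name `split_compound` is hypothetical, and the rule it applies (the last token is the modified noun, everything before it is the modifier) is a simplifying assumption that only covers space-separated compounds.

```python
from __future__ import annotations


def split_compound(compound: str) -> tuple[str, str]:
    """Split an open compound noun into (noun modifier, modified noun).

    Assumption: the last token is the modified noun. Closed compounds
    written as one word (e.g. "drumhead") are not handled here.
    """
    tokens = compound.lower().split()
    if len(tokens) < 2:
        raise ValueError(f"not a space-separated compound noun: {compound!r}")
    return " ".join(tokens[:-1]), tokens[-1]


print(split_compound("Nerve center"))  # ('nerve', 'center')
print(split_compound("Red Coral"))     # ('red', 'coral')
```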
5. Polysemy
A word is polysemous if
it has more than one meaning (i.e., it belongs to more than
one synset)
BANK
HONEY
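The definition translates directly into a membership count. In the sketch below the synsets are a made-up fragment represented as plain sets of terms, and `synsets_of` and `is_polysemous` are hypothetical helper names.

```python
# Toy fragment: each synset is a set of terms; glosses are omitted.
synsets = [
    {"bank", "depository financial institution"},
    {"bank"},                      # a second, unrelated sense of "bank"
    {"honey"},
    {"honey", "dear", "beloved"},
    {"pizza", "pizza pie"},
]


def synsets_of(word, synsets):
    """Return all synsets that contain `word`."""
    return [terms for terms in synsets if word in terms]


def is_polysemous(word, synsets):
    """A word is polysemous if it belongs to more than one synset."""
    return len(synsets_of(word, synsets)) > 1


print(is_polysemous("bank", synsets))   # True
print(is_polysemous("pizza", synsets))  # False
```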
6. Compound Noun Polysemy
The cases where we use the modified noun to refer to
several different compound nouns.
Using the word center to refer to:
center, centre, nerve center, nerve centre -- a cluster of nerve cells governing
a specific bodily process.
plaza, mall, center, shopping mall, shopping center, shopping centre --
mercantile establishment consisting of a carefully landscaped complex of
shops representing leading merchandisers; usually includes restaurants and
a convenient parking area; a modern version of the traditional marketplace.
Using the word head to refer to:
fountainhead, drumhead, head teacher, …
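To make the pattern concrete, the sketch below lists the compound nouns that a given word stands in for within its own synsets. The helper name `compounds_referred_to` is hypothetical, and the prefix/suffix string test is a crude stand-in for a real analysis of modifier and modified noun.

```python
# Synsets from the slide, written as lists of terms (glosses omitted).
synsets = [
    ["center", "centre", "nerve center", "nerve centre"],
    ["plaza", "mall", "center", "shopping mall", "shopping center",
     "shopping centre"],
    ["drumhead", "head"],
]


def compounds_referred_to(word, synsets):
    """Compound nouns that share a synset with `word` and contain it.

    Assumption: a term counts as a compound of `word` if it starts or
    ends with `word`; this is a string heuristic, not a parser.
    """
    found = []
    for terms in synsets:
        if word not in terms:
            continue
        for term in terms:
            if term != word and (term.startswith(word) or term.endswith(word)):
                found.append(term)
    return found


print(compounds_referred_to("center", synsets))
# ['nerve center', 'shopping center']
print(compounds_referred_to("head", synsets))
# ['drumhead']
```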
7. Statistics
#Nouns                                              104290
#Synsets that contain these nouns                    74314
#Compound nouns                                      58946
#Synsets that contain at least one compound noun     40560
#Compound polysemous nouns                            3407
• More than 56% of the nouns in WordNet are compound
nouns.
• More than 54% of the synsets contain compound nouns.
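The two shares can be recomputed from the counts in the statistics table. This is a plain arithmetic check; the variable names are ours.

```python
# Counts copied from the statistics table.
nouns = 104290
noun_synsets = 74314
compound_nouns = 58946
synsets_with_compound = 40560

compound_noun_share = 100 * compound_nouns / nouns
compound_synset_share = 100 * synsets_with_compound / noun_synsets

print(f"compound nouns among nouns:   {compound_noun_share:.1f}%")    # 56.5%
print(f"synsets with a compound noun: {compound_synset_share:.1f}%")  # 54.6%
```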
8. Types of Compound Noun Polysemy
• *Specialization polysemy:
• Using the word turtledove to refer to:
#1 Australian turtledove, turtledove, Stictopelia cuneata: small
Australian dove
#2 turtledove: any of several Old World wild doves.
• Metonymy:
• Using the word cherry to refer to:
• #2 cherry, cherry tree: any of numerous trees and shrubs
producing a small fleshy round fruit with a single hard stone.
• #3 cherry: a red fruit with a single hard stone.
• Sense enumerations
*Freihat, A. A., Giunchiglia, F. and Dutta, B. (2013). Solving specialization polysemy in WordNet. International Journal of
Computational Linguistics and Applications, vol. 4, no. 1, pp. 29-52.
9. Sense Enumeration in Compound Nouns
• Assignment of the noun modifier or the modified noun as a
synonym of the compound noun itself.
• Storing this kind of polysemy in a lexical database leads to a
redundant explosion of the word meanings.
• E.g., WordNet contains 135 non-polysemous synsets in
which the term head is the noun modifier or modified noun of a
compound noun. Applied consistently, the word head would have
168 senses (the present 33 plus 135 to add).
• WordNet assigns the modified noun as a synonym of the
compound noun inconsistently.
10. Sense Enumeration in Compound Nouns
(contd.)
• Possible solutions
• Adding the modified noun as a synonym to all its
corresponding compound nouns → redundancy
• Removing this kind of polysemy → our proposed solution
11. Disambiguating Compound Nouns
We usually use modified nouns to refer to their corresponding
compound nouns (e.g., center to refer to shopping center,
research center, medical center, ...)
Is it necessary to store the compound nouns and their
corresponding modified nouns as synonyms in the lexicon?
Disambiguating the modified nouns …
Are we able to disambiguate modified nouns because
We store the synonymy in our mental lexicon, OR
It is a syntactic process that does not depend on the
lexicon?
12. Discovery and Elimination of Sense
Enumerations in Compound Nouns
Two phases:
Discovery of sense enumerations in Compound
Nouns
A semi automatic process
Elimination of sense enumerations
An automatic process
13. Discovery of sense enumerations in
Compound Nouns (phase I)
Semi automatic:
Deploying an algorithm that returns sense enumeration
candidates among the compound noun polysemous nouns.
The algorithm excludes:
Specialization polysemy instances
Metonymy instances
Exclusion of false positives.
This step is manual.
We exclude cases of a missing adjunct noun/modified noun synset
and term abbreviations.
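The two steps of phase I can be sketched as follows. This is not the algorithm used in this work, whose details the slides do not give: `find_candidates`, the `is_excluded` hook (standing in for the specialization polysemy and metonymy checks), the prefix/suffix test, and the toy synsets are all assumptions made for illustration.

```python
from collections import defaultdict


def find_candidates(synsets, is_excluded=lambda word, terms: False):
    """Return (word, compound, synset_index) sense enumeration candidates.

    A candidate is a polysemous word that shares a synset with a term
    starting or ending with it. `is_excluded` is a hypothetical hook for
    the automatic specialization polysemy / metonymy exclusions.
    """
    index = defaultdict(list)            # term -> indices of its synsets
    for i, terms in enumerate(synsets):
        for term in terms:
            index[term].append(i)

    candidates = []
    for word, ids in index.items():
        if len(ids) < 2:                 # keep polysemous words only
            continue
        for i in ids:
            if is_excluded(word, synsets[i]):
                continue
            for term in synsets[i]:
                if term != word and (term.startswith(word) or term.endswith(word)):
                    candidates.append((word, term, i))
    return candidates


# Toy data; the single-term synsets only make the words polysemous.
synsets = [
    ["drumhead", "head"], ["head"],
    ["party", "political party"], ["party"],
    ["milliliter", "millilitre", "mil", "ml", "cubic centimeter",
     "cubic centimetre", "cc"], ["mil"],
]
candidates = find_candidates(synsets)

# Manual step: a reviewer rejects false positives by (word, synset index),
# here a missing-modifier case ("party") and an abbreviation ("mil").
rejected = {("party", 2), ("mil", 4)}
confirmed = [c for c in candidates if (c[0], c[2]) not in rejected]
print(confirmed)  # [('head', 'drumhead', 0)]
```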
14. Discovery of sense enumerations in
Compound Nouns (phase I Contd…)
Exclusion of false positives:
Missing adjunct noun/ modified noun:
#1 party, political party -- an organization to gain political power.
#2 party -- an occasion on which people can assemble for social interaction and
entertainment.
#3 party, company -- a band of people associated temporarily in some activity.
#4 party -- a group of people gathered together for pleasure.
#5 party -- a person involved in legal proceedings.
Term abbreviation
milliliter, millilitre, mil, ml, cubic centimeter, cubic centimetre, cc -- a metric unit of
volume equal to one thousandth of a liter.
15. Elimination of Sense Enumerations in
Compound Nouns (phase II)
An automatic process:
We eliminate the sense enumerations by removing the
polysemous modified nouns.
E.g., applying the function to head, synset #32 becomes
synset #32':
#32 drumhead, head: a membrane that is stretched taut
over a drum.
#32' drumhead: a membrane that is stretched taut over a
drum.
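The elimination step amounts to dropping the polysemous word from each confirmed synset while keeping the synset itself. A minimal sketch, assuming synsets are lists of terms and confirmed instances are given as (word, synset index) pairs; the function name `eliminate` is hypothetical.

```python
def eliminate(synsets, confirmed):
    """Remove each confirmed word from its synset; return new synsets.

    `confirmed` holds (word, synset_index) pairs. The input is not
    modified, and no synset is deleted: only the term-to-synset
    assignment (the sense) is removed.
    """
    cleaned = [list(terms) for terms in synsets]
    for word, i in confirmed:
        cleaned[i] = [term for term in cleaned[i] if term != word]
    return cleaned


synsets = [["drumhead", "head"], ["head"]]
cleaned = eliminate(synsets, [("head", 0)])

print(cleaned)                         # [['drumhead'], ['head']]
print(sum(len(s) for s in synsets))    # 3 senses before
print(sum(len(s) for s in cleaned))    # 2 senses after
```

The number of synsets stays the same and only the number of senses drops, one per eliminated instance.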
16. Result and Evaluation
Results of the discovery algorithm:
#Compound noun polysemous terms         2270
#Compound noun polysemous synsets       2952
#Compound noun polysemous instances    11650
Manual validation result:
#Compound noun polysemous terms         1905
#Compound noun polysemous synsets       2547
#Compound noun polysemous instances    11088
• In 80% of cases, there is total agreement between the two evaluators.
• In 94% of cases, there is partial agreement between the two evaluators.
Disambiguation algorithm result:
                                 #Nouns   #Synsets   #Senses
Before applying the algorithm    104290      74314    130207
After applying the algorithm     104290      74314    127660
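The figures above can be cross-checked with simple arithmetic. The sketch below only recomputes values from the reported counts; the variable names are ours.

```python
# Counts copied from the result tables.
discovered_synsets = 2952
validated_synsets = 2547
senses_before = 130207
senses_after = 127660

removed_senses = senses_before - senses_after
reduction = 100 * removed_senses / senses_before
kept_after_validation = 100 * validated_synsets / discovered_synsets

print(removed_senses)                    # 2547
print(f"{reduction:.2f}%")               # 1.96%
print(f"{kept_after_validation:.1f}%")   # 86.3%
```

The number of removed senses equals the number of synsets that passed manual validation, which is what one would expect if one sense is removed per validated synset.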
17. Conclusion
• Sense enumeration in compound nouns is a source of
noise rather than a source of knowledge.
• Which compound noun polysemous nouns should we store
in a lexical database?
• Only metonymy
• A lexicon should avoid redundant information that can be
derived by syntactic rules or by NLP tools.
18. Future work
• Evaluation in terms of recall and precision to test our approach
• Examine the relation between sense enumeration and missing
terms.
• e.g., bony pelvis and head of muscle are missing in the
following two synsets respectively:
#25 head: the rounded end of a bone that fits into a
rounded cavity in another bone to form a joint.
#26 head: that part of a skeletal muscle that is away from
the bone that it moves.
19. Acknowledgement
• The research leading to these results has received funding from
the European Community’s Seventh Framework Program under
grant agreement n. 600854, Smart Society
(http://www.smart-society-project.eu/).