Word Sense Disambiguation
                                               and
                                Intelligent Information Access

                                           Pierpaolo Basile

                                           basilepp@di.uniba.it
                                    Department of Computer Science
                                  University of Bari “A. Moro” (ITALY)




                                             29 May 2009


Pierpaolo Basile (basilepp@di.uniba.it)        WSD and IIA               29/05/09   1 / 55
Outline

1    Introduction
        Word Sense Disambiguation
        Intelligent Information Access
2    WSD Strategies
      JIGSAW
      JIGSAWz
      HYDE : a hybrid strategy for WSD
      COMBY : a combined strategy for WSD
3    WSD at Work
      Information Filtering: ITR - ITem Recommender
      Information Retrieval: Semantic Search
4    Conclusions and Future Work



Word Sense Disambiguation


          Word Sense Disambiguation (WSD) is the problem of selecting a
          sense for a word from a set of predefined possibilities
                 sense inventory usually comes from a dictionary or thesaurus
                 polysemous word: having more than one possible meaning, e.g.
                 bank1 :
                     1   sloping land (especially the slope beside a body of water);
                     2   a financial institution that accepts deposits and channels the money
                         into lending activities;
                     3   a long ridge or pile;
                     4   an arrangement of similar objects in a row or in tiers;
                 knowledge intensive methods, supervised learning, and (sometimes)
                 bootstrapping approaches



     1   First four meanings of “bank” in WordNet 3.0


Brief History



         1949: noted as problem for Machine Translation
         1950s - 1960s: semantic networks, AI approaches
         1970s - 1980s: rule based systems, rely on hand crafted knowledge
         sources
         1990s: WordNet, corpus based approaches, sense tagged text
         2000s: Hybrid Systems, minimizing or eliminating use of sense tagged
         text, taking advantage of the Web, domain WSD






Intelligent Information Access

Problems
    Explosion of irrelevant, unclear, inaccurate information
         Users overloaded with a large amount of information impossible to
         absorb

Consequences
    Searching is time consuming
         Need for intelligent solutions able to support users in finding
         documents

Solution
     Intelligent Information Access: user-centric and semantically rich
     approach to access information



WSD in Information Access


         Machine Translation
                 Translate “plant” from English to Italian
                 Is it a “pianta” or an “impianto/stabilimento”?
         Information Retrieval
                 Find all Web pages about “bat”
                 The sports equipment or the nocturnal mammal?
         Question Answering
                 What is George Miller’s position on gun control?
                 The psychologist or the US congressman?
         Knowledge Acquisition
                 Add to KB: Herb Bergson is the mayor of Duluth
                 Duluth, Minnesota or Duluth, Georgia?




WSD and Intelligent Information Access



         Natural Language Processing can enhance Intelligent Information
         Access
                 keywords not appropriate for representing content, due to polysemy,
                 synonymy, multi-word concepts
                 WSD provides semantics: identification of concepts in documents
         Humans are able to comprehend the meaning of a text
         Natural Language Processing and WSD convert human linguistic
         abilities into more formal representations that are easier for computer
         programs to understand






JIGSAW

JIGSAW
    Knowledge-based WSD algorithm
         Exploits WordNet senses
         Three different strategies for: nouns, verbs and adjectives/adverbs
         Main motivation: the effectiveness of a WSD algorithm is strongly
         influenced by the PoS-tag

WordNet [Mil95]
         Lexical reference database designed by Princeton University
         English nouns, verbs, adverbs and adjectives organized into SYNonym
         SETs (SYNSET)
         Semantic relations among synsets



WordNet



    1    Synset Rank
    2    Occurrences in SemCor
    3    Offset
    4    SYNonym-SET



         Gloss: synset definition
         Examples of usage
         Synset description = gloss + examples



JIGSAW algorithm



The algorithm
    Input
                 d = (w1, w2, ..., wh)      document
    Output
                 X = (s1, s2, ..., sk)      k ≤ h
                         each si is obtained by disambiguating wi based on the context of
                         each word
                 k can be smaller than h because:
                         some words are not recognized by WordNet
                         groups of words are recognized as a single concept






JIGSAWnouns

The idea
    Based on Resnik’s algorithm [Res95] for disambiguating noun groups
    Given a set of nouns N = {n1, n2, ..., nn} from document d
                 each ni has an associated sense inventory Si = {si1, si2, ..., sik} of
                 possible senses
         Goal: assign to each ni the most appropriate sense sih ∈ Si,
         maximizing the similarity of ni with the other nouns in N

The strategy
    Compute semantic similarity by exploiting the “noun hierarchy”
         Give more credit to senses that are hyponyms of the Most Specific
         Subsumer (MSS)
         Combine MSS information with semantic similarity



JIGSAWnouns



Final synset score
         Linear combination between semantic similarity (with MSS
         information) and synset rank in WordNet:

                       \varphi(s_{ik}) = \alpha \cdot \mathrm{sim}(s_{ik}, N) + \beta \cdot R(k), \qquad \alpha + \beta = 1       (1)
         R(k) takes into account the synset rank in WordNet:

                       R(k) = 1 - 0.8 \cdot \frac{k}{n - 1}       (2)
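
A minimal sketch of how Eq. (1) and Eq. (2) could be evaluated. The slides do not define k and n precisely, so this assumes k is the zero-based WordNet rank of the synset and n is the number of candidate senses; the default alpha is an arbitrary illustrative value, not one reported in the slides.

```python
def rank_factor(k: int, n: int) -> float:
    """R(k) = 1 - 0.8 * k / (n - 1), as in Eq. (2).

    Assumption: k is the zero-based rank and n the number of senses,
    so R goes from 1.0 (first sense) down to 0.2 (last sense).
    """
    if n <= 1:
        return 1.0  # single-sense word: no rank penalty (guard against n - 1 == 0)
    return 1.0 - 0.8 * k / (n - 1)


def synset_score(similarity: float, k: int, n: int, alpha: float = 0.7) -> float:
    """phi(s_ik) = alpha * sim(s_ik, N) + beta * R(k), with beta = 1 - alpha (Eq. 1)."""
    beta = 1.0 - alpha
    return alpha * similarity + beta * rank_factor(k, n)
```

With alpha = 0.5, a first-ranked synset (k = 0) with similarity 0.5 gets 0.5 * 0.5 + 0.5 * 1.0 = 0.75.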






JIGSAWnouns



Differences between JIGSAWnouns and Resnik
    Leacock-Chodorow measure to compute similarity (instead of
    Information Content)
         Gaussian factor G, which takes into account the distance between
         words in the text
         Factor R, which takes into account the synset frequency score in
         WordNet






JIGSAWverbs
The idea
    Try to establish a relation between verbs and nouns (distinct IS-A
    hierarchies in WordNet)
    Verb wi is disambiguated using:
                 nouns in the context C of wi
                 nouns in the description (gloss + WordNet usage examples) of each
                 candidate synset for wi

The strategy
    For each candidate synset sik of wi:
                 compute nouns(i, k): the set of nouns in the description of sik
                 for each wj in C, compute maxjk: the highest similarity between wj
                 and the nouns in nouns(i, k), i.e. the nouns related to the k-th
                 sense of wi (using the Leacock-Chodorow measure)
                 weight the semantic similarity using the G and R factors (as in
                 JIGSAWnouns)


JIGSAWverbs : The algorithm


I play basketball and soccer.                       wi = play       C = {basketball, soccer }
    1    (70) play - (participate in games or sport; “We played hockey all
         afternoon”; “play cards”; “Pele played for the Brazilian teams in
         many important matches”)
    2    (29) play - (play on an instrument; “The band played all night long”)
    3    ...
Build nouns set for each sik :
    1    nouns(play,1): game, sport, hockey, afternoon, card, team, match
    2    nouns(play,2): instrument, band, night
    3    ...







         Finally, an overall similarity score ϕ(i, k) between sik and the whole
         context C is computed:

           \varphi(i, k) = R(k) \cdot \frac{\sum_{w_j \in C} \mathrm{Gauss}(\mathrm{position}(w_i), \mathrm{position}(w_j)) \cdot max_{jk}}{\sum_{h} \mathrm{Gauss}(\mathrm{position}(w_i), \mathrm{position}(w_h))}       (3)

         The synset assigned to wi is the one with the highest ϕ value
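
A rough sketch of the score in Eq. (3), applied to the “I play basketball and soccer” example. Several parts are assumptions: the Gaussian shape and its sigma, the denominator ranging over the same context words C, and above all `toy_sim`, a hand-made stand-in for the Leacock-Chodorow measure whose values are invented purely for illustration.

```python
import math
from typing import Callable, Dict, List


def gauss(pos_i: int, pos_j: int, sigma: float = 2.0) -> float:
    """Distance weight between two token positions (assumed Gaussian form)."""
    return math.exp(-((pos_i - pos_j) ** 2) / (2 * sigma ** 2))


def verb_sense_score(verb_pos: int,
                     context: Dict[str, int],
                     sense_nouns: List[str],
                     rank_weight: float,
                     sim: Callable[[str, str], float]) -> float:
    """phi(i, k): R(k) times the Gauss-weighted average of max_jk over the context."""
    numerator = 0.0
    denominator = 0.0
    for w_j, pos_j in context.items():
        g = gauss(verb_pos, pos_j)
        # max_jk: best similarity between w_j and any noun describing sense k
        max_jk = max((sim(w_j, noun) for noun in sense_nouns), default=0.0)
        numerator += g * max_jk
        denominator += g
    return rank_weight * numerator / denominator if denominator else 0.0


# Hypothetical similarity values, NOT real Leacock-Chodorow scores.
TOY_SIM = {("basketball", "sport"): 0.9, ("soccer", "sport"): 0.9,
           ("basketball", "game"): 0.8, ("soccer", "game"): 0.8}


def toy_sim(a: str, b: str) -> float:
    return TOY_SIM.get((a, b), 0.1)


# "I play basketball and soccer": play is token 1, basketball 2, soccer 4.
context = {"basketball": 2, "soccer": 4}
nouns_1 = ["game", "sport", "hockey", "afternoon", "card", "team", "match"]
nouns_2 = ["instrument", "band", "night"]
score_1 = verb_sense_score(1, context, nouns_1, rank_weight=1.0, sim=toy_sim)
score_2 = verb_sense_score(1, context, nouns_2, rank_weight=0.9, sim=toy_sim)
```

Under these toy values the first sense of “play” scores higher than the second, which is the behaviour the slide example suggests.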






JIGSAWothers

         Based on the WSD algorithm proposed by Banerjee and Pedersen
         [BP02, BP03] (inspired by Lesk [Les86])
         Idea: compute the overlap between the glosses of each candidate
         sense (including related synsets) of the target word and the glosses of
         all words in its context
                 the synset with the highest overlap score is assigned
                 if ties occur, the most common synset in WordNet is chosen
         Given the sentence: “I bought a bottle of aged wine”
                 the context is C = {bottle, wine}
                 the first two synsets for aged are:
                     1   (advanced in years; “aged members of the society”; “elderly residents
                         could remember the construction of the first skyscraper”; “senior
                         citizen”);
                     2   (of wines, fruit, cheeses; having reached a desired or final condition;
                         “mature well-aged cheeses”)
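
The overlap idea can be illustrated with a deliberately simplified sketch. Unlike the Banerjee and Pedersen method, it compares each candidate description only with the context words themselves (not with their glosses or related synsets), and it uses a naive plural-stripping normaliser and a tiny stopword list, all of which are assumptions made for brevity.

```python
import re
from typing import List, Set

STOPWORDS = {"of", "in", "the", "a", "or", "could", "having"}


def normalise(text: str) -> Set[str]:
    """Lowercase, tokenise, strip a trailing plural 's', drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = {t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens}
    return stems - STOPWORDS


def pick_sense(descriptions: List[str], context: List[str]) -> int:
    """Return the index of the description sharing most terms with the context.

    Descriptions are assumed to be listed in WordNet rank order, so taking the
    first maximum resolves ties in favour of the most common synset.
    """
    context_terms = normalise(" ".join(context))
    overlaps = [len(normalise(d) & context_terms) for d in descriptions]
    return overlaps.index(max(overlaps))


AGED = [
    "advanced in years; aged members of the society; elderly residents could "
    "remember the construction of the first skyscraper; senior citizen",
    "of wines, fruit, cheeses; having reached a desired or final condition; "
    "mature well-aged cheeses",
]
chosen = pick_sense(AGED, ["bottle", "wine"])
```

Here the second description shares the term “wine” with the context, so the sketch selects index 1 (the second synset).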




JIGSAWz : ZIPF distribution




         Zipf’s law: the frequency of an event is inversely proportional to its
         rank in the frequency table
         Word distributions behave similarly: the most frequent word occurs
         approximately twice as often as the second most frequent word, which
         occurs twice as often as the fourth most frequent word, ...


JIGSAWz


         Modify R factor using ZIPF distribution:

                       f(k; N; s) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}       (4)
         where:
                 N is the number of word meanings
                 k is the word meaning rank. We adopt the WordNet synset rank
                 s is the value of the exponent characterizing the distribution
         Compute the frequency of the word meaning in SemCor
         Approximate s using the Pearson’s chi-square χ2 test method
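
Eq. (4) can be computed directly. The sketch below only evaluates the distribution for a given exponent s; it does not reproduce the chi-square fitting of s on SemCor frequencies, and the function name is an illustrative choice.

```python
def zipf_weight(k: int, n_senses: int, s: float) -> float:
    """f(k; N; s) from Eq. (4): Zipf probability of the sense ranked k (1-based)."""
    if not 1 <= k <= n_senses:
        raise ValueError("k must satisfy 1 <= k <= N")
    normaliser = sum(1.0 / n ** s for n in range(1, n_senses + 1))
    return (1.0 / k ** s) / normaliser
```

For a word with two senses and s = 1, the first sense gets 1 / (1 + 1/2) = 2/3 and the second 1/3; for any N and s the weights sum to 1.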






NLP tools for the evaluation


         WSD requires pre-processing steps: tokenization, stemming,
         PoS-tagging and lemmatization
         META (MultilanguagE Text Analyzer) [BdG+08] implements several
         NLP tasks and provides tools for semantic indexing of documents:
                 Text normalization and tokenization
                 Stemming (SNOWBALL library)
                 Lemmatization
                         English: WordNet Morphological Analyzer
                         Italian: Morph-it! and Lemmagen tool (Ripple Down Rule learning)
                 POS-tagging based on ACOPOST T3 (HMM - Hidden Markov Model)
                 Entity recognition based on SVM classifier (YAMCHA)
                 WSD: English/Italian






JIGSAW Evaluation

SensEval-3 All-Words Task
    disambiguation of all words contained in English texts
         sense inventory: WordNet 1.7.1
         2,041 words
         inter-annotator agreement rate was approximately 72.5%

EVALITA WSD All-Words Task
   disambiguation of all words contained in Italian texts
         sense inventory: ItalWordNet
         about 5,000 words
         no information about the inter-annotator agreement rate




JIGSAW Evaluation: Results
JIGSAW at SensEval-3 All-Words Task
                            system                  P           R        A(%)     F
                           1st sense              0.624       0.651       100   0.651
                      BestUnsupervised            0.583       0.582       100   0.582
                          JIGSAW                  0.525       0.525       100   0.525
                          JIGSAWz                 0.606       0.606       100   0.606


JIGSAW at EVALITA WSD All-Words Task [BS07]
                               system         P              R        A(%)     F
                              1st sense     0.648          0.614      94.7   0.631
                             Random         0.483          0.458      94.7   0.470
                             JIGSAW         0.598          0.567      94.7   0.582
                             JIGSAWz        0.639          0.606      94.7   0.622
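
The slides do not define the F column. The EVALITA rows are consistent with the balanced F-measure (harmonic mean of precision and recall), so the sketch below assumes that definition.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, the EVALITA 1st-sense row (P = 0.648, R = 0.614) gives 0.631 after rounding to three decimals, matching the table.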



Supervised Learning for WSD



Supervised Learning for WSD
Exploits machine learning techniques to induce models of word usage from
large text collections
         annotated corpora are tagged manually using semantic classes chosen
         from a sense inventory
         each sense-tagged occurrence of a particular word is transformed into
         a feature vector, which is then used in an automatic learning process






Problems and Motivation
Knowledge-based methods
   outperformed by supervised methods
         high coverage: applicable to all words in unrestricted text

Supervised methods
    high precision
         low coverage: applicable only to those words for which annotated
         corpora are available

Solution
HYDE: combining knowledge-based methods (JIGSAW) with supervised
learning can improve WSD effectiveness [BdLS08]
         Knowledge-based methods improve coverage
         Supervised learning strategies improve precision


Supervised Learning
Exploited features
    nouns: the first noun, verb or adjective preceding the target noun, within
    a window of at most three words to its left, and its PoS-tag
         verbs: the first word before and the first word after the target verb,
         and their PoS-tags
         adjectives: six nouns (before and after the target adjective)
         adverbs: the same as adjectives, but adjectives rather than nouns are
         used

Training corpus: MultiSemCor
    1    Italian translations of the SemCor texts
    2    automatic alignment of the Italian and English texts
    3    automatic transfer of the word sense annotations from the English
         (WordNet) words to the aligned Italian (MultiWordNet) words


Supervised Learning



K-NN algorithm for WSD
   Learning: build a vector for each annotated word
   Classification:
                 build a vector vf for each word in the text
                 compute similarity between vf and the training vectors
                 rank the training vectors in decreasing order according to the similarity
                 value
                 choose the most frequent sense in the first K vectors
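
The classification steps above can be sketched as follows. The slides do not say which similarity function is used, so cosine similarity over sparse feature vectors is an assumption here, and the feature names and sense labels in the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse feature vector: feature name -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_sense(target: Vector, training: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Rank training vectors by similarity and return the majority sense of the top k."""
    ranked = sorted(training, key=lambda ex: cosine(target, ex[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training examples for the noun "bank".
training = [
    ({"prev=river": 1.0, "pos=NN": 1.0}, "bank#1"),
    ({"prev=muddy": 1.0, "pos=JJ": 1.0}, "bank#1"),
    ({"prev=central": 1.0, "pos=JJ": 1.0}, "bank#2"),
    ({"prev=savings": 1.0, "pos=NN": 1.0}, "bank#2"),
    ({"prev=investment": 1.0, "pos=NN": 1.0}, "bank#2"),
]
predicted = knn_sense({"prev=savings": 1.0, "pos=NN": 1.0}, training, k=3)
```

The sketch assumes at least one training vector exists for the target word; words without training examples are exactly the low-coverage case that motivates the hybrid strategy.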






HYDE Evaluation

         Dataset: EVALITA WSD All-Words Task Dataset
         Two strategies:

Integrating JIGSAW into a supervised learning method
     supervised method is applied to words for which training examples are
     provided
         JIGSAW is applied to words not covered by the first step

Integrating supervised learning into JIGSAW
     JIGSAW is applied to assign a sense to the words which can be
     disambiguated with a high level of confidence
         remaining words are disambiguated by the supervised method
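
The two integration strategies amount to two fallback orders. The sketch below uses hypothetical interfaces: a supervised tagger that returns None when it has no training examples for a word, and a JIGSAW-like tagger that returns a sense together with its confidence ϕ. The optional threshold on the knowledge-based step is likewise an assumption about how a minimum ϕ could be enforced.

```python
from typing import Callable, Optional, Tuple

Supervised = Callable[[str], Optional[str]]          # None: no training data for the word
KnowledgeBased = Callable[[str], Tuple[str, float]]  # (sense, confidence phi)


def supervised_then_jigsaw(word: str, supervised: Supervised, jigsaw: KnowledgeBased,
                           threshold: float = 0.0) -> Optional[str]:
    """Strategy 1: supervised first, JIGSAW for the words it does not cover."""
    sense = supervised(word)
    if sense is not None:
        return sense
    kb_sense, phi = jigsaw(word)
    return kb_sense if phi >= threshold else None  # None: word left untagged


def jigsaw_then_supervised(word: str, supervised: Supervised, jigsaw: KnowledgeBased,
                           threshold: float = 0.8) -> Optional[str]:
    """Strategy 2: JIGSAW when it is confident, supervised for the remaining words."""
    kb_sense, phi = jigsaw(word)
    if phi >= threshold:
        return kb_sense
    return supervised(word)  # may be None if no training examples exist


# Toy taggers, for illustration only.
toy_supervised = {"bank": "s_sup"}.get
toy_jigsaw = lambda word: ("s_kb", 0.5)
```

Raising the threshold leaves more words untagged, which trades coverage for precision.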




HYDE Evaluation: Baselines




Baselines for EVALITA WSD All-Words Task Dataset
                              Setting       P          R           F         A (%)
                              1st sense   0.648      0.614       0.631        94.7
                              Random      0.484      0.484       0.484       100.0
                              JIGSAW      0.639      0.606       0.622        94.7
                              K-NN        0.797      0.336       0.473        42.2






HYDE : Evaluation results
1st sense (0.631), Random (0.470), JIGSAW (0.622), K-NN (0.484)
Integrating JIGSAW into a supervised learning method
              Setting                             P         R         F       A (%)
              K-NN + JIGSAW                     0.624     0.591     0.607      94.7
              K-NN + JIGSAW (ϕ ≥ 0.80)          0.693     0.337     0.453      48.6
              K-NN + JIGSAW (ϕ ≥ 0.60)          0.680     0.410     0.512      60.3
              K-NN + JIGSAW (ϕ ≥ 0.40)          0.652     0.452     0.534      69.3
              K-NN + JIGSAW (ϕ ≥ 0.20)          0.652     0.452     0.534      69.3


Integrating supervised learning into JIGSAW
              Setting                             P         R         F       A (%)
              JIGSAW (ϕ ≥ 0.80) + K-NN          0.715     0.392     0.556      55.6
              JIGSAW (ϕ ≥ 0.60) + K-NN          0.688     0.440     0.537      64.0
              JIGSAW (ϕ ≥ 0.40) + K-NN          0.651     0.484     0.555      74.4



COMBY : a combined strategy for WSD


COMBY WSD framework: combines the output data of several WSD
algorithms
     training step: run a set of WSD algorithms on a sense-annotated corpus (TRC)
                 obtain a set of output data O = {o1, o2, ..., oN}, where each oi is the
                 output provided by the i-th algorithm
                 each output oi contains, for each word instance wj in TRC, a list of pairs
                 (<synset1, score1>, ..., <synsetk, scorek>, ..., <synsetl, scorel>)
     combination step: run the same WSD algorithms on a corpus without sense
     annotations (TSC)
                 obtain a set of output data
                 combine the outputs using voting strategies or supervised methods






Combination strategies


Voting strategies
    1    simple voting: the sense that has the majority of votes is chosen
    2    simple voting using the information about the synset score: the vote
         for each synset is the sum of all scores in each WSD system
    3    simple voting using different weights for each system according to the
         WSD performance in TRC

Supervised methods
    1    several classification algorithms using the WEKA package
    2    Support Vector Machine adopting the open-source software LIBSVM
    3    using unsupervised predictions into a supervised system
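
The three voting strategies can be sketched for a single word instance as follows. Each system output is modelled as a synset-to-score dictionary; the type alias, function names and toy scores are illustrative assumptions, and the per-system weights stand in for performance figures that would be measured on TRC.

```python
from collections import defaultdict
from typing import Dict, List

Output = Dict[str, float]  # one system's output for one word: synset -> score


def simple_vote(outputs: List[Output]) -> str:
    """Each system votes for its top-scoring synset; the majority wins."""
    tally: Dict[str, float] = defaultdict(float)
    for out in outputs:
        if out:
            tally[max(out, key=out.get)] += 1.0
    return max(tally, key=tally.get)


def score_vote(outputs: List[Output]) -> str:
    """The vote for each synset is the sum of its scores across systems."""
    tally: Dict[str, float] = defaultdict(float)
    for out in outputs:
        for synset, score in out.items():
            tally[synset] += score
    return max(tally, key=tally.get)


def weighted_vote(outputs: List[Output], weights: List[float]) -> str:
    """Like simple voting, but each system's vote counts as its weight."""
    tally: Dict[str, float] = defaultdict(float)
    for out, weight in zip(outputs, weights):
        if out:
            tally[max(out, key=out.get)] += weight
    return max(tally, key=tally.get)


# Toy outputs of three systems for one word instance (made-up scores).
outputs = [
    {"s1": 0.60, "s2": 0.40},
    {"s1": 0.55, "s2": 0.45},
    {"s1": 0.05, "s2": 0.95},
]
```

On this toy input, simple voting picks s1 (two votes to one), while score-based voting picks s2 (summed score 1.8 against 1.2), which shows how the strategies can disagree.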




COMBY evaluation

Dataset
    TRC (training): SemCor 1.7.1
    TSC (testing): SensEval-3 All-Words Task
                 1st sense baseline: F = 0.651

Involved WSD systems
     JIGSAW : a knowledge-based WSD algorithm that exploits WordNet
     as knowledge-base.
         AitorKB: graph-based method for performing knowledge-based WSD
         [AS08]
         TS: exploits Topic Signatures to disambiguate nouns [AdL04]
         RIC : automatically builds examples from the Web using a new
         approach based on the “monosemous relative” method [MAW06]



COMBY evaluation: voting strategies
Performance of each system
                              System       P           R           F           A(%)
                             JIGSAW        0.554       0.554       0.554       100.0
                                TS         0.458       0.215       0.292       46.9
                               RIC         0.397       0.396       0.396       99.8
                             AitorKB       0.600       0.600       0.600       100.0


Voting strategies
                            Strategy      P            R           F           A(%)
                             Simple       0.587        0.587       0.587       100.0
                            Z-Score       0.575        0.575       0.575       100.0
                              Rank        0.615        0.615       0.615       100.0




COMBY evaluation: supervised combination
Combination using WEKA
                         Classifier            P            R          F           A(%)
                        Naive Bayes           0.653        0.653      0.653       100.0
                       Decision Trees         0.649        0.649      0.649       100.0
                        Ada Boost             0.647        0.647      0.647       100.0
                           K-NN               0.643        0.643      0.643       100.0
                       SMO (SVM)              0.653        0.653      0.653       100.0


Combination using LIBSVM and a supervised system (Knn.ehu [AdL07])
                        System                      P            R          F           A(%)
                       LIBSVM                       0.654        0.654      0.654       100.0
                       Knn.ehu                      0.667        0.667      0.667       100.0
                  Knn.ehu+predictions               0.671        0.671      0.671       100.0


WSD at Work

Exploit WSD techniques in real application scenarios
Information Filtering
     Content-based recommender system
         User profiles compared against item descriptions to provide
         recommendations
         Problems: keywords not appropriate for representing content, due to
         polysemy, synonymy, multi-word concepts

Information Retrieval
     Selection of documents, from a fixed collection, which satisfy a user’s
     one-off information need (query)
         Problems: polysemy and synonymy




Information Filtering: ITR - ITem Recommender


ITR - ITem Recommender [SDLB07]: framework for Intelligent User
Profiling based on:
         Word Sense Disambiguation for detecting relevant concepts
         representing user interests
         Naive Bayes text categorization algorithm for learning user profiles
         from disambiguated documents
         Concept-based user profiles:
                 Bag-of-Synset: a synset vector corresponds to a document, instead of a
                 word vector
                 synsets provided by JIGSAW
                 recognition of n-grams
                 synonyms represented by the same synsets
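
A toy illustration of why a Bag-of-Synsets merges synonyms. The static word-to-synset dictionary and the synset identifiers are hypothetical stand-ins: in ITR the synsets come from JIGSAW, which picks a sense per occurrence based on context rather than from a fixed lookup.

```python
from collections import Counter
from typing import Dict, List


def bag_of_synsets(tokens: List[str], sense_of: Dict[str, str]) -> Counter:
    """Count synset identifiers instead of surface words."""
    return Counter(sense_of[t] for t in tokens if t in sense_of)


# Hypothetical disambiguation result: two synonyms share one synset identifier.
sense_of = {"movie": "SYN_FILM", "film": "SYN_FILM", "director": "SYN_DIRECTOR"}
tokens = ["movie", "film", "director"]

bow = Counter(tokens)                   # bag of words: three distinct features
bos = bag_of_synsets(tokens, sense_of)  # bag of synsets: two distinct features
```

In the keyword representation “movie” and “film” are unrelated features; in the synset representation they reinforce the same feature.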





ITR evaluation

         EachMovie Dataset: Project conducted by Compaq Research Centre
         (1996-1997)
         Dataset of user-movie ratings
                 About 2.8 million ratings
                 72,916 users
                 1,628 items (movies) divided into 10 categories (genres)
                 Discrete ratings on a 6-point scale
                 Movie content crawled from the Internet Movie Database (IMDb)
         10 movie categories/genres
                 933 randomly selected users
                 100 users for each category, except Category 2 (Animation), for which
                 only 33 users were selected
                 Each user rated between 30 and 100 movies
         Goal: compare performance of keyword-based profiles vs.
         synset-based profiles



ITR evaluation results
Performance of the two versions of ITR on the 10 genre datasets of
EachMovie (BOW: keyword-based profiles, BOS: synset-based profiles)

                 Genre Id    Precision        Recall           F1
                             BOW     BOS      BOW     BOS      BOW     BOS
                    1        0.70    0.74     0.83    0.89     0.76    0.80
                    2        0.51    0.57     0.62    0.70     0.54    0.61
                    3        0.76    0.86     0.84    0.96     0.79    0.91
                    4        0.92    0.93     0.99    0.99     0.96    0.96
                    5        0.56    0.67     0.66    0.80     0.59    0.72
                    6        0.75    0.78     0.89    0.92     0.81    0.84
                    7        0.58    0.73     0.67    0.83     0.71    0.79
                    8        0.53    0.72     0.65    0.89     0.58    0.79
                    9        0.70    0.77     0.83    0.91     0.75    0.83
                   10        0.71    0.75     0.86    0.91     0.77    0.81
                  Mean       0.67    0.75     0.78    0.88     0.73    0.81
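The Mean row appears to be the unweighted (macro) average over the ten genres. The short check below recomputes it for the F1 columns; the values are copied from the table above, while the function name `macro_average` is only illustrative.

```python
# Per-genre F1 copied from the table (genres 1..10).
f1_bow = [0.76, 0.54, 0.79, 0.96, 0.59, 0.81, 0.71, 0.58, 0.75, 0.77]
f1_bos = [0.80, 0.61, 0.91, 0.96, 0.72, 0.84, 0.79, 0.79, 0.83, 0.81]

def macro_average(values):
    """Unweighted mean over genres, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)

print("BOW mean F1:", macro_average(f1_bow))
print("BOS mean F1:", macro_average(f1_bos))
```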

WSD at Work   Information Retrieval: Semantic Search


Information Retrieval Evaluation

Two kinds of evaluation
SemEval-2007 Task 1: indexing of a document collection for Cross
Language IR [BDG+ 07]
         application-driven task
         fixed cross-language information retrieval system
         participants disambiguate text by assigning WordNet synsets (29,681
         documents)

CLEF 2008: Ad-Hoc Robust WSD task: classical IR benchmark using
Cross Language dataset [BCS08]
         166,726 documents
         160 topics in English and Spanish




SemEval-2007 Task 1 results

SemEval-2007 task 1 results
                                system            IR documents            CLIR
                                no expansion      0.3599                  0.1446
                                full expansion    0.1610                  0.2676
                                1st sense         0.2862                  0.2637
                                ORGANIZERS        0.2886                  0.2664
                                JIGSAW            0.3030                  0.1373
                                PART-B            0.3036                  0.1734

Performance of each system

                          system              precision        recall    attempted
                          ORGANIZERS          0.591            0.566     95.76%
                          JIGSAW              0.484            0.338     69.98%
                          PART-B              0.334            0.186     55.68%
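The three columns of the second table are related. Assuming the usual Senseval/SemEval scoring conventions (precision = correct / attempted, recall = correct / total, attempted = attempted / total), recall should be roughly precision times the attempted fraction. The sketch below checks this against the reported figures; `recall_from` is an illustrative helper name.

```python
def recall_from(precision, attempted):
    """Recall implied by precision and the fraction of words attempted."""
    return precision * attempted

# (precision, attempted fraction, reported recall), copied from the table.
systems = {
    "ORGANIZERS": (0.591, 0.9576, 0.566),
    "JIGSAW": (0.484, 0.6998, 0.338),
    "PART-B": (0.334, 0.5568, 0.186),
}

for name, (p, att, r) in systems.items():
    print(f"{name}: implied recall {recall_from(p, att):.3f}, reported {r:.3f}")
```

Read this way, the gap between JIGSAW's precision and recall comes mainly from its lower coverage: it leaves about 30% of the words without a sense.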




CLEF 2008: system setup



         N-Levels model [BCG+ 08]: each document has N levels of
         representation
         Each level has:
                 local feature weighting
                 local similarity function
         Global ranking function: merges the results of different levels
         N-Levels for CLEF 2008:
                 2 levels: stemming (TF/IDF) and synset (SF/IDF)
                 Global ranking function: Z-Score normalization and CombSUM
                 aggregation strategy
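The global ranking step can be sketched as follows: each level's result list is Z-score normalized, then CombSUM adds up the normalized scores per document. The document ids and scores are invented for illustration, and treating a document missing from a level as contributing nothing is an assumption of this sketch, not something the slides specify.

```python
from statistics import mean, pstdev

def z_normalize(scores):
    """Z-score normalization of one level's results: (s - mean) / std."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All scores equal: no document stands out at this level.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}

def comb_sum(*levels):
    """CombSUM: sum each document's normalized scores across all levels."""
    fused = {}
    for level in levels:
        for doc, s in z_normalize(level).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores from the keyword (stem) level and the synset level.
keyword_level = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
synset_level = {"d1": 0.2, "d2": 0.9, "d3": 0.1}

for doc, score in comb_sum(keyword_level, synset_level):
    print(doc, round(score, 3))
```

Normalizing first matters because the two levels produce scores on different scales; without it the level with the larger raw values would dominate the sum.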






CLEF 2008: results
N-levels results on CLEF 2008
             Run                          MONO     CROSS           N-Levels          WSD           MAP
         MONO1TDnus2f                      X         -                -               -            0.168
          MONO11nus2f                      X         -                -               -            0.192
          MONO12nus2f                      X         -                -               -            0.145
          MONO13nus2f                      X         -                -               -            0.154
          MONO14nus2f                      X         -                -               -            0.068
        MONOwsd1nus2f                      X         -                -               X            0.180
        MONOwsd11nus2f                     X         -                -               X            0.186
        MONOwsd12nus2f                     X         -                X               X            0.220
        MONOwsd13nus2f                     X         -                X               X            0.227
         CROSS1TDnus2f                     X         X                -               -            0.025
          CROSS1nus2f                      X         X                -               -            0.015
        CROSSwsd1nus2f                     X         X                -               X            0.071
        CROSSwsd11nus2f                    X         X                X               X            0.060
        CROSSwsd12nus2f                    X         X                X               X            0.072

Conclusions and Future Work


Conclusions


         The problem of word ambiguity in the context of intelligent
         information access is investigated
         Several WSD methods are proposed and evaluated:
                 JIGSAW knowledge-based algorithm
                 HYDE combination of knowledge-based and supervised approaches
                 COMBY combination of unsupervised methods
                 Evaluation: Senseval-3 All Words Task and EVALITA All Words Task
                 Languages other than English: a knowledge-based and a hybrid
                 strategy for Italian WSD are proposed
         Evaluation in real application scenarios: Information Filtering and
         Information Retrieval
                 WSD can enhance real applications in the domain of Intelligent
                 Information Access





Future Work




         Include information about a specific domain into the WSD process
         More investigation on the interaction between IR and WSD is needed
                 document expansion
                 query disambiguation/expansion
                 word polysemy
         Other semantic features could be exploited: named entities and
         entity relations








                               That’s all folks!



For Further Reading I

        E. Agirre and O.L. de Lacalle.
        Publicly available topic signatures for all WordNet nominal senses.
        In Proceedings of the 4th International Conference on Language
        Resources and Evaluation (LREC 2004), 2004.
        E. Agirre and O.L. de Lacalle.
        UBC-ALM: Combining k-NN with SVD for WSD.
        pages 342–345, 2007.
        Eneko Agirre and Aitor Soroa.
        Using the Multilingual Central Repository for Graph-Based Word
        Sense Disambiguation.
        In European Language Resources Association (ELRA), editor,
        Proceedings of the Sixth International Language Resources and
        Evaluation (LREC’08), Marrakech, Morocco, May 2008.



For Further Reading II


        Pierpaolo Basile, Annalina Caputo, Anna Lisa Gentile, Marco
        Degemmis, Pasquale Lops, and Giovanni Semeraro.
        Enhancing Semantic Search using N-Levels Document Representation.
        In Stephan Bloehdorn, Marko Grobelnik, Peter Mika, and Duc Thanh
        Tran, editors, SemSearch, volume 334 of CEUR Workshop
        Proceedings, pages 29–43. CEUR-WS.org, 2008.
        P. Basile, A. Caputo, and G. Semeraro.
        Uniba-Sense at Clef 2008: SEmantic N-levels Search Engine.
        In F. Borri, A. Nardi, and C. Peters, editors, Results of
        Cross-Language Evaluation Forum 2008 (CLEF 2008), page 9, 2008.
        ISSN: 1818-8044.





For Further Reading III


        P. Basile, M. Degemmis, A.L. Gentile, P. Lops, and G. Semeraro.
        UNIBA: JIGSAW Algorithm for Word Sense Disambiguation.
        In Proceedings of the 4th ACL 2007 International Workshop on
        Semantic Evaluations (SemEval-2007), pages 398–401. Association for
        Computational Linguistics (ACL), 2007.
        P. Basile, M. de Gemmis, A.L. Gentile, L. Iaquinta, P. Lops, and
        G. Semeraro.
        META - MultilanguagE Text Analyzer.
        In Proceedings of the Language and Speech Technology Conference -
        LangTech 2008, Rome, Italy, February 28-29, pages 137–140, 2008.






For Further Reading IV

        Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni
        Semeraro.
        Combining Knowledge-based Methods and Supervised Learning for
        Effective Italian Word Sense Disambiguation.
        In Rodolfo Delmonte and Johan Bos, editors, Symposium on
        Semantics in Systems for Text Processing, STEP 2008, Venice, Italy,
        September 22-24, 2008, Proceedings, volume 1 of Research in
        Computational Semantics, pages 5–16. College Publications, 2008.
        S. Banerjee and T. Pedersen.
        An Adapted Lesk Algorithm for Word Sense Disambiguation Using
        WordNet.
        In CICLing ’02: Proceedings of the Third International Conference on
        Computational Linguistics and Intelligent Text Processing, pages
        136–145, London, UK, 2002. Springer-Verlag.



For Further Reading V


        S. Banerjee and T. Pedersen.
        Extended gloss overlaps as measure of semantic relatedness.
        In Proceedings of 18th International Joint Conference on Artificial
        Intelligence (IJCAI), pages 805–810, Acapulco Mexico, 2003.
        Pierpaolo Basile and Giovanni Semeraro.
        JIGSAW: An algorithm for word sense disambiguation.
        Intelligenza Artificiale, 4(2):53–54, 2007.
        M. Lesk.
        Automatic sense disambiguation using machine readable dictionaries:
        how to tell a pine cone from an ice cream cone.
        In Proceedings of ACM SIGDOC Conference, pages 24–26, 1986.





For Further Reading VI


        D. Martinez, E. Agirre, and X. Wang.
        Word relatives in context for word sense disambiguation.
        In Proc. of the 2006 Australasian Language Technology Workshop,
        pages 42–50, 2006.
        G. A. Miller.
        WordNet: a lexical database for English.
        Commun. ACM, 38(11):39–41, 1995.
        P. Resnik.
        Disambiguating noun groupings with respect to WordNet senses.
        In Proceedings of the Third Workshop on Very Large Corpora, pages
        54–68. Association for Computational Linguistics, 1995.





For Further Reading VII




        G. Semeraro, M. Degemmis, P. Lops, and P. Basile.
        Combining Learning and Word Sense Disambiguation for Intelligent
        User Profiling.
        In Proceedings of the Twentieth International Joint Conference on
        Artificial Intelligence IJCAI-07, pages 2856–2861, 2007.
        M. Kaufmann, San Francisco, California. ISBN: 978-1-57735-298-3.




[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Word Sense Disambiguation and Intelligent Information Access

  • 1. Word Sense Disambiguation and Intelligent Information Access Pierpaolo Basile basilepp@di.uniba.it Department of Computer Science University of Bari “A. Moro” (ITALY) 29 May 2009 Pierpaolo Basile (basilepp@di.uniba.it) WSD and IIA 29/05/09 1 / 55
  • 2. Outline
      1 Introduction
          Word Sense Disambiguation
          Intelligent Information Access
      2 WSD Strategies
          JIGSAW
          JIGSAWz
          HYDE: a hybrid strategy for WSD
          COMBY: a combined strategy for WSD
      3 WSD at Work
          Information Filtering: ITR - ITem Recommender
          Information Retrieval: Semantic Search
      4 Conclusions and Future Work
  • 3. Word Sense Disambiguation
      Word Sense Disambiguation (WSD) is the problem of selecting a sense for a word from a set of predefined possibilities.
      - The sense inventory usually comes from a dictionary or thesaurus.
      - A polysemous word has more than one possible meaning, e.g. bank (first four meanings in WordNet 3.0):
          1 sloping land (especially the slope beside a body of water);
          2 a financial institution that accepts deposits and channels the money into lending activities;
          3 a long ridge or pile;
          4 an arrangement of similar objects in a row or in tiers.
      - Approaches: knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping.
  • 4. Brief History
      - 1949: noted as a problem for Machine Translation
      - 1950s-1960s: semantic networks, AI approaches
      - 1970s-1980s: rule-based systems relying on hand-crafted knowledge sources
      - 1990s: WordNet, corpus-based approaches, sense-tagged text
      - 2000s: hybrid systems, minimizing or eliminating the use of sense-tagged text, taking advantage of the Web, domain WSD
  • 7. Intelligent Information Access
      Problems
      - Explosion of irrelevant, unclear, inaccurate information
      - Users overloaded with more information than they can absorb
      Consequences
      - Searching is time-consuming
      - Need for intelligent solutions able to support users in finding documents
      Solution
      - Intelligent Information Access: a user-centric and semantically rich approach to accessing information
  • 8. WSD in Information Access
      Machine Translation
      - Translate “plant” from English to Italian: is it a “pianta” or an “impianto/stabilimento”?
      Information Retrieval
      - Find all Web pages about “bat”: the sports equipment or the nocturnal mammal?
      Question Answering
      - What is George Miller's position on gun control? The psychologist or the US congressman?
      Knowledge Acquisition
      - Add to KB: Herb Bergson is the mayor of Duluth. Minnesota or Georgia?
  • 9. WSD and Intelligent Information Access
      Natural Language Processing can enhance Intelligent Information Access
      - Keywords are not appropriate for representing content, due to polysemy, synonymy and multi-word concepts
      - WSD provides semantics: identification of concepts in documents
      Humans are able to comprehend the meaning of a text
      - Natural Language Processing and WSD convert human linguistic abilities into more formal representations that are easier for computer programs to understand
  • 11. JIGSAW
      JIGSAW
      - Knowledge-based WSD algorithm
      - Exploits WordNet senses
      - Three different strategies: one for nouns, one for verbs, one for adjectives/adverbs
      - Main motivation: the effectiveness of a WSD algorithm is strongly influenced by the PoS tag
      WordNet [Mil95]
      - Lexical reference database designed at Princeton University
      - English nouns, verbs, adverbs and adjectives organized into SYNonym SETs (synsets)
      - Semantic relations among synsets
  • 13. WordNet
      1 Synset rank
      2 Occurrences in SemCor
      3 Offset
      4 SYNonym SET
      Gloss: synset definition
      Examples of usage
      Synset description = gloss + examples
  • 14. JIGSAW algorithm
      Input: document d = (w_1, w_2, ..., w_h)
      Output: X = (s_1, s_2, ..., s_k), with k ≤ h
      - each s_i is obtained by disambiguating w_i based on the context of each word
      - some words are not recognized by WordNet
      - groups of words can be recognized as a single concept
  • 15. JIGSAWnouns
      The idea
      - Based on the Resnik [Res95] algorithm for disambiguating noun groups
      - Given a set of nouns N = {n_1, n_2, ..., n_n} from document d, each n_i has an associated sense inventory S_i = {s_i1, s_i2, ..., s_ik} of possible senses
      - Goal: assign to each n_i the most appropriate sense s_ih ∈ S_i, maximizing the similarity of n_i with the other nouns in N
      The strategy
      - Compute semantic similarity by exploiting the noun hierarchy
      - Give more credit to senses that are hyponyms of the Most Specific Subsumer (MSS)
      - Combine MSS information with semantic similarity
  • 21. JIGSAWnouns: final synset score
      Linear combination of semantic similarity (with MSS information) and synset rank in WordNet:
        $\varphi(s_{ik}) = \alpha \cdot sim(s_{ik}, N) + \beta \cdot R(k)$, with $\alpha + \beta = 1$   (1)
      R(k) takes into account the synset rank in WordNet:
        $R(k) = 1 - 0.8 \cdot \frac{k}{n - 1}$   (2)
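Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rank k is taken as zero-based (so the first synset gets R = 1 and the last gets R = 0.2), the value alpha = 0.7 is a placeholder rather than the setting used in the slides, and a word with a single synset is given R = 1 to avoid dividing by zero. The similarity value is assumed to be computed elsewhere.

```python
def rank_factor(k: int, n: int) -> float:
    """R(k) = 1 - 0.8 * k / (n - 1) for the k-th of n candidate synsets.

    Assumption: k is zero-based, so R ranges from 1.0 (first) to 0.2 (last).
    """
    if n <= 1:
        return 1.0  # assumption: a monosemous word keeps full credit
    return 1.0 - 0.8 * k / (n - 1)


def synset_score(similarity: float, k: int, n: int, alpha: float = 0.7) -> float:
    """phi = alpha * sim + beta * R(k), with alpha + beta = 1.

    alpha = 0.7 is a hypothetical default, not a value taken from the slides.
    """
    beta = 1.0 - alpha
    return alpha * similarity + beta * rank_factor(k, n)
```

With alpha = 0.5, a first-ranked synset (k = 0) with similarity 0.5 scores 0.5 * 0.5 + 0.5 * 1.0 = 0.75.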
  • 22. Differences between JIGSAWnouns and Resnik
      - Leacock-Chodorow measure to compute similarity (instead of Information Content)
      - Gaussian factor G, which takes into account the distance between words in the text
      - Factor R, which takes into account the synset frequency score in WordNet
  • 24. JIGSAWverbs
      The idea
      - Try to establish a relation between verbs and nouns (which have distinct IS-A hierarchies in WordNet)
      - Verb w_i is disambiguated using:
          - nouns in the context C of w_i
          - nouns in the description (gloss + WordNet usage examples) of each candidate synset for w_i
      The strategy
      - For each candidate synset s_ik of w_i, compute nouns(i, k): the set of nouns in the description of s_ik
      - For each w_j in C and each synset s_ik, compute max_jk: the highest similarity value for w_j with respect to the nouns related to the k-th sense of w_i (using the Leacock-Chodorow measure)
      - Use the G and R factors (as in JIGSAWnouns) to weight the semantic similarity
  • 25. JIGSAWverbs: the algorithm
      Example: “I play basketball and soccer.”
      w_i = play, C = {basketball, soccer}
        1 (70) play - (participate in games or sport; “We played hockey all afternoon”; “play cards”; “Pele played for the Brazilian teams in many important matches”)
        2 (29) play - (play on an instrument; “The band played all night long”)
        3 ...
      Build the nouns set for each s_ik:
        1 nouns(play,1): game, sport, hockey, afternoon, card, team, match
        2 nouns(play,2): instrument, band, night
        3 ...
  • 27. JIGSAWverbs: the algorithm
      Finally, an overall similarity score ϕ(i, k) between s_ik and the whole context C is computed:
        $\varphi(i, k) = R(k) \cdot \frac{\sum_{w_j \in C} Gauss(position(w_i), position(w_j)) \cdot max_{jk}}{\sum_{h} Gauss(position(w_i), position(w_h))}$   (3)
      The synset assigned to w_i is the one with the highest ϕ value.
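Equation (3) is a position-weighted average of the max_jk values, scaled by R(k). The sketch below shows that computation in Python. The slides do not give the exact form of Gauss, so an unnormalized Gaussian of the distance between token positions is assumed, with a hypothetical width sigma = 2.0; the max_jk values and R(k) are assumed to be computed elsewhere.

```python
import math
from typing import List, Tuple


def gauss(pos_a: int, pos_b: int, sigma: float = 2.0) -> float:
    """Assumed form of Gauss(.,.): decays with the distance between positions."""
    d = pos_a - pos_b
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def verb_sense_score(target_pos: int,
                     context: List[Tuple[int, float]],
                     rank_weight: float,
                     sigma: float = 2.0) -> float:
    """phi(i, k) for one candidate synset of the verb at target_pos.

    context holds one (position, max_jk) pair per context noun w_j;
    rank_weight is R(k) for the candidate synset.
    """
    weights = [gauss(target_pos, pos, sigma) for pos, _ in context]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no usable context
    weighted = sum(w * sim for w, (_, sim) in zip(weights, context))
    return rank_weight * weighted / total
```

For “I play basketball and soccer”, play is at position 1 and the context nouns at positions 2 and 4, so basketball contributes more to the average than soccer.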
  • 28. JIGSAWothers
      Based on the WSD algorithm proposed by Banerjee and Pedersen [BP02, BP03] (inspired by Lesk [Les86])
      Idea:
      - compute the overlap between the glosses of each candidate sense (including related synsets) of the target word and the glosses of all words in its context
      - assign the synset with the highest overlap score
      - if ties occur, choose the most common synset in WordNet
      Given the sentence “I bought a bottle of aged wine”:
      - the context is C = {bottle, wine}
      - the first two synsets for aged are:
          1 (advanced in years; “aged members of the society”; “elderly residents could remember the construction of the first skyscraper”; “senior citizen”);
          2 (of wines, fruit, cheeses; having reached a desired or final condition; “mature well-aged cheeses”)
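The overlap idea can be illustrated with a deliberately simplified Lesk variant. This is not the Banerjee and Pedersen extended gloss overlap: it ignores related synsets and compares each candidate description directly with the context words instead of with their glosses. The tokenizer and its crude plural stripping are assumptions made only so that “wines” matches “wine”.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word set with naive plural stripping (illustrative only)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}


def simplified_lesk(context: List[str], descriptions: List[str]) -> int:
    """Return the index of the description sharing most words with the context.

    descriptions must be non-empty and ordered by WordNet rank, so a tie is
    resolved in favour of the most common synset (the lowest index).
    """
    ctx = set().union(*[tokens(word) for word in context])
    overlaps = [len(tokens(desc) & ctx) for desc in descriptions]
    return overlaps.index(max(overlaps))


aged = [
    "advanced in years; aged members of the society; elderly residents could "
    "remember the construction of the first skyscraper; senior citizen",
    "of wines, fruit, cheeses; having reached a desired or final condition; "
    "mature well-aged cheeses",
]
chosen = simplified_lesk(["bottle", "wine"], aged)
```

On the example from the slide, the second description shares “wine” with the context while the first shares nothing, so the second synset (index 1) is selected.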
  • 29. JIGSAWz: Zipf distribution
      - Zipf's law: the frequency of an event is inversely proportional to its rank in the frequency table
      - Word frequencies follow a similar distribution: the most frequent word occurs approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, ...
  • 30. JIGSAWz
      Modify the R factor using the Zipf distribution:
        $f(k; N; s) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}$   (4)
      where:
      - N is the number of word meanings
      - k is the word meaning rank (we adopt the WordNet synset rank)
      - s is the value of the exponent characterizing the distribution
      Estimating s:
      - Compute the frequency of the word meaning in SemCor
      - Approximate s using Pearson's chi-square (χ²) test
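Equation (4) and one possible way of approximating s can be sketched as follows. The slides only say that s is approximated with Pearson's chi-square test, so the grid search below, which picks the candidate exponent with the smallest chi-square statistic against observed sense counts, is an assumption about how that could be done. The counts in the example are invented for illustration, and k is one-based here as in equation (4).

```python
from typing import Iterable, List


def zipf_probability(k: int, n_senses: int, s: float) -> float:
    """f(k; N; s) = (1 / k^s) / sum_{n=1..N} (1 / n^s), with rank k starting at 1."""
    norm = sum(1.0 / (n ** s) for n in range(1, n_senses + 1))
    return (1.0 / (k ** s)) / norm


def fit_exponent(observed: List[float], candidates: Iterable[float]) -> float:
    """Pick the exponent whose expected counts minimise Pearson's chi-square.

    observed[k - 1] is the corpus count of the sense with rank k.
    """
    total = sum(observed)
    n_senses = len(observed)

    def chi_square(s: float) -> float:
        stat = 0.0
        for k, obs in enumerate(observed, start=1):
            expected = total * zipf_probability(k, n_senses, s)
            stat += (obs - expected) ** 2 / expected
        return stat

    return min(candidates, key=chi_square)


# Invented counts proportional to 1, 1/2, 1/3, i.e. an exact fit for s = 1.
best_s = fit_exponent([60, 30, 20], [0.5, 1.0, 1.5, 2.0])
```

Because the probabilities for k = 1..N sum to one, f can replace R(k) as a normalized weight for the synset rank.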
  • 31. NLP tools for the evaluation
      WSD requires pre-processing steps: tokenization, stemming, PoS-tagging and lemmatization
      META (MultilanguagE Text Analyzer) [BdG+08] implements several NLP tasks and provides tools for semantic indexing of documents:
      - Text normalization and tokenization
      - Stemming (SNOWBALL library)
      - Lemmatization
          - English: WordNet Morphological Analyzer
          - Italian: Morph-it! and the Lemmagen tool (Ripple Down Rule learning)
      - PoS-tagging based on ACOPOST T3 (HMM - Hidden Markov Model)
      - Entity recognition based on an SVM classifier (YAMCHA)
      - WSD: English/Italian
  • 32. JIGSAW Evaluation
      SensEval-3 All-Words Task
      - disambiguation of all words contained in English texts
      - sense inventory: WordNet 1.7.1
      - 2,041 words
      - inter-annotator agreement rate was approximately 72.5%
      EVALITA WSD All-Words Task
      - disambiguation of all words contained in Italian texts
      - sense inventory: ItalWordNet
      - about 5,000 words
      - no information about the inter-annotator agreement rate
  • 33. JIGSAW Evaluation: Results
      JIGSAW at SensEval-3 All-Words Task
        System             P      R      A(%)   F
        1st sense          0.624  0.651  100    0.651
        BestUnsupervised   0.583  0.582  100    0.582
        JIGSAW             0.525  0.525  100    0.525
        JIGSAWz            0.606  0.606  100    0.606
      JIGSAW at EVALITA WSD All-Words Task [BS07]
        System             P      R      A(%)   F
        1st sense          0.648  0.614  94.7   0.631
        Random             0.483  0.458  94.7   0.470
        JIGSAW             0.598  0.567  94.7   0.582
        JIGSAWz            0.639  0.606  94.7   0.622
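The columns in these tables can be related to each other with a short sketch, assuming the usual all-words scoring conventions (the slides do not spell them out): P is correct answers over attempted words, R is correct answers over all words, A is the percentage of words attempted, and F is the harmonic mean of P and R. The function and argument names are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    denom = precision + recall
    return 2.0 * precision * recall / denom if denom else 0.0


def wsd_scores(correct: int, attempted: int, total: int) -> dict:
    """P, R, F and attempted percentage A from raw counts (assumed definitions)."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return {
        "P": precision,
        "R": recall,
        "F": f_measure(precision, recall),
        "A": 100.0 * attempted / total if total else 0.0,
    }
```

Under these definitions P and R coincide whenever A is 100%, and for the EVALITA 1st sense row f_measure(0.648, 0.614) rounds to the reported 0.631.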
  • 34. Supervised Learning for WSD
      - Exploits machine learning techniques to induce models of word usage from large text collections
      - Annotated corpora are tagged manually using semantic classes chosen from a sense inventory
      - Each sense-tagged occurrence of a particular word is transformed into a feature vector, which is then used in an automatic learning process
  • 37. Problems and Motivation
      Knowledge-based methods
      - outperformed by supervised methods
      - high coverage: applicable to all words in unrestricted text
      Supervised methods
      - high precision
      - low coverage: applicable only to those words for which annotated corpora are available
      Solution
      - HYDE: a combination of knowledge-based methods (JIGSAW) and supervised learning can improve WSD effectiveness [BdLS08]
      - Knowledge-based methods improve coverage
      - Supervised learning strategies improve precision
  • 38. Supervised Learning
      Exploited features
      - nouns: the first noun, verb or adjective before the target noun, within a window of at most three words to the left, and its PoS tag
      - verbs: the first word before and the first word after the target verb, and their PoS tags
      - adjectives: six nouns (before and after the target adjective)
      - adverbs: the same as for adjectives, but adjectives rather than nouns are used
      Training corpus: MultiSemCor
        1 Italian translations of the SemCor texts
        2 automatic alignment of the Italian and English texts
        3 automatic transfer of the word sense annotations from the English (WordNet) words to the aligned Italian (MultiWordNet) words
  • 39. Supervised Learning
      K-NN algorithm for WSD
      - Learning: build a vector for each annotated word
      - Classification:
          - build a vector v_f for each word in the text
          - compute the similarity between v_f and the training vectors
          - rank the training vectors in decreasing order of similarity
          - choose the most frequent sense among the first K vectors
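The classification steps above can be sketched as follows. The slides do not say which similarity function is used, so cosine similarity over sparse feature dictionaries is an assumption, as are the feature names and sense labels in the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(feature, 0.0) for feature, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sense(target: Vector, training: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Rank training vectors by similarity and vote among the first k.

    training must contain at least one (vector, sense) pair.
    """
    ranked = sorted(training, key=lambda ex: cosine(target, ex[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training examples for the noun "bank".
examples = [
    ({"prev=river": 1.0}, "bank#1"),
    ({"prev=river": 1.0, "prev_pos=NN": 1.0}, "bank#1"),
    ({"prev=money": 1.0}, "bank#2"),
]
predicted = knn_sense({"prev=river": 1.0}, examples, k=3)
```

Words with no training examples cannot be classified this way, which is the coverage limitation that motivates the hybrid strategy.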
  • 40. HYDE Evaluation
      Dataset: EVALITA WSD All-Words Task dataset
      Two strategies:
      - Integrating JIGSAW into a supervised learning method
          - the supervised method is applied to words for which training examples are provided
          - JIGSAW is applied to words not covered by the first step
      - Integrating supervised learning into JIGSAW
          - JIGSAW is applied to assign a sense to the words which can be disambiguated with a high level of confidence
          - the remaining words are disambiguated by the supervised method
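Both integration strategies amount to a fallback cascade, which can be sketched as below. The two disambiguators are passed in as hypothetical callables: knn returns a sense or None when it has no training examples for the word, and jigsaw returns a (sense, ϕ) pair. Using ϕ as the confidence value and leaving a word unlabeled when neither step applies are assumptions.

```python
from typing import Callable, Optional, Tuple

Knn = Callable[[str], Optional[str]]
Jigsaw = Callable[[str], Tuple[str, float]]


def supervised_first(word: str, knn: Knn, jigsaw: Jigsaw,
                     min_phi: float = 0.0) -> Optional[str]:
    """Strategy 1: use K-NN where it applies, otherwise fall back to JIGSAW."""
    sense = knn(word)
    if sense is not None:
        return sense
    candidate, phi = jigsaw(word)
    return candidate if phi >= min_phi else None


def jigsaw_first(word: str, knn: Knn, jigsaw: Jigsaw,
                 min_phi: float = 0.8) -> Optional[str]:
    """Strategy 2: trust JIGSAW above the threshold, otherwise ask K-NN."""
    candidate, phi = jigsaw(word)
    if phi >= min_phi:
        return candidate
    return knn(word)


# Stub disambiguators with invented senses and scores, for illustration only.
def stub_knn(word: str) -> Optional[str]:
    return {"banca": "s1"}.get(word)


def stub_jigsaw(word: str) -> Tuple[str, float]:
    return {"banca": ("s2", 0.9), "piano": ("s3", 0.5)}[word]
```

Raising min_phi trades coverage for precision, since fewer words receive a JIGSAW label.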
  • 41. HYDE Evaluation: Baselines
      Baselines for the EVALITA WSD All-Words Task dataset
        Setting      P      R      F      A(%)
        1st sense    0.648  0.614  0.631  94.7
        Random       0.484  0.484  0.484  100.0
        JIGSAW       0.639  0.606  0.622  94.7
        K-NN         0.797  0.336  0.473  42.2
  • 42. HYDE: Evaluation results
      1st sense (0.631), Random (0.470), JIGSAW (0.622), K-NN (0.484)
      Integrating JIGSAW into a supervised learning method
        Setting                      P      R      F      A(%)
        K-NN + JIGSAW                0.624  0.591  0.607  94.7
        K-NN + JIGSAW (ϕ ≥ 0.80)     0.693  0.337  0.453  48.6
        K-NN + JIGSAW (ϕ ≥ 0.60)     0.680  0.410  0.512  60.3
        K-NN + JIGSAW (ϕ ≥ 0.40)     0.652  0.452  0.534  69.3
        K-NN + JIGSAW (ϕ ≥ 0.20)     0.652  0.452  0.534  69.3
      Integrating supervised learning into JIGSAW
        Setting                      P      R      F      A(%)
        JIGSAW (ϕ ≥ 0.80) + K-NN     0.715  0.392  0.556  55.6
        JIGSAW (ϕ ≥ 0.60) + K-NN     0.688  0.440  0.537  64.0
        JIGSAW (ϕ ≥ 0.40) + K-NN     0.651  0.484  0.555  74.4
  • 43. COMBY: a combined strategy for WSD
      COMBY WSD framework: combines the output data of several WSD algorithms
      - run a set of WSD algorithms on a sense-annotated corpus (TRC)
          - obtain a set of output data O = {o_1, o_2, ..., o_N}, where each o_i is the output provided by the i-th algorithm
          - each output o_i contains, for each word instance w_j in TRC, a list of pairs (<synset_1, score_1>, ..., <synset_k, score_k>, ..., <synset_l, score_l>)
      - combination step: run the WSD algorithms on a corpus that is not sense-annotated (TSC)
          - run the WSD algorithms on a different dataset (TSC)
          - obtain a set of output data
          - combine the outputs using voting strategies and supervised methods
  • 45. Combination strategies
      Voting strategies
        1 simple voting: the sense that has the majority of votes is chosen
        2 simple voting using the information about the synset score: the vote for each synset is the sum of all scores in each WSD system
        3 simple voting using different weights for each system according to its WSD performance on TRC
      Supervised methods
        1 several classification algorithms using the WEKA package
        2 Support Vector Machine adopting the open-source software LIBSVM
        3 using unsupervised predictions in a supervised system
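The three voting strategies can be sketched with one small function. It assumes each system returns, for a given word instance, a list of (synset, score) pairs as described for COMBY; the synset labels, scores and system weights in the example are invented, and treating the top-scored synset as a system's vote is an assumption.

```python
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Ranking = List[Tuple[str, float]]


def vote(outputs: Sequence[Ranking],
         weights: Optional[Sequence[float]] = None,
         use_scores: bool = False) -> Optional[str]:
    """Combine per-system rankings for one word instance.

    Default: each system casts one vote for its top-scored synset.
    use_scores=True: each synset accumulates the scores from every system.
    weights: optional per-system weights, e.g. performance measured on TRC.
    """
    tally = defaultdict(float)
    for index, ranking in enumerate(outputs):
        if not ranking:
            continue  # this system gave no answer for the word
        weight = weights[index] if weights is not None else 1.0
        if use_scores:
            for synset, score in ranking:
                tally[synset] += weight * score
        else:
            top_synset = max(ranking, key=lambda pair: pair[1])[0]
            tally[top_synset] += weight
    return max(tally, key=tally.get) if tally else None


# Invented outputs of three systems for a single word instance.
systems = [
    [("a", 0.6), ("b", 0.4)],
    [("a", 0.55), ("b", 0.45)],
    [("b", 0.9), ("a", 0.1)],
]
```

On this example simple voting picks "a" (two votes to one), while score-based voting picks "b" because the third system is far more confident, which shows how the strategies can disagree.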
  • 46. COMBY evaluation
      Dataset
      - TRC (training): SemCor 1.7.1
      - TSC (testing): SensEval-3 All-Words Task
      - 1st sense baseline: F = 0.651
      Involved WSD systems
      - JIGSAW: a knowledge-based WSD algorithm that exploits WordNet as its knowledge base
      - AitorKB: graph-based method for performing knowledge-based WSD [AS08]
      - TS: exploits Topic Signatures to disambiguate nouns [AdL04]
      - RIC: automatically builds examples from the Web using a new approach based on the “monosemous relative” method [MAW06]
  • 47. COMBY evaluation: voting strategies
      Performance of each system
        System     P      R      F      A(%)
        JIGSAW     0.554  0.554  0.554  100.0
        TS         0.458  0.215  0.292  46.9
        RIC        0.397  0.396  0.396  99.8
        AitorKB    0.600  0.600  0.600  100.0
      Voting strategies
        Strategy   P      R      F      A(%)
        Simple     0.587  0.587  0.587  100.0
        Z-Score    0.575  0.575  0.575  100.0
        Rank       0.615  0.615  0.615  100.0
  • 48. COMBY evaluation: supervised combination
      Combination using WEKA
        Classifier            P      R      F      A(%)
        Naive Bayes           0.653  0.653  0.653  100.0
        Decision Trees        0.649  0.649  0.649  100.0
        Ada Boost             0.647  0.647  0.647  100.0
        K-NN                  0.643  0.643  0.643  100.0
        SMO (SVM)             0.653  0.653  0.653  100.0
      Combination using LIBSVM and a supervised system (Knn.ehu [AdL07])
        System                P      R      F      A(%)
        LIBSVM                0.654  0.654  0.654  100.0
        Knn.ehu               0.667  0.667  0.667  100.0
        Knn.ehu+predictions   0.671  0.671  0.671  100.0
  • 49. WSD at Work
      Exploit WSD techniques in real application scenarios
      Information Filtering
      - Content-based recommender system
      - User profiles are compared against item descriptions to provide recommendations
      - Problems: keywords are not appropriate for representing content, due to polysemy, synonymy and multi-word concepts
      Information Retrieval
      - Selection of documents, from a fixed collection, which satisfy a user's one-off information need (query)
      - Problems: polysemy and synonymy
  • 50. Information Filtering: ITR - ITem Recommender
        ITR - ITem Recommender [SDLB07]: framework for Intelligent User Profiling based on:
          Word Sense Disambiguation for detecting relevant concepts representing user interests
          Naive Bayes text categorization algorithm for learning user profiles from disambiguated documents
        Concept-based user profiles:
          Bag-of-Synset: a document corresponds to a synset vector instead of a word vector
          synsets provided by JIGSAW
          recognition of n-grams
          synonyms represented by the same synset
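To make the Bag-of-Synset idea concrete, here is a minimal sketch that assumes the text has already been disambiguated (in ITR this is the job of JIGSAW) into (term, synset) pairs. The synset identifiers, the example document, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(disambiguated):
    """Keyword-based representation: count surface terms."""
    return Counter(term.lower() for term, _ in disambiguated)


def bag_of_synsets(disambiguated):
    """Concept-based representation: count synset identifiers.

    Terms without a synset (e.g. stop words) are skipped.
    """
    return Counter(synset for _, synset in disambiguated if synset is not None)


# Hypothetical output of a WSD step. The multi-word expression was
# recognized as a single n-gram and mapped to one synset.
document = [
    ("film", "syn:movie"),
    ("movie", "syn:movie"),
    ("artificial intelligence", "syn:ai"),
    ("the", None),
]

bow = bag_of_words(document)
bos = bag_of_synsets(document)
```

In the word vector, "film" and "movie" are two unrelated features with count 1 each. In the synset vector they collapse into the single feature `syn:movie` with count 2, which is the effect the slide describes for synonyms.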
  • 51. ITR evaluation
        EachMovie dataset:
          Project conducted by Compaq Research Centre (1996-1997)
          Dataset of user-movie ratings
          About 2.8 million ratings
          72,916 users
          1,628 items (movies) divided into 10 categories (genres)
          Discrete rating on a 6-point scale
        Movie content crawled from the Internet Movie Database (IMDb)
        933 randomly selected users: 100 for each category, except category 2 (Animation), for which only 33 users were selected
        Each user rated between 30 and 100 movies
        Goal: compare the performance of keyword-based profiles vs. synset-based profiles
  • 52. ITR evaluation results
        Performance of the two versions of ITR (BOW: keyword-based, BOS: synset-based) on the 10 genres of the EachMovie dataset
                     Precision      Recall         F1
          Id Genre   BOW    BOS     BOW    BOS     BOW    BOS
          1          0.70   0.74    0.83   0.89    0.76   0.80
          2          0.51   0.57    0.62   0.70    0.54   0.61
          3          0.76   0.86    0.84   0.96    0.79   0.91
          4          0.92   0.93    0.99   0.99    0.96   0.96
          5          0.56   0.67    0.66   0.80    0.59   0.72
          6          0.75   0.78    0.89   0.92    0.81   0.84
          7          0.58   0.73    0.67   0.83    0.71   0.79
          8          0.53   0.72    0.65   0.89    0.58   0.79
          9          0.70   0.77    0.83   0.91    0.75   0.83
          10         0.71   0.75    0.86   0.91    0.77   0.81
          Mean       0.67   0.75    0.78   0.88    0.73   0.81
  • 53. Information Retrieval Evaluation
        Two kinds of evaluation
        SemEval-2007 Task 1: indexing of a document collection for Cross-Language IR [BDG+07]
          application-driven task
          fixed cross-language information retrieval system
          participants disambiguate text by assigning WordNet synsets (29,681 documents)
        CLEF 2008 Ad-Hoc Robust WSD task: classical IR benchmark using a cross-language dataset [BCS08]
          166,726 documents
          160 topics in English and Spanish
  • 54. SemEval-2007 Task 1 results
        SemEval-2007 Task 1 results
          System          IR documents  CLIR
          no expansion    0.3599        0.1446
          full expansion  0.1610        0.2676
          1st sense       0.2862        0.2637
          ORGANIZERS      0.2886        0.2664
          JIGSAW          0.3030        0.1373
          PART-B          0.3036        0.1734
        Performance of each system
          System          Precision  Recall  Attempted
          ORGANIZERS      0.591      0.566   95.76%
          JIGSAW          0.484      0.338   69.98%
          PART-B          0.334      0.186   55.68%
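The precision, recall, and attempted columns are linked. Under the usual Senseval-style definitions (assumed here, since the slide does not spell them out), precision is computed over the attempted instances and recall over all instances, so recall equals precision times the attempted fraction. A small sketch with a hypothetical helper `wsd_scores`:

```python
def wsd_scores(correct, attempted, total):
    """Senseval-style scores (assumed definitions).

    precision = correct / attempted
    recall    = correct / total
    attempted = attempted / total   (coverage)
    """
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    coverage = attempted / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "attempted": coverage, "f1": f1}


# Toy counts: 100 instances, 80 attempted, 50 answered correctly.
scores = wsd_scores(correct=50, attempted=80, total=100)

# Rows from the slide as (precision, recall, attempted fraction).
slide_rows = {
    "ORGANIZERS": (0.591, 0.566, 0.9576),
    "JIGSAW": (0.484, 0.338, 0.6998),
    "PART-B": (0.334, 0.186, 0.5568),
}
# Each reported recall matches precision * attempted up to rounding.
consistent = all(abs(p * a - r) < 0.001 for p, r, a in slide_rows.values())
```

This relation explains why a system with partial coverage, such as JIGSAW at 69.98% attempted, shows a recall well below its precision.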
  • 55. CLEF 2008: system setup
        N-Levels model [BCG+08]: each document has N levels of representation
        Each level has:
          local feature weighting
          local similarity function
        Global ranking function: merges the results of the different levels
        N-Levels for CLEF 2008:
          2 levels: stemming (TF/IDF) and synset (SF/IDF)
          Global ranking function: Z-Score normalization and CombSUM aggregation strategy
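The global ranking step can be sketched as follows: the scores of each level are standardized with a Z-score, and CombSUM then adds the normalized scores of each document across levels. This is an illustrative sketch, not the N-Levels engine itself. The function names and the example scores are hypothetical, and treating a document missing from a level as contributing 0 is an assumption.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Standardize one level's scores to zero mean and unit variance."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def comb_sum(levels):
    """CombSUM: sum the normalized scores of each document over all levels.

    A document absent from a level contributes 0 for that level (assumption).
    Returns (doc, fused_score) pairs, best first.
    """
    fused = {}
    for scores in levels:
        for doc, z in z_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + z
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result lists from the two levels used for CLEF 2008.
stem_level = {"d1": 12.0, "d2": 8.0, "d3": 4.0}    # e.g. TF/IDF on stems
synset_level = {"d2": 0.9, "d3": 0.8, "d4": 0.1}   # e.g. SF/IDF on synsets

ranking = comb_sum([stem_level, synset_level])
```

Normalizing before summing matters because the two levels produce scores on different scales. Without it, the level with the larger raw scores would dominate the merged ranking.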
  • 56. CLEF 2008: results
        N-Levels results on CLEF 2008
          Run              MONO  CROSS  N-Levels  WSD  MAP
          MONO1TDnus2f     X     -      -         -    0.168
          MONO11nus2f      X     -      -         -    0.192
          MONO12nus2f      X     -      -         -    0.145
          MONO13nus2f      X     -      -         -    0.154
          MONO14nus2f      X     -      -         -    0.068
          MONOwsd1nus2f    X     -      -         X    0.180
          MONOwsd11nus2f   X     -      -         X    0.186
          MONOwsd12nus2f   X     -      X         X    0.220
          MONOwsd13nus2f   X     -      X         X    0.227
          CROSS1TDnus2f    X     X      -         -    0.025
          CROSS1nus2f      X     X      -         -    0.015
          CROSSwsd1nus2f   X     X      -         X    0.071
          CROSSwsd11nus2f  X     X      X         X    0.060
          CROSSwsd12nus2f  X     X      X         X    0.072
  • 57. Conclusions
        The problem of word ambiguity was investigated in the context of intelligent information access
        Several WSD methods were proposed and evaluated:
          JIGSAW: knowledge-based algorithm
          HYDE: combination of knowledge-based and supervised approaches
          COMBY: combination of unsupervised methods
          Evaluation: Senseval-3 All-Words Task and EVALITA All-Words Task
        Languages other than English: a knowledge-based and a hybrid strategy for Italian WSD were proposed
        Evaluation in real application scenarios: Information Filtering and Information Retrieval
        WSD can enhance real applications in the domain of Intelligent Information Access
  • 58. Future Work
        Include information about a specific domain in the WSD process
        More investigation of the interaction between IR and WSD is needed:
          document expansion
          query disambiguation/expansion
          word polysemy
        Other semantic features could be exploited: Named Entities and Entity Relations
  • 59. That’s all folks!
  • 60. For Further Reading I
        E. Agirre and O.L. de Lacalle. Publicly available topic signatures for all WordNet nominal senses. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), 2004.
        E. Agirre and O.L. de Lacalle. UBC-ALM: Combining k-NN with SVD for WSD. Pages 342–345, 2007.
        E. Agirre and A. Soroa. Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 2008.
  • 61. For Further Reading II
        P. Basile, A. Caputo, A.L. Gentile, M. Degemmis, P. Lops, and G. Semeraro. Enhancing Semantic Search using N-Levels Document Representation. In S. Bloehdorn, M. Grobelnik, P. Mika, and D.T. Tran, editors, SemSearch, volume 334 of CEUR Workshop Proceedings, pages 29–43. CEUR-WS.org, 2008.
        P. Basile, A. Caputo, and G. Semeraro. Uniba-Sense at CLEF 2008: SEmantic N-levels Search Engine. In F. Borri, A. Nardi, and C. Peters, editors, Results of Cross-Language Evaluation Forum 2008 (CLEF 2008), page 9, 2008. ISSN: 1818-8044.
  • 62. For Further Reading III
        P. Basile, M. Degemmis, A.L. Gentile, P. Lops, and G. Semeraro. UNIBA: JIGSAW Algorithm for Word Sense Disambiguation. In Proceedings of the 4th ACL 2007 International Workshop on Semantic Evaluations (SemEval-2007), pages 398–401. Association for Computational Linguistics (ACL), 2007.
        P. Basile, M. de Gemmis, A.L. Gentile, L. Iaquinta, P. Lops, and G. Semeraro. META - MultilanguagE Text Analyzer. In Proceedings of the Language and Speech Technology Conference - LangTech 2008, Rome, Italy, February 28-29, pages 137–140, 2008.
  • 63. For Further Reading IV
        P. Basile, M. de Gemmis, P. Lops, and G. Semeraro. Combining Knowledge-based Methods and Supervised Learning for Effective Italian Word Sense Disambiguation. In R. Delmonte and J. Bos, editors, Symposium on Semantics in Systems for Text Processing, STEP 2008, Venice, Italy, September 22-24, 2008, Proceedings, volume 1 of Research in Computational Semantics, pages 5–16. College Publications, 2008.
        S. Banerjee and T. Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In CICLing ’02: Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, pages 136–145, London, UK, 2002. Springer-Verlag.
  • 64. For Further Reading V
        S. Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), pages 805–810, Acapulco, Mexico, 2003.
        P. Basile and G. Semeraro. JIGSAW: An algorithm for word sense disambiguation. Intelligenza Artificiale, 4(2):53–54, 2007.
        M. Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the ACM SIGDOC Conference, pages 24–26, 1986.
  • 65. For Further Reading VI
        D. Martinez, E. Agirre, and X. Wang. Word relatives in context for word sense disambiguation. In Proceedings of the 2006 Australasian Language Technology Workshop, pages 42–50, 2006.
        G.A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
        P. Resnik. Disambiguating noun groupings with respect to WordNet senses. In Proceedings of the Third Workshop on Very Large Corpora, pages 54–68. Association for Computational Linguistics, 1995.
  • 66. For Further Reading VII
        G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining Learning and Word Sense Disambiguation for Intelligent User Profiling. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2856–2861. M. Kaufmann, San Francisco, California, 2007. ISBN: 978-1-57735-298-3.