IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 12, Issue 4 (Jul. - Aug. 2013), PP 63-68
www.iosrjournals.org
Review: Semi-Supervised Learning Methods for Word Sense
Disambiguation
Ms. Ankita Sati
(M.Tech Student, Banasthali Vidyapith, Banasthali, Jaipur, India)
Abstract : Word sense disambiguation (WSD) is an open problem of natural language processing, which
governs the process of identifying the appropriate sense of a word in a sentence, when the word has multiple
meanings. Many approaches have been proposed to solve the problem, of which supervised learning
approaches are the most successful. However, supervised machine learning approaches are limited by the
difficulties in defining sense tags and acquiring labeled data for training, and they cannot utilize raw
un-annotated data. Semi-supervised learning has recently become an active research area because it requires
only a small amount of labeled training data and is sometimes able to improve performance using unlabeled
data. In this paper, we discuss the methods of semi-supervised learning and their performance.
Keywords: word sense disambiguation, supervised learning, semi-supervised learning.
I. Introduction
In a language, a word can have many different meanings, or senses. For example, bank in English can
either mean a financial institution, or a sloping raised land. The task of word sense disambiguation (WSD) is to
assign the correct sense to such ambiguous words based on the surrounding context. Word sense disambiguation
(WSD) is a central question in the computational linguistics community. It is fundamental to natural language
understanding and once it is improved, many other language processing tasks will benefit from it, such as
information retrieval (IR) and machine translation (MT) (Ide and Veronis, 1998). Many methods have been
proposed to deal with this problem, including supervised learning algorithms (Leacock et al., 1998), semi-
supervised learning algorithms (Yarowsky, 1995), and unsupervised learning algorithms (Schutze, 1998).
Among these methods supervised sense disambiguation has been very successful, but it requires a lot
of manually sense tagged data (labeled data) and cannot utilize raw unannotated data (unlabeled data) that can
be cheaply acquired. Usually, to achieve good performance, the amount of training data required by supervised
learning is quite large. This is undesirable as hand-labeled training data is expensive and only available for a
small set of words. Fully unsupervised methods do not need sense definitions or manually sense-tagged data,
but their sense clustering results cannot be directly used in many NLP tasks since there is no sense tag for
each instance in the clusters. Considering both the availability of a large amount of unlabeled data and the
direct use of word senses, semi-supervised learning has recently become an active research area. It requires
only a small amount of labeled training data and is sometimes able to improve performance using unlabeled data.
II. Semi–Supervised Learning
Semi-supervised learning is a class of machine learning techniques that use both labeled and
unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. The
name “semi-supervised learning” comes from the fact that the data used is between supervised learning (with
completely labeled training data) and unsupervised learning (without any labeled training data). Many machine-
learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled
data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning
problem often requires a skilled human agent and the process is expensive and time consuming whereas
acquisition of unlabeled data is relatively inexpensive and less time consuming.
As in the supervised learning framework, we are given a set of $l$ independently and identically distributed
examples $x_1, \dots, x_l \in X$ with corresponding labels $y_1, \dots, y_l \in Y$. Additionally, we are given $u$ unlabeled
examples $x_{l+1}, \dots, x_{l+u} \in X$. Semi-supervised learning attempts to make use of this combined information to
surpass the classification performance that could be obtained either by discarding the unlabeled data and doing
supervised learning, or by discarding the labels and doing unsupervised learning.
Semi-supervised learning promises higher accuracies with less annotation effort. It is therefore of great theoretical
and practical interest.
Submitted Date: 19 June 2013; Accepted Date: 24 June 2013
III. Semi–Supervised Learning Techniques
The semi-supervised or minimally supervised methods are gaining popularity because of their ability to
get by with only a small amount of annotated reference data while often outperforming totally unsupervised
methods on large data sets.
Semi-supervised methods for WSD are characterized by exploiting unlabeled data in the learning procedure,
with the requirement of a predefined sense inventory for the target words. They roughly fall into three
categories according to what is used for supervision in the learning process:
(1) Using external resources, e.g., thesauri or lexicons, to disambiguate word senses or to automatically
generate a sense-tagged corpus (Lesk, 1986; Lin, 1997; McCarthy et al., 2004; Yarowsky, 1992);
(2) Exploiting the differences between the mappings of words to senses in different languages by the use of
bilingual corpora (e.g. parallel corpora or untagged monolingual corpora in two languages) (Brown et al., 1991;
Dagan and Itai, 1994; Diab and Resnik, 2002; Li and Li, 2004; Ng et al., 2003);
(3) Bootstrapping from sense-tagged seed examples to overcome the bottleneck of acquiring large amounts of
sense-tagged data (Hearst, 1991; Karov and Edelman, 1998; Mihalcea, 2004; Park et al., 2000; Yarowsky, 1995).
The bootstrapping algorithm is a commonly used semi-supervised learning method for WSD. It works by iteratively
classifying unlabeled examples and adding confidently classified examples to the labeled dataset, using a model
learned from the augmented labeled dataset in the previous iteration. Note that the affinity information among
unlabeled examples is not fully explored in this bootstrapping process. Bootstrapping is based on a local
consistency assumption: examples close to labeled examples of the same class will have the same labels, which is
also the assumption underlying many supervised learning algorithms, such as kNN.
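This loop can be made concrete with a short sketch. The fragment below is a generic illustration of the bootstrapping scheme rather than the procedure of any particular paper; the logistic-regression base classifier, the confidence threshold, and the feature matrices are all assumptions of the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_iter=10):
    """Iteratively label confident unlabeled examples and retrain.

    Generic sketch of the bootstrapping loop described above; the base
    classifier and the confidence threshold are illustrative choices.
    """
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()
    for _ in range(max_iter):
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        if len(X_u) == 0:
            break
        probs = clf.predict_proba(X_u)            # confidence of each prediction
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break                                 # no progress: stop
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, clf.classes_[probs[confident].argmax(axis=1)]])
        X_u = X_u[~confident]                     # shrink the unlabeled pool
    return clf, X_l, y_l
```

Here the unlabeled pool shrinks monotonically; the Yarowsky variant described in Section 3.1.1 instead relabels all examples at every iteration.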
Recently, promising families of graph-based semi-supervised learning algorithms have been introduced, which
can effectively combine unlabeled data with labeled data in the learning process by exploiting the cluster
structure in the data. The label propagation (LP) algorithm (Zhu and Ghahramani, 2002) is a graph-based
semi-supervised learning algorithm for WSD, which works by representing labeled and unlabeled examples as
vertices in a connected graph, then iteratively propagating label information from any vertex to nearby vertices
through weighted edges, and finally inferring the labels of unlabeled examples after this propagation process
converges. Compared with bootstrapping, the LP algorithm is based on a global consistency assumption. Intuitively,
if there is at least one labeled example in each cluster of similar examples, then unlabeled examples will obtain
the same labels as the labeled examples in the same cluster, by propagating the label information of any example
to nearby examples according to their proximity.
Section 3.1 describes bootstrapping algorithms, and Section 3.2 describes the graph-based approach, the label
propagation algorithm.
3.1 Bootstrapping Algorithms
To partly overcome the knowledge acquisition bottleneck, some methods have been devised for
building sense classifiers when only a few annotated examples are available together with a high quantity of un-
annotated data. These methods are often referred to as bootstrapping methods (Abney 2002, 2004). Among
them, we can highlight co-training (Blum and Mitchell 1998), its derivatives (Collins and Singer 1999; Abney
2002, 2004), and self-training (Nigam and Ghani 2000). Briefly, co-training algorithms work by learning two
complementary classifiers for the classification task trained on a small starting set of labeled examples, which
are then used to annotate new unlabeled examples. From these new examples only the most confident
predictions are added to the set of labeled examples. Each classifier is retrained with the additional training
examples given by the other classifier, and the process repeats. The two complementary classifiers are
constructed by considering two different views of the data (i.e., two different feature codifications), which must
be conditionally independent given the class label. In several NLP tasks, co-training has provided moderate
improvements with respect to not using additional unlabeled examples.
One important aspect of co-training is the use of different views to train different classifiers
during the iterative process. While Blum and Mitchell (1998) stated the conditional independence of the views
as a requirement, Abney (2002) shows that this requirement can be relaxed. Moreover, Clark et al. (2003) show
that simply re-training on all newly labeled data can, in some cases, yield results comparable to agreement-based
co-training, with only a fraction of the computational cost.
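As an illustration only, the following sketch shows the skeleton of a co-training round: two feature views of the same examples, two classifiers that each nominate and label their most confident unlabeled examples, and a shared labeled set that grows each round. The Naive Bayes base learner, the growth rate, and the simple two-view bookkeeping are assumptions of this sketch, not the exact procedure of Blum and Mitchell (1998).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, n_per_round=5, rounds=10):
    """Minimal co-training sketch.  X1_* and X2_* are two feature views of the
    same examples; two classifiers label examples for each other."""
    X1_l, X2_l, y_l = X1_l.copy(), X2_l.copy(), y_l.copy()
    X1_u, X2_u = X1_u.copy(), X2_u.copy()
    c1 = c2 = None
    for _ in range(rounds):
        c1 = GaussianNB().fit(X1_l, y_l)          # view-1 classifier
        c2 = GaussianNB().fit(X2_l, y_l)          # view-2 classifier
        if len(X1_u) == 0:
            break
        new_idx, new_lab = [], []
        # Each classifier nominates, and labels, its most confident examples.
        for clf, X_view in ((c1, X1_u), (c2, X2_u)):
            probs = clf.predict_proba(X_view)
            for i in np.argsort(probs.max(axis=1))[::-1][:n_per_round]:
                if i not in new_idx:
                    new_idx.append(int(i))
                    new_lab.append(clf.classes_[probs[i].argmax()])
        # Both views of the newly labeled examples join the labeled set,
        # so each classifier is retrained on examples chosen by the other.
        X1_l = np.vstack([X1_l, X1_u[new_idx]])
        X2_l = np.vstack([X2_l, X2_u[new_idx]])
        y_l = np.concatenate([y_l, new_lab])
        keep = np.setdiff1d(np.arange(len(X1_u)), new_idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return c1, c2
```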
Self-training starts with a set of labeled data and builds a single classifier (there are no different views
of the data), which is then used on the unlabeled data. Only those examples with a confidence score over a certain
threshold are included in the new labeled set. The classifier is re-trained and the procedure repeated. Note that
the classifier uses its own predictions to teach itself; the procedure is also called self-teaching. Self-training
has been applied to several natural language processing tasks. Yarowsky (1995) uses self-training for word sense
disambiguation, e.g. deciding whether the word 'plant' means a living organism or a factory in a given context.
Self-training is a wrapper algorithm, and is hard to analyze in general.
Mihalcea (2004) introduced a new bootstrapping scheme that combines co-training with majority
voting, with the effect of smoothing the bootstrapping learning curves and improving the average performance.
However, this approach assumes a comparable distribution of classes between both labeled and unlabeled data.
At each iteration, the class distribution in the labeled data is maintained by keeping a constant ratio across
classes between already labeled examples and newly added examples. This requires one to know a priori the
distribution of sense classes in the unlabeled corpus, which seems unrealistic.
Pham et al. (2005) also experimented with a number of co-training variants on the Senseval-2 lexical
sample and all-words tasks, including the ones in Mihalcea (2004). Although the original co-training algorithm
did not provide any advantage over using only labeled examples, all the sophisticated co-training variants
obtained significant improvements (taking Naïve Bayes as the base learning method). The best reported method
was Spectral Graph Transduction Co-training.
3.1.1 Yarowsky Bootstrapping Algorithm
The Yarowsky algorithm (Yarowsky 1995) was probably one of the first and most successful
applications of the bootstrapping approach to NLP tasks.
The Yarowsky algorithm is a simple iterative and incremental algorithm which eliminates the need for
a large training set by relying on a relatively small number of instances of each sense for each lexeme of
interest. These labeled instances are used as seeds to train an initial classifier using any of the supervised
learning methods (decision lists, in this particular case). This initial classifier is then used to extract a larger
training set from the remaining untagged corpus. Only those examples that are classified with a confidence
above a certain threshold are kept as additional training examples for the next iteration. The algorithm repeats
this retraining and re-labeling procedure until convergence (i.e., when no changes are observed from the
previous iteration).
The key to this approach lies in its ability to create a larger training set from a small set of seeds.
Regarding the initial seed set, Yarowsky (1995) discusses several alternatives for finding them, ranging from fully
automatic to manually supervised procedures. This initial labeling may have very low coverage (and, thus, low
recall) but it is intended to have extremely high precision. As iterations proceed, the set of training examples
tends to increase, while the pool of unlabeled examples shrinks. In terms of performance, recall improves with
iterations, while precision tends to decrease slightly. Ideally, at convergence, most of the examples will be
labeled with high confidence.
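A compact sketch of this scheme is given below: a decision list is learned from the currently labeled examples, each rule scored by a smoothed log-likelihood ratio over a single collocational feature, and the whole corpus is then relabeled from the list while the seed labels stay fixed. The feature representation (each example as a set of collocation strings), the smoothing constant, and the confidence threshold are assumptions of the sketch, not Yarowsky's exact settings.

```python
import math
from collections import defaultdict

def learn_decision_list(examples, labels, senses, smoothing=0.1):
    """Learn a decision list: one rule per collocational feature, ranked by a
    smoothed log-likelihood ratio (a sketch of Yarowsky-style scoring)."""
    counts = defaultdict(lambda: defaultdict(float))
    for feats, sense in zip(examples, labels):
        for f in feats:
            counts[f][sense] += 1.0
    rules = []
    for f, per_sense in counts.items():
        best = max(senses, key=lambda s: per_sense.get(s, 0.0))
        p_best = per_sense.get(best, 0.0) + smoothing
        p_rest = sum(per_sense.get(s, 0.0) for s in senses if s != best) + smoothing
        rules.append((math.log(p_best / p_rest), f, best))
    rules.sort(reverse=True)                      # strongest evidence first
    return rules

def classify(feats, rules, threshold=0.0):
    """Apply the first (highest-scoring) matching rule above the threshold."""
    for score, f, sense in rules:
        if score >= threshold and f in feats:
            return sense, score
    return None, 0.0

def yarowsky_bootstrap(examples, seed_labels, senses, iters=10, threshold=1.0):
    """Relabel the whole corpus each iteration from the current decision list,
    keeping seed labels fixed; examples stay unlabeled until some rule
    exceeds the confidence threshold."""
    labels = dict(seed_labels)                    # example index -> sense
    for _ in range(iters):
        idx = sorted(labels)
        rules = learn_decision_list([examples[i] for i in idx],
                                    [labels[i] for i in idx], senses)
        new_labels = dict(seed_labels)            # seeds are always kept
        for i, feats in enumerate(examples):
            if i in new_labels:
                continue
            sense, _score = classify(feats, rules, threshold)
            if sense is not None:
                new_labels[i] = sense
        if new_labels == labels:                  # convergence: no change
            break
        labels = new_labels
    return labels, rules
```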
Discourse Properties: Some well-known discourse properties are at the core of the learning process and allow
the algorithm to generalize in order to label new examples with confidence.
We refer to: one-sense-per-collocation, language redundancy, and one-sense-per-discourse. First, the
one-sense-per-collocation heuristic gives a good justification for using a decision list as the base learner, since a
DL uses a single rule, based on a single contextual feature, to classify each new example. Second, we know that
language is very redundant. This means that the sense of a concrete example is overdetermined by a set of
multiple relevant contextual features (or collocations). Some of these collocations are shared among other
examples of the same sense. These intersections allow the algorithm to learn to classify new examples, and, by
transitivity, to increase recall with each iteration. This is the key point in the algorithm for achieving
generalization. For instance, borrowing Yarowsky's (1995) original examples, a seed rule may establish that all
the examples of the word plant in the collocation plant life should be labeled with the vegetal sense of plant (as
opposed to its industrial sense). If we run a DL on the set of seed examples determined by this collocation, we
may obtain many other relevant collocations for the same sense in the list of rules, for instance, “presence of the
word animal in a ±10-word window”. This rule would allow the classification, at the second iteration, of some new
examples that were left unlabeled by the seed rule, for instance an example containing “a varied plant and
animal life”. Third, Yarowsky also applied the one-sense-per-discourse heuristic as a post-process at each
iteration to uniformly label all the examples in the same discourse with the majority sense. This has a double
effect. On the one hand, it allows the algorithm to extend the number of labeled examples, which, in turn, will
provide new “bridge” collocations that cannot be captured directly from intersections among currently labeled
examples. On the other hand, it allows the algorithm to correct some misclassified examples in a particular
discourse.
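The one-sense-per-discourse post-process itself is simple to state in code: within each discourse, every occurrence is relabeled with the majority sense among the occurrences already labeled there. The sketch below assumes occurrences are indexed by integers and discourses by arbitrary ids; it illustrates the heuristic, not Yarowsky's exact implementation.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(labels, discourse_of):
    """Relabel every occurrence in a discourse with that discourse's majority
    sense.  `labels` maps an occurrence index to a sense (missing if
    unlabeled); `discourse_of` maps an occurrence index to its discourse id."""
    by_discourse = defaultdict(list)
    for i, d in discourse_of.items():
        by_discourse[d].append(i)
    new_labels = dict(labels)
    for d, occurrences in by_discourse.items():
        senses = [labels[i] for i in occurrences if i in labels]
        if not senses:
            continue                              # nothing labeled yet here
        majority, _ = Counter(senses).most_common(1)[0]
        for i in occurrences:
            new_labels[i] = majority              # also labels the unlabeled ones
    return new_labels
```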
Performance: Yarowsky's (1995) evaluation showed that, with a minimal set of annotated seed examples, this
algorithm obtained comparable results to a fully supervised system (again, using DLs). The evaluation
framework consisted of a small set of words limited to binary homographic sense distinctions.
Apart from simplicity, we would like to highlight another good property of the Yarowsky algorithm,
which is its ability to recover from initial misclassifications. The fact that at each iteration all the
examples are relabeled makes it possible that an initial wrong prediction for a concrete example may lower its
strength in subsequent iterations (due to the more informative training sets) until the confidence for that
collocation falls below the threshold. In other words, we might say that language redundancy makes the
Yarowsky algorithm self-correcting.
As a drawback, this bootstrapping approach has been theoretically poorly understood since its
appearance in 1995. Recently, Abney (2004) made some advances in this direction by analyzing a number of
variants of the Yarowsky algorithm and showing that they optimize natural objective functions. Another criticism
refers to real applicability, since Martínez and Agirre (2000) observed far less predictive power of the
one-sense-per-discourse and one-sense-per-collocation heuristics when tested in a real domain with highly
polysemous words.
3.1.2 Bilingual Bootstrapping Methods
Bilingual bootstrapping (Li and Li, 2004) is a machine learning technique for word sense disambiguation. In
learning to disambiguate words to be translated, bilingual bootstrapping makes use of a small amount of classified
data and a large amount of unclassified data in both the source and the target languages.
The data in the two languages should be from the same domain but are not required to be exactly in
parallel. It constructs classifiers in the two languages in parallel by repeating the following two steps:
(1) Construct a classifier for each of the languages on the basis of classified data in both languages, and (2) use
the constructed classifier for each language to classify unclassified data, which are then added to the classified
data of the language. We can use classified data in both languages in step (1), because words in one language
have translations in the other, and we can transform data from one language into the other. It boosts the
performance of the classifiers by classifying unclassified data in the two languages and by exchanging
information regarding classified data between the two languages.
The performance of bilingual bootstrapping has been experimentally evaluated on word translation
disambiguation, and the results indicate that bilingual bootstrapping consistently and significantly
outperforms monolingual bootstrapping. The higher performance of bilingual bootstrapping can be attributed to
its effective use of the asymmetric relationship between the ambiguous words in the two languages.
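A structural sketch of these two steps is shown below for two languages. The `transfer` callback, which converts classified examples from one language into pseudo-examples of the other (for instance via a bilingual dictionary), is a hypothetical placeholder; the Naive Bayes base classifier and the confidence threshold are also assumptions. The sketch illustrates the flow of information rather than Li and Li's exact algorithm.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def bilingual_bootstrap(data, transfer, threshold=0.9, rounds=10):
    """Structural sketch of bilingual bootstrapping for exactly two languages.

    `data` maps each language to a dict with keys 'X_l', 'y_l', 'X_u'; sense
    labels are shared across languages.  `transfer(src, tgt, X, y)` is a
    user-supplied (hypothetical) function that turns classified examples of
    language `src` into pseudo-examples for language `tgt`."""
    langs = list(data)
    classifiers = {}
    for _ in range(rounds):
        for lang in langs:
            other = [l for l in langs if l != lang][0]
            # Step 1: train on own labeled data plus data transferred from
            # the other language.
            Xt, yt = transfer(other, lang, data[other]['X_l'], data[other]['y_l'])
            X_train = np.vstack([data[lang]['X_l'], Xt]) if len(Xt) else data[lang]['X_l']
            y_train = np.concatenate([data[lang]['y_l'], yt]) if len(yt) else data[lang]['y_l']
            clf = GaussianNB().fit(X_train, y_train)
            classifiers[lang] = clf
            # Step 2: classify own unlabeled data; keep confident predictions.
            X_u = data[lang]['X_u']
            if len(X_u) == 0:
                continue
            probs = clf.predict_proba(X_u)
            conf = probs.max(axis=1) >= threshold
            if conf.any():
                data[lang]['X_l'] = np.vstack([data[lang]['X_l'], X_u[conf]])
                data[lang]['y_l'] = np.concatenate(
                    [data[lang]['y_l'], clf.classes_[probs[conf].argmax(axis=1)]])
                data[lang]['X_u'] = X_u[~conf]
    return classifiers
```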
3.2 Label Propagation Algorithm
This algorithm works by representing labeled and unlabeled examples as vertices in a connected graph,
then propagating the label information from any vertex to nearby vertices through weighted edges iteratively,
finally inferring the labels of unlabeled examples after the propagation process converges.
3.2.1 Problem Setup
Let $X = \{x_i\}_{i=1}^{n}$ be a set of contexts of occurrences of an ambiguous word $w$, where $x_i$ represents the
context of the $i$-th occurrence and $n$ is the total number of this word's occurrences. Let $S = \{s_j\}_{j=1}^{c}$ denote
the sense tag set of $w$. The first $l$ examples $x_g$ $(1 \le g \le l)$ are labeled as $y_g$ $(y_g \in S)$ and the other $u$ $(l + u = n)$
examples $x_h$ $(l+1 \le h \le n)$ are unlabeled. The goal is to predict the sense of $w$ in context $x_h$ by the use of the
label information of $x_g$ and the similarity information among examples in $X$.
The cluster structure in $X$ can be represented as a connected graph, where each vertex corresponds to an example
and the edge between any two examples $x_i$ and $x_j$ is weighted so that the closer the vertices are in some distance
measure, the larger the weight associated with this edge. The weights are defined as
$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\alpha^2}\right) \text{ if } i \ne j, \qquad W_{ii} = 0 \quad (1 \le i, j \le n),$$
where $\alpha$ is used to control the weight $W_{ij}$.
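A minimal NumPy sketch of this weight matrix is shown below; it assumes each context $x_i$ has already been turned into a feature vector, and the default value of the kernel width is an arbitrary assumption.

```python
import numpy as np

def connection_weights(X, alpha=1.0):
    """Build the graph weight matrix defined above:
    W_ij = exp(-||x_i - x_j||^2 / alpha^2) for i != j, and W_ii = 0.
    X is an (n, d) array of context feature vectors."""
    sq_norms = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances via the expansion
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)                      # guard against tiny negatives
    W = np.exp(-d2 / alpha ** 2)
    np.fill_diagonal(W, 0.0)                      # W_ii = 0
    return W
```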
3.2.2 Algorithm
In the LP algorithm (Zhu and Ghahramani, 2002), label information from any vertex in the graph is propagated
to nearby vertices through weighted edges until a globally stable stage is achieved. Larger edge weights allow
labels to travel through more easily; thus, the closer two examples are, the more likely they are to have similar
labels (the global consistency assumption). In the label propagation process, the soft label of each initially
labeled example is clamped in each iteration to replenish the label sources from the labeled data. Thus the labeled
data act like sources that push labels out through the unlabeled data. With this push from the labeled examples,
the class boundaries will be pushed through edges with large weights and settle in gaps along edges with small
weights. If the data structure fits the classification goal, the LP algorithm can use the unlabeled data to help
learn the classification boundary.
Let $Y^0 \in \mathbb{N}^{n \times c}$ represent the initial soft labels attached to the vertices, where $Y^0_{ij} = 1$ if $y_i$ is $s_j$ and $0$
otherwise. Let $Y^0_L$ be the top $l$ rows of $Y^0$ and $Y^0_U$ be the remaining $u$ rows. $Y^0_L$ is consistent with the labeling
of the labeled data, and the initialization of $Y^0_U$ can be arbitrary. Optimally, we expect the values of $W_{ij}$ across
different classes to be as small as possible and the values of $W_{ij}$ within the same class to be as large as possible.
This makes label propagation tend to stay within the same class.
Define the $n \times n$ probability transition matrix $T_{ij} = P(j \to i) = \frac{W_{ij}}{\sum_{k=1}^{n} W_{kj}}$, where $T_{ij}$ is the probability of
jumping from example $x_j$ to example $x_i$. Compute the row-normalized matrix $\bar{T}$ by $\bar{T}_{ij} = T_{ij} / \sum_{k=1}^{n} T_{ik}$. This
normalization maintains the class probability interpretation of $Y$.
Then the LP algorithm is defined as follows:
1. Initially set $t = 0$, where $t$ is the iteration index;
2. Propagate the labels by $Y^{t+1} = \bar{T} Y^{t}$;
3. Clamp the labeled data by replacing the top $l$ rows of $Y^{t+1}$ with $Y^0_L$. Repeat from step 2 until $Y^{t}$ converges;
4. Assign each $x_h$ $(l+1 \le h \le n)$ the label $s_{\hat{j}}$, where $\hat{j} = \arg\max_j Y_{hj}$.
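The four steps translate directly into a short NumPy sketch. It assumes the weight matrix W from the previous sketch, a connected graph, and a soft-label matrix Y0 whose first n_labeled rows one-hot encode the known senses; the iteration cap and tolerance are assumptions for illustration.

```python
import numpy as np

def label_propagation(W, Y0, n_labeled, max_iter=1000, tol=1e-6):
    """Iterative LP sketch following the four steps above.  Returns the
    predicted sense index for each unlabeled example."""
    # Column-normalize W to get T (jump probabilities), then row-normalize to T_bar.
    T = W / W.sum(axis=0, keepdims=True)          # assumes no isolated vertices
    T_bar = T / T.sum(axis=1, keepdims=True)
    Y = Y0.astype(float).copy()
    for _ in range(max_iter):
        Y_new = T_bar @ Y                         # step 2: propagate labels
        Y_new[:n_labeled] = Y0[:n_labeled]        # step 3: clamp labeled rows
        if np.abs(Y_new - Y).max() < tol:         # convergence check
            Y = Y_new
            break
        Y = Y_new
    return Y[n_labeled:].argmax(axis=1)           # step 4: assign senses
```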
This algorithm has been shown to converge to a unique solution (Zhu and Ghahramani, 2002):
$$Y_U = \lim_{t \to \infty} Y_U^{t} = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y^0_L .$$
We can see that this solution can be obtained without iteration and that the initialization of $Y^0_U$ is not
important, since $Y^0_U$ does not affect the estimation of $Y_U$. Here $I$ is the $u \times u$ identity matrix, and $\bar{T}_{uu}$ and
$\bar{T}_{ul}$ are obtained by splitting the matrix $\bar{T}$ after the $l$-th row and the $l$-th column into four sub-matrices.
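For completeness, the closed-form solution can be computed directly; this is again an illustrative sketch using the same W and Y0 as above, not the authors' implementation.

```python
import numpy as np

def label_propagation_closed_form(W, Y0, n_labeled):
    """Closed-form LP solution Y_U = (I - T_bar_uu)^(-1) T_bar_ul Y_L^0,
    equivalent to running the iteration above to convergence."""
    T = W / W.sum(axis=0, keepdims=True)          # column-normalize
    T_bar = T / T.sum(axis=1, keepdims=True)      # row-normalize
    l = n_labeled
    T_uu = T_bar[l:, l:]                          # unlabeled -> unlabeled block
    T_ul = T_bar[l:, :l]                          # labeled  -> unlabeled block
    I = np.eye(T_uu.shape[0])
    Y_U = np.linalg.solve(I - T_uu, T_ul @ Y0[:l].astype(float))
    return Y_U.argmax(axis=1)                     # predicted sense indices
```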
Performance: The performance of LP has been experimentally evaluated by taking the reduced “interest” corpus
(constructed by retaining its four major senses) and the complete “line” corpus as evaluation data (Niu, Ji and Tan).
Table 1: Accuracies from (Li and Li, 2004) and average accuracies of LP on the “interest” and “line” corpora.
Major is a baseline method that always chooses the most frequent sense. MB-D denotes monolingual bootstrapping
with a decision list as base classifier, MB-B denotes monolingual bootstrapping with an ensemble of Naive Bayes
as base classifier, and BB is bilingual bootstrapping with an ensemble of Naive Bayes as base classifier.
LPcosine and LPJS denote the label propagation algorithm with two different distance measures; their accuracies
are from (Niu, Ji and Tan).

Ambiguous word | Major  | MB-D   | MB-B   | BB     | #labeled examples | LPcosine   | LPJS
interest       | 54.6%  | 54.7%  | 69.3%  | 75.5%  | 4×15=60           | 80.2±2.0%  | 79.8±2.0%
line           | 53.5%  | 55.6%  | 54.1%  | 62.7%  | 6×15=90           | 60.3±4.5%  | 59.4±3.9%
Table 1 summarizes the average accuracies of LP on the two corpora. It also lists the accuracies of the
monolingual bootstrapping algorithm (MB) and the bilingual bootstrapping algorithm (BB) on the “interest” and
“line” corpora. We can see that LP performs much better than MB-D and MB-B on both the “interest” and “line”
corpora, while the performance of LP is comparable to BB on these two corpora.
IV. Discussion
The primary focus of this paper is to discuss semi-supervised algorithms for word sense disambiguation,
because the use of a small amount of labeled data together with unlabeled data is gaining popularity in the
WSD field.
The bootstrapping method (Yarowsky 1995) of semi-supervised learning was the first approach of this kind
towards word sense disambiguation. A great advantage of bootstrapping is its simplicity: starting from a small
seed set, it provides a straightforward way to grow a large training set, and it offers a natural way to control
and check the stability of the results. However, bootstrapping is based on a local consistency assumption:
examples close to labeled examples of the same class are assumed to have the same labels.
Thus the label propagation based semi-supervised learning algorithm for WSD came to light, which realizes a
global consistency assumption: similar examples should have similar labels. In the learning process, the labels
of unlabeled examples are determined not only by nearby labeled examples, but also by nearby unlabeled examples.
Compared with the semi-supervised WSD methods in the first and second categories, LP is a corpus-based
method and does not need external resources such as WordNet, a bilingual lexicon, or aligned parallel corpora.
It achieves better performance than SVM when only very few labeled examples are available, and its
performance is also better than monolingual bootstrapping and comparable to bilingual bootstrapping. From this
review, we can say that LP is gaining more popularity than bootstrapping in the field of semi-supervised learning.
V. Conclusion
We have described semi-supervised learning methods and their performance in the context of word sense
disambiguation. WSD is an important problem in natural language processing which can be addressed by
semi-supervised methods. Compared with other WSD approaches, these methods are less time-consuming and less
expensive because they require only a small amount of labeled training data and also use unlabeled data, which
is easily available.
References
[1] Dagan, I. & Itai A.. 1994. Word Sense Disambiguation Using A Second Language Monolingual Corpus. Computational
Linguistics, Vol. 20(4), pp. 563- 596.
[2] Diab, M., & Resnik. P.. 2002. An Unsupervised Method for Word Sense Tagging Using Parallel Corpora. ACL-2002(pp. 255–
262).
[3] Hearst, M.. 1991. Noun Homograph Disambiguation using Local Context in Large Text Corpora. Proceedings of the 7th Annual
Conference of the UW Centre for the New OED and Text Research: Using Corpora, 24:1, 1–41.
[4] Ide, Nancy M. and Jean Veronis. 1998. Introduction to the special issue on word sense disambiguation: the state of the art.
Computational Linguistics, 24(1), 1–40.
[5] Karov, Y. & Edelman, S.. 1998. Similarity-Based Word Sense Disambiguation. Computational Linguistics, 24(1): 41-59.
[6] Leacock, C., Miller, G.A. & Chodorow, M. 1998. Using Corpus Statistics and WordNet Relations for Sense Identification.
Computational Linguistics, 24:1, 147–165.
[7] Lesk M.. 1986. Automated Word Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an
Ice Cream Cone. Proceedings of the ACM SIGDOC Conference.
[8] Li, H. & Li, C.. 2004. Word Translation Disambiguation Using Bilingual bootstrapping. Computational Linguistics, 30(1), 1-22.
[9] Lin, J. 1991. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information Theory, 37:1, 145–150.
[10] McCarthy, D., Koeling, R., Weeds, J., & Carroll, J.. 2004. Finding Predominant Word Senses in Untagged Text. ACL-2004.
[11] Mihalcea R.. 2004. Co-training and Self-training for Word Sense Disambiguation. CoNLL-2004.
[12] Brown P., Stephen, D.P., Vincent, D.P., & Robert, Mercer.. 1991. Word Sense Disambiguation Using Statistical Methods. ACL-
1991.
[13] Mo Yu, Shu Wang, Conghui Zhu, & Tiejun Zhao. 2011. Semi-supervised Learning for Word Sense Disambiguation Using Parallel
Corpora. Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China.
[14] Ng, H.T., Wang, B., & Chan, Y.S.. 2003. Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study. ACL-
2003, pp. 455-462.
[15] Schutze, H.. 1998. Automatic Word Sense Discrimination. Computational Linguistics, 24:1, 97–123.
[16] Yarowsky, D.. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. ACL-1995, pp. 189-196.
[17] Yarowsky, D.. 1992. Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora.
COLING-1992, pp. 454-460.
[18] Park, S.B., Zhang, B.T., & Kim, Y.T.. 2000. Word Sense Disambiguation by Learning from Unlabeled Data. ACL-2000.
[19] Zheng-Yu Niu, Dong-Hong Ji, & Chew Lim Tan. Word Sense Disambiguation Using Label Propagation Based Semi-Supervised
Learning.
[20] Zhu, X. & Ghahramani, Z.. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. CMU CALD tech report
CMU-CALD-02-107.
Contenu connexe

Tendances

Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
 
METHODS FOR INCREMENTAL LEARNING: A SURVEY
METHODS FOR INCREMENTAL LEARNING: A SURVEYMETHODS FOR INCREMENTAL LEARNING: A SURVEY
METHODS FOR INCREMENTAL LEARNING: A SURVEYIJDKP
 
Deep Neural Networks in Text Classification using Active Learning
Deep Neural Networks in Text Classification using Active LearningDeep Neural Networks in Text Classification using Active Learning
Deep Neural Networks in Text Classification using Active LearningMirsaeid Abolghasemi
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONTEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
 
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTION
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTIONA NAIVE METHOD FOR ONTOLOGY CONSTRUCTION
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTIONijscai
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classificationraghavr186
 
A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...ijma
 
11.software modules clustering an effective approach for reusability
11.software modules clustering an effective approach for  reusability11.software modules clustering an effective approach for  reusability
11.software modules clustering an effective approach for reusabilityAlexander Decker
 
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...ijcseit
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXmlaij
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
 
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...ijcsa
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...ijcsit
 
Evaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationEvaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
 
A systematic study of text mining techniques
A systematic study of text mining techniquesA systematic study of text mining techniques
A systematic study of text mining techniquesijnlc
 
J48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataJ48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataCSCJournals
 

Tendances (19)

Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
 
METHODS FOR INCREMENTAL LEARNING: A SURVEY
METHODS FOR INCREMENTAL LEARNING: A SURVEYMETHODS FOR INCREMENTAL LEARNING: A SURVEY
METHODS FOR INCREMENTAL LEARNING: A SURVEY
 
Deep Neural Networks in Text Classification using Active Learning
Deep Neural Networks in Text Classification using Active LearningDeep Neural Networks in Text Classification using Active Learning
Deep Neural Networks in Text Classification using Active Learning
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONTEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
 
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTION
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTIONA NAIVE METHOD FOR ONTOLOGY CONSTRUCTION
A NAIVE METHOD FOR ONTOLOGY CONSTRUCTION
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
 
A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...A new keyphrases extraction method based on suffix tree data structure for ar...
A new keyphrases extraction method based on suffix tree data structure for ar...
 
11.software modules clustering an effective approach for reusability
11.software modules clustering an effective approach for  reusability11.software modules clustering an effective approach for  reusability
11.software modules clustering an effective approach for reusability
 
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOX
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documents
 
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...
 
Evaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationEvaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classification
 
A systematic study of text mining techniques
A systematic study of text mining techniquesA systematic study of text mining techniques
A systematic study of text mining techniques
 
J48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance DataJ48 and JRIP Rules for E-Governance Data
J48 and JRIP Rules for E-Governance Data
 

Similaire à Review: Semi-Supervised Learning Methods for Word Sense Disambiguation

USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...IJDKP
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplicationidescitation
 
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...RSIS International
 
Applying Machine Learning to Agricultural Data
Applying Machine Learning to Agricultural DataApplying Machine Learning to Agricultural Data
Applying Machine Learning to Agricultural Databutest
 
6145-Article Text-9370-1-10-20200513.pdf
6145-Article Text-9370-1-10-20200513.pdf6145-Article Text-9370-1-10-20200513.pdf
6145-Article Text-9370-1-10-20200513.pdfchalachew5
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441IJRAT
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learningcsandit
 
6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docxpriestmanmable
 
6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docxsodhi3
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifierEsteban Ribero
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHIJDKP
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
 
Semi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term SetSemi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term SetIJCSEA Journal
 
taghelper-final.doc
taghelper-final.doctaghelper-final.doc
taghelper-final.docbutest
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 

Similaire à Review: Semi-Supervised Learning Methods for Word Sense Disambiguation (20)

USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
 
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
 
Ijetr021252
Ijetr021252Ijetr021252
Ijetr021252
 
Applying Machine Learning to Agricultural Data
Applying Machine Learning to Agricultural DataApplying Machine Learning to Agricultural Data
Applying Machine Learning to Agricultural Data
 
6145-Article Text-9370-1-10-20200513.pdf
6145-Article Text-9370-1-10-20200513.pdf6145-Article Text-9370-1-10-20200513.pdf
6145-Article Text-9370-1-10-20200513.pdf
 
Viva
VivaViva
Viva
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx
 
6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx6711SafeAssign Originality Report69Total S.docx
6711SafeAssign Originality Report69Total S.docx
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
Semi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term SetSemi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term Set
 
taghelper-final.doc
taghelper-final.doctaghelper-final.doc
taghelper-final.doc
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERINGAN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
 

Plus de IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Dernier

Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfNainaShrivastava14
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Communityprachaibot
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 

Dernier (20)

Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 

Review: Semi-Supervised Learning Methods for Word Sense Disambiguation

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 4 (Jul. - Aug. 2013), PP 63-68 www.iosrjournals.org www.iosrjournals.org 63 | Page Review: Semi-Supervised Learning Methods for Word Sense Disambiguation Ms. Ankita Sati (M.Tech Student, Banasthali Vidyapith, Banasthali, Jaipur, India) Abstract : Word sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying the appropriate sense of a word in a sentence, when the word has multiple meanings. Many approaches have been proposed to solve the problem, of which supervised learning approaches are the most successful. However supervised machine learning are limited by the difficulties in defining sense tags, acquiring labeled data for training and cannot utilize raw un-annotated data. Semi- supervised learning has recently become active research area because it require small amount of labeled training data and sometimes able to improve performance using unlabeled data. In this paper, we discuss the methods of semi-supervised learning and their performance. Keywords: -word sense disambiguation, supervised learning, semi-supervised learning. I. Introduction In a language, a word can have many different meanings, or senses. For example, bank in English can either mean a financial institution, or a sloping raised land. The task of word sense disambiguation (WSD) is to assign the correct sense to such ambiguous words based on the surrounding context. Word sense disambiguation (WSD) is a central question in the computational linguistics community. It is fundamental to natural language understanding and once it is improved, many other language processing tasks will benefit from it, such as information retrieval (IR) and machine translation (MT) (Ide and Veronis, 1998). Many methods have been proposed to deal with this problem, including supervised learning algorithms (Leacock et al., 1998), semi- supervised learning algorithms (Yarowsky, 1995), and unsupervised learning algorithms (Schutze, 1998). Among these methods supervised sense disambiguation has been very successful, but it requires a lot of manually sense tagged data (labeled data) and cannot utilize raw unannotated data (unlabeled data) that can be cheaply acquired. Mostly, to achieve good performance, the amount of training data required by supervised learning is quite large. This is undesirable as hand-labeled training data is expensive and only available for a small set of words. Fully unsupervised methods do not need the definition of senses and manually sense-tagged data, but their sense clustering results cannot be directly used in many NLP tasks since there is no sense tag for each instance in clusters. Considering both the availability of a large amount of unlabeled data and direct use of word semi-supervised learning has recently become an active research area. It requires only a small amount of labeled training data and is sometimes able to improve performance using unlabeled data. II. Semi–Supervised Learning Semi-supervised learning is a class of machine learning techniques that use both labeled and unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. The name “semi-supervised learning” comes from the fact that the data used is between supervised learning (with completely labeled training data) and unsupervised learning (without any labeled training data). 
Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human annotator, and the process is expensive and time-consuming, whereas the acquisition of unlabeled data is relatively inexpensive and fast.

As in the supervised learning framework, we are given a set of $l$ independently and identically distributed examples $x_1, \ldots, x_l \in X$ with corresponding labels $y_1, \ldots, y_l \in Y$. Additionally, we are given $u$ unlabeled examples $x_{l+1}, \ldots, x_{l+u} \in X$. Semi-supervised learning attempts to make use of this combined information to surpass the classification performance that could be obtained either by discarding the unlabeled data and doing supervised learning, or by discarding the labels and doing unsupervised learning. Semi-supervised learning therefore promises higher accuracy with less annotation effort and is of great theoretical and practical interest.
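To make this setup concrete, the following minimal Python sketch (all names and data are illustrative, not taken from the paper) shows how a WSD dataset might be split into a small labeled portion and a large unlabeled pool before any semi-supervised learning is applied:

    # Hypothetical context vectors for occurrences of an ambiguous word
    # (e.g. bag-of-words features for "bank"); values are made up for illustration.
    import numpy as np

    X = np.random.rand(1000, 50)           # n = l + u context vectors
    y = np.full(1000, -1)                  # -1 marks an unlabeled example
    y[:20] = np.random.randint(0, 2, 20)   # only l = 20 seed examples carry sense tags

    labeled_mask = y != -1
    X_labeled, y_labeled = X[labeled_mask], y[labeled_mask]
    X_unlabeled = X[~labeled_mask]         # the cheap, abundant raw data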
III. Semi–Supervised Learning Techniques
Semi-supervised, or minimally supervised, methods are gaining popularity because of their ability to get by with only a small amount of annotated reference data while often outperforming totally unsupervised methods on large data sets. Semi-supervised methods for WSD are characterized by exploiting unlabeled data in the learning procedure, together with the requirement of a predefined sense inventory for the target words. They roughly fall into three categories according to what is used for supervision in the learning process:
(1) using external resources, e.g. thesauri or lexicons, to disambiguate word senses or to automatically generate a sense-tagged corpus (Lesk, 1986; Lin, 1997; McCarthy et al., 2004; Yarowsky, 1992);
(2) exploiting the differences between the mappings of words to senses in different languages by the use of bilingual corpora, e.g. parallel corpora or untagged monolingual corpora in two languages (Brown et al., 1991; Dagan and Itai, 1994; Diab and Resnik, 2002; Li and Li, 2004; Ng et al., 2003);
(3) bootstrapping from sense-tagged seed examples to overcome the bottleneck of acquiring large amounts of sense-tagged data (Hearst, 1991; Karov and Edelman, 1998; Mihalcea, 2004; Park et al., 2000; Yarowsky, 1995).

Bootstrapping is the most commonly used semi-supervised learning method for WSD. It works by iteratively classifying unlabeled examples and adding the confidently classified examples to the labeled dataset, using a model learned from the augmented labeled dataset in the previous iteration (see the code sketch below). Note that the affinity information among unlabeled examples is not fully explored in this bootstrapping process. Bootstrapping is based on a local consistency assumption: examples close to labeled examples of the same class will have the same labels, which is also the assumption underlying many supervised learning algorithms, such as kNN.

Recently, a promising family of graph-based semi-supervised learning algorithms has been introduced, which can effectively combine unlabeled data with labeled data in the learning process by exploiting the cluster structure in the data. The label propagation algorithm (LP algorithm) (Zhu and Ghahramani, 2002) is a graph-based semi-supervised learning algorithm for WSD, which works by representing labeled and unlabeled examples as vertices in a connected graph, then iteratively propagating label information from any vertex to nearby vertices through weighted edges, and finally inferring the labels of unlabeled examples after this propagation process converges. Compared with bootstrapping, the LP algorithm is based on a global consistency assumption. Intuitively, if there is at least one labeled example in each cluster of similar examples, then the unlabeled examples will obtain the same labels as the labeled examples in the same cluster, by propagating the label information of any example to nearby examples according to their proximity.

Section 3.1 describes bootstrapping algorithms. Section 3.2 describes the graph-based approach, the label propagation algorithm.

3.1 Bootstrapping Algorithms
To partly overcome the knowledge acquisition bottleneck, some methods have been devised for building sense classifiers when only a few annotated examples are available together with a high quantity of un-annotated data. These methods are often referred to as bootstrapping methods (Abney 2002, 2004).
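The following minimal Python sketch shows the generic bootstrapping loop shared by the methods in this section. It is illustrative only: the base classifier, the feature representation, and the 0.9 confidence threshold are assumptions for the example, not prescriptions from the reviewed work.

    # Generic bootstrapping: grow the labeled set with confident predictions.
    # Any probabilistic base classifier could be used; Naive Bayes is one common choice.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def bootstrap(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
        X_lab, y_lab = X_lab.copy(), y_lab.copy()
        for _ in range(max_iter):
            clf = GaussianNB().fit(X_lab, y_lab)        # retrain on the augmented labeled set
            if len(X_unlab) == 0:
                break
            proba = clf.predict_proba(X_unlab)
            confident = proba.max(axis=1) >= threshold  # keep only confident predictions
            if not confident.any():
                break                                   # nothing new to add: stop
            new_labels = clf.classes_[proba[confident].argmax(axis=1)]
            X_lab = np.vstack([X_lab, X_unlab[confident]])
            y_lab = np.concatenate([y_lab, new_labels])
            X_unlab = X_unlab[~confident]               # shrink the unlabeled pool
        return clf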
Among them, we can highlight co-training (Blum and Mitchell 1998), its derivatives (Collins and Singer 1999; Abney 2002, 2004), and self-training (Nigam and Ghani 2000).

Briefly, co-training algorithms work by learning two complementary classifiers for the classification task from a small starting set of labeled examples, which are then used to annotate new unlabeled examples. From these new examples, only the most confident predictions are added to the set of labeled examples. Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats. The two complementary classifiers are constructed by considering two different views of the data (i.e., two different feature codifications), which must be conditionally independent given the class label. In several NLP tasks, co-training has provided moderate improvements over not using additional unlabeled examples. One important aspect of co-training is the use of different views to train different classifiers during the iterative process. While Blum and Mitchell (1998) stated the conditional independence of the views as a requirement, Abney (2002) showed that this requirement can be relaxed. Moreover, Clark et al. (2003) showed that simply retraining on all newly labeled data can, in some cases, yield results comparable to agreement-based co-training, at only a fraction of the computational cost.

Self-training starts with a set of labeled data and builds a single classifier (there are no different views of the data), which is then used on the unlabeled data. Only those examples with a confidence score over a certain threshold are included in the new labeled set. The classifier is retrained and the procedure is repeated. Note that the classifier uses its own predictions to teach itself; the procedure is therefore also called self-teaching. Self-training has been applied to several natural language processing tasks. Yarowsky (1995) used self-training for word sense disambiguation, e.g. deciding whether the word 'plant' means a living organism or a factory in a given context. Self-training is a wrapper algorithm and is hard to analyze in general.
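As a complement to the single-classifier loop sketched above (which is essentially self-training), the sketch below illustrates the co-training idea of Blum and Mitchell (1998): two classifiers, each trained on its own view of the data, label examples for one another. The two feature views, the base learner, and the number of examples added per round are assumptions made for the example.

    # Co-training sketch: X1_* and X2_* are two parallel feature views of the same examples.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def cotrain(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, rounds=10, grow=5):
        for _ in range(rounds):
            c1 = GaussianNB().fit(X1_lab, y_lab)
            c2 = GaussianNB().fit(X2_lab, y_lab)
            if len(X1_unlab) == 0:
                break
            # Each view nominates the unlabeled examples it is most confident about.
            picks = set()
            for clf, X_un in ((c1, X1_unlab), (c2, X2_unlab)):
                conf = clf.predict_proba(X_un).max(axis=1)
                picks.update(np.argsort(-conf)[:grow])
            picks = np.array(sorted(picks))
            # Here the picked examples are labeled with classifier 1's predictions;
            # agreement checking or per-view labeling are common variants.
            new_y = c1.predict(X1_unlab[picks])
            X1_lab = np.vstack([X1_lab, X1_unlab[picks]])
            X2_lab = np.vstack([X2_lab, X2_unlab[picks]])
            y_lab = np.concatenate([y_lab, new_y])
            keep = np.setdiff1d(np.arange(len(X1_unlab)), picks)
            X1_unlab, X2_unlab = X1_unlab[keep], X2_unlab[keep]
        return c1, c2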
Mihalcea (2004) introduced a new bootstrapping scheme that combines co-training with majority voting, with the effect of smoothing the bootstrapping learning curves and improving the average performance. However, this approach assumes a comparable distribution of classes between labeled and unlabeled data. At each iteration, the class distribution in the labeled data is maintained by keeping a constant ratio across classes between already labeled examples and newly added examples. This requires knowing a priori the distribution of sense classes in the unlabeled corpus, which seems unrealistic. Pham et al. (2005) also experimented with a number of co-training variants on the Senseval-2 lexical sample and all-words tasks, including the ones in Mihalcea (2004). Although the original co-training algorithm did not provide any advantage over using only labeled examples, all the sophisticated co-training variants obtained significant improvements (taking Naive Bayes as the base learning method). The best reported method was Spectral Graph Transduction Co-training.

3.1.1 Yarowsky Bootstrapping Algorithm
The Yarowsky algorithm (Yarowsky 1995) was probably one of the first and most successful applications of the bootstrapping approach to NLP tasks. It is a simple iterative and incremental algorithm which eliminates the need for a large training set by relying on a relatively small number of instances of each sense for each lexeme of interest. These labeled instances are used as seeds to train an initial classifier using any supervised learning method (decision lists, in this particular case). This initial classifier is then used to extract a larger training set from the remaining untagged corpus. Only those examples that are classified with a confidence above a certain threshold are kept as additional training examples for the next iteration. The algorithm repeats this retraining and relabeling procedure until convergence (i.e., when no changes are observed from the previous iteration). The key to this approach lies in its ability to create a larger training set from a small set of seeds.

Regarding the initial seed set, Yarowsky (1995) discusses several alternatives for obtaining it, ranging from fully automatic to manually supervised procedures. This initial labeling may have very low coverage (and thus low recall), but it is intended to have extremely high precision. As iterations proceed, the set of training examples tends to increase, while the pool of unlabeled examples shrinks. In terms of performance, recall improves with iterations, while precision tends to decrease slightly. Ideally, at convergence, most of the examples will be labeled with high confidence.
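To make the decision-list component concrete, here is a heavily simplified Python sketch. The feature extraction, smoothing constant, and scoring details are assumptions made for illustration; Yarowsky's full algorithm additionally interleaves the relabeling loop above with the discourse heuristics described below.

    # Decision list for one ambiguous word: rank collocation features by
    # a log-likelihood-style score and classify each context with the single best matching rule.
    import math
    from collections import defaultdict

    def train_decision_list(contexts, senses, smoothing=0.1):
        counts = defaultdict(lambda: defaultdict(float))    # feature -> sense -> count
        for feats, sense in zip(contexts, senses):
            for f in feats:                                  # e.g. "word_to_right=life"
                counts[f][sense] += 1
        rules = []
        all_senses = set(senses)
        for f, per_sense in counts.items():
            best = max(per_sense, key=per_sense.get)
            rest = sum(per_sense[s] for s in all_senses if s != best) + smoothing
            score = math.log((per_sense[best] + smoothing) / rest)
            rules.append((score, f, best))
        return sorted(rules, reverse=True)                   # strongest evidence first

    def classify(rules, feats, default_sense):
        for score, f, sense in rules:
            if f in feats:                                    # first matching rule wins
                return sense
        return default_sense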
Discourse properties of the Yarowsky bootstrapping algorithm: Some well-known discourse properties are at the core of the learning process and allow the algorithm to generalize and label new examples with confidence: one-sense-per-collocation, language redundancy, and one-sense-per-discourse. First, the one-sense-per-collocation heuristic gives a good justification for using a decision list as the base learner, since a DL uses a single rule, based on a single contextual feature, to classify each new example. Second, we know that language is very redundant. This means that the sense of a concrete example is over-determined by a set of multiple relevant contextual features (or collocations). Some of these collocations are shared with other examples of the same sense. These intersections allow the algorithm to learn to classify new examples and, by transitivity, to increase recall with each iteration. This is the key point in the algorithm for achieving generalization. For instance, borrowing Yarowsky's (1995) original examples, a seed rule may establish that all the examples of the word plant in the collocation plant life should be labeled with the vegetal sense of plant (as opposed to its industrial sense). If we run a DL on the set of seed examples determined by this collocation, we may obtain many other relevant collocations for the same sense in the list of rules, for instance, "presence of the word animal in a ±10-word window". This rule would allow the classification, at the second iteration, of some new examples that were left unlabeled by the seed rule, for instance, an example containing a varied plant and animal life. Third, Yarowsky also applied the one-sense-per-discourse heuristic as a post-process at each iteration to uniformly label all the examples in the same discourse with the majority sense. This has a double effect. On the one hand, it allows the algorithm to extend the number of labeled examples, which, in turn, will provide new "bridge" collocations that cannot be captured directly from intersections among the currently labeled examples. On the other hand, it allows the algorithm to correct some misclassified examples in a particular discourse.

Performance: Yarowsky's (1995) evaluation showed that, with a minimal set of annotated seed examples, this algorithm obtained results comparable to a fully supervised system (again, using DLs). The evaluation framework consisted of a small set of words limited to binary homographic sense distinctions. Apart from its simplicity, we would like to highlight another good property of the Yarowsky algorithm, namely its ability to recover from initial misclassifications. The fact that at each iteration all the examples are relabeled makes it possible for an initially wrong prediction on a concrete example to lose strength in subsequent iterations (due to the more informative training sets) until the confidence for that collocation falls below the threshold.
In other words, we might say that language redundancy makes the Yarowsky algorithm self-correcting. As a drawback, this bootstrapping approach remained theoretically poorly understood for a long time after its appearance in 1995. Recently, Abney (2004) made some advances in this direction by analyzing a number of variants of the Yarowsky algorithm and showing that they optimize natural objective functions. Another criticism concerns real applicability, since Martínez and Agirre (2000) observed far less predictive power of the one-sense-per-discourse and one-sense-per-collocation heuristics when tested in a real domain with highly polysemous words.

3.1.2 Bilingual Bootstrapping Methods
Bilingual bootstrapping is a machine learning technique for word sense disambiguation applied to words that are to be translated. It makes use of a small amount of classified data and a large amount of unclassified data in both the source and the target languages. The data in the two languages should be from the same domain but are not required to be exactly parallel. The method repeatedly constructs classifiers in the two languages in parallel by repeating two steps: (1) construct a classifier for each of the languages on the basis of the classified data in both languages, and (2) use the constructed classifier for each language to classify unclassified data, which are then added to the classified data of that language. Classified data in both languages can be used in step (1) because words in one language have translations in the other, so data can be transformed from one language into the other. The method boosts the performance of the classifiers by classifying unclassified data in the two languages and by exchanging information about classified data between the two languages. The performance of bilingual bootstrapping has been experimentally evaluated on word translation disambiguation, and the results indicate that bilingual bootstrapping consistently and significantly outperforms monolingual bootstrapping. The higher performance of bilingual bootstrapping can be attributed to its effective use of the asymmetric relationship between the ambiguous words in the two languages.

3.2 Label Propagation Algorithm
This algorithm works by representing labeled and unlabeled examples as vertices in a connected graph, then iteratively propagating label information from any vertex to nearby vertices through weighted edges, and finally inferring the labels of unlabeled examples after the propagation process converges.

3.2.1 Problem Setup
Let $X = \{x_i\}_{i=1}^{n}$ be the set of contexts of occurrences of an ambiguous word $w$, where $x_i$ represents the context of the $i$-th occurrence and $n$ is the total number of this word's occurrences. Let $S = \{s_j\}_{j=1}^{c}$ denote the sense tag set of $w$. The first $l$ examples $x_g$ ($1 \le g \le l$) are labeled as $y_g$ ($y_g \in S$), and the other $u$ ($l + u = n$) examples $x_h$ ($l+1 \le h \le n$) are unlabeled. The goal is to predict the sense of $w$ in context $x_h$ by using the label information of the $x_g$ and the similarity information among the examples in $X$.
The cluster structure in $X$ can be represented as a connected graph, where each vertex corresponds to an example and the edge between any two examples $x_i$ and $x_j$ is weighted so that the closer the vertices are in some distance measure, the larger the weight associated with this edge. The weights are defined as $W_{ij} = \exp(-\|x_i - x_j\|^2 / \alpha^2)$ if $i \ne j$ and $W_{ii} = 0$ ($1 \le i, j \le n$), where $\alpha$ is used to control the weights $W_{ij}$.

3.2.2 Algorithm
In the LP algorithm (Zhu and Ghahramani, 2002), the label information of any vertex in the graph is propagated to nearby vertices through weighted edges until a globally stable state is reached. Larger edge weights allow labels to travel through more easily; thus, the closer two examples are, the more likely they are to have similar labels (the global consistency assumption). During label propagation, the soft label of each initially labeled example is clamped in each iteration, so the labeled data act as sources that continually push labels out through the unlabeled data. With this push from the labeled examples, the class boundaries are pushed through edges with large weights and settle in gaps along edges with small weights. If the data structure fits the classification goal, the LP algorithm can use the unlabeled data to help learn the classification boundary.

Let $Y^0 \in \mathbb{N}^{n \times c}$ represent the initial soft labels attached to the vertices, where $Y^0_{ij} = 1$ if $y_i$ is $s_j$ and $0$ otherwise. Let $Y^0_L$ be the top $l$ rows of $Y^0$ and $Y^0_U$ the remaining $u$ rows. $Y^0_L$ is consistent with the labeling of the labeled data, and the initialization of $Y^0_U$ can be arbitrary. Optimally, we expect the values of $W_{ij}$ across different classes to be as small as possible and the values of $W_{ij}$ within the same class to be as large as possible, so that label propagation tends to stay within the same class.
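A minimal numpy sketch of the graph construction just described is given below. The choice of alpha and the use of a dense matrix are illustrative assumptions; for large corpora a sparse k-nearest-neighbor graph would typically be used instead.

    # Build the fully connected affinity graph W over all n = l + u examples.
    import numpy as np

    def affinity_matrix(X, alpha=0.5):
        # Squared Euclidean distances between every pair of context vectors.
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-sq_dists / alpha ** 2)
        np.fill_diagonal(W, 0.0)      # W_ii = 0: no self-edges
        return W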
Define the $n \times n$ probability transition matrix $T_{ij} = P(j \to i) = W_{ij} / \sum_{k=1}^{n} W_{kj}$, where $T_{ij}$ is the probability of jumping from example $x_j$ to example $x_i$. Compute the row-normalized matrix $\bar{T}$ by $\bar{T}_{ij} = T_{ij} / \sum_{k=1}^{n} T_{ik}$; this normalization maintains the class-probability interpretation of $Y$. The LP algorithm is then defined as follows:
1. Initially set $t = 0$, where $t$ is the iteration index;
2. Propagate the labels by $Y^{t+1} = \bar{T} Y^{t}$;
3. Clamp the labeled data by replacing the top $l$ rows of $Y^{t+1}$ with $Y^0_L$; repeat from step 2 until $Y^{t}$ converges;
4. Assign each $x_h$ ($l+1 \le h \le n$) the label $s_{\hat{j}}$, where $\hat{j} = \arg\max_j Y_{hj}$.

This algorithm has been shown to converge to a unique solution, $Y_U = \lim_{t \to \infty} Y^{t}_U = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y^0_L$ (Zhu and Ghahramani, 2002). We can see that this solution can be obtained without iteration and that the initialization of $Y^0_U$ is not important, since $Y^0_U$ does not affect the estimation of $Y_U$. Here $I$ is the $u \times u$ identity matrix, and $\bar{T}_{uu}$ and $\bar{T}_{ul}$ are obtained by splitting the matrix $\bar{T}$ after the $l$-th row and the $l$-th column into four sub-matrices.

Performance: The performance of LP has been experimentally evaluated by taking the reduced "interest" corpus (constructed by retaining the four major senses) and the complete "line" corpus as evaluation data. In Table 1, Major is a baseline method that always chooses the most frequent sense, MB-D denotes monolingual bootstrapping with a decision list as base classifier, MB-B represents monolingual bootstrapping with an ensemble of Naive Bayes classifiers as base classifier, and BB is bilingual bootstrapping with an ensemble of Naive Bayes classifiers as base classifier. LP_cosine and LP_JS denote label propagation using cosine and Jensen-Shannon distance measures, respectively.

Table 1: Accuracies from (Li and Li, 2004) and average accuracies of LP on the "interest" and "line" corpora.

  Accuracies from (Li and Li, 2004)
  Ambiguous word   Major    MB-D     MB-B     BB
  interest         54.6%    54.7%    69.3%    75.5%
  line             53.5%    55.6%    54.1%    62.7%

  Accuracies from (Niu, Ji and Tan)
  Ambiguous word   #labeled examples   LP_cosine     LP_JS
  interest         4×15=60             80.2±2.0%     79.8±2.0%
  line             6×15=90             60.3±4.5%     59.4±3.9%

Table 1 summarizes the average accuracies of LP on the two corpora, together with the accuracies of the monolingual bootstrapping algorithm (MB) and the bilingual bootstrapping algorithm (BB) on "interest" and "line". We can see that LP performs much better than MB-D and MB-B on both the "interest" and "line" corpora, while the performance of LP is comparable to BB on these two corpora.
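Putting the pieces of Section 3.2 together, the following Python sketch implements the iterative propagation with clamping. It is illustrative only: it takes an affinity matrix W such as the one built by the affinity_matrix helper sketched above, and the convergence tolerance and iteration cap are assumptions.

    # Label propagation: the first l rows of Y0 are one-hot sense labels and stay clamped.
    import numpy as np

    def label_propagation(W, Y0, l, max_iter=1000, tol=1e-6):
        T = W / W.sum(axis=0, keepdims=True)        # T_ij = W_ij / sum_k W_kj (column-normalize)
        T_bar = T / T.sum(axis=1, keepdims=True)    # row-normalize to keep probabilities
        Y = Y0.astype(float).copy()
        for _ in range(max_iter):
            Y_new = T_bar @ Y                       # propagate: Y(t+1) = T_bar Y(t)
            Y_new[:l] = Y0[:l]                      # clamp the labeled examples
            if np.abs(Y_new - Y).max() < tol:       # stop once Y has converged
                Y = Y_new
                break
            Y = Y_new
        return Y[l:].argmax(axis=1)                 # predicted sense index per unlabeled example

In practice, one could instead compute the closed-form solution $Y_U = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y^0_L$ given above and avoid the iteration altogether.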
IV. Discussion
The primary focus of this paper is to discuss semi-supervised algorithms for word sense disambiguation, since the use of a small amount of labeled data together with unlabeled data is gaining popularity in the WSD field. The bootstrapping method (Yarowsky 1995) was among the first semi-supervised approaches to word sense disambiguation. A great advantage of bootstrapping is its simplicity: it is a straightforward way to enlarge a small training set, and it offers an appropriate way to control and check the stability of the results. However, bootstrapping is based on a local consistency assumption: only examples close to labeled examples of the same class receive labels. Label-propagation-based semi-supervised learning for WSD was therefore introduced; it fully realizes a global consistency assumption: similar examples should have similar labels. In the learning process, the labels of unlabeled examples are determined not only by nearby labeled examples, but also by nearby unlabeled examples. Compared with semi-supervised WSD methods in the first and second categories, LP is a corpus-based method and does not need external resources such as WordNet, a bilingual lexicon, or aligned parallel corpora. It achieves better performance than SVM when only very few labeled examples are available, and its performance is also better than monolingual bootstrapping and comparable to bilingual bootstrapping.
From this review, we can say that LP is gaining more popularity than bootstrapping in the field of semi-supervised learning for WSD.

V. Conclusion
We have described semi-supervised learning methods and their performance in the context of word sense disambiguation. WSD is an important problem in natural language processing, and it can be addressed using semi-supervised methods. Compared with other WSD approaches, these methods are less time-consuming and less expensive, because they require only a small amount of labeled training data and also make use of unlabeled data, which is easily available.

References
[1] Dagan, I., & Itai, A. 1994. Word Sense Disambiguation Using a Second Language Monolingual Corpus. Computational Linguistics, 20(4), pp. 563-596.
[2] Diab, M., & Resnik, P. 2002. An Unsupervised Method for Word Sense Tagging Using Parallel Corpora. ACL-2002, pp. 255-262.
[3] Hearst, M. 1991. Noun Homograph Disambiguation Using Local Context in Large Text Corpora. Proceedings of the 7th Annual Conference of the UW Centre for the New OED and Text Research: Using Corpora, 24:1, 1-41.
[4] Ide, N. M., & Veronis, J. 1998. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics, 24(1), 1-40.
[5] Karov, Y., & Edelman, S. 1998. Similarity-Based Word Sense Disambiguation. Computational Linguistics, 24(1), 41-59.
[6] Leacock, C., Miller, G.A., & Chodorow, M. 1998. Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics, 24(1), 147-165.
[7] Lesk, M. 1986. Automated Word Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. Proceedings of the ACM SIGDOC Conference.
[8] Li, H., & Li, C. 2004. Word Translation Disambiguation Using Bilingual Bootstrapping. Computational Linguistics, 30(1), 1-22.
[9] Lin, J. 1991. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information Theory, 37(1), 145-150.
[10] McCarthy, D., Koeling, R., Weeds, J., & Carroll, J. 2004. Finding Predominant Word Senses in Untagged Text. ACL-2004.
[11] Mihalcea, R. 2004. Co-training and Self-training for Word Sense Disambiguation. CoNLL-2004.
[12] Brown, P., Della Pietra, S., Della Pietra, V., & Mercer, R. 1991. Word Sense Disambiguation Using Statistical Methods. ACL-1991.
[13] Yu, M., Wang, S., Zhu, C., & Zhao, T. 2011. Semi-supervised Learning for Word Sense Disambiguation Using Parallel Corpora. Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, China.
[14] Ng, H.T., Wang, B., & Chan, Y.S. 2003. Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study. ACL-2003, pp. 455-462.
[15] Schutze, H. 1998. Automatic Word Sense Discrimination. Computational Linguistics, 24(1), 97-123.
[16] Yarowsky, D. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. ACL-1995, pp. 189-196.
[17] Yarowsky, D. 1992. Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora. COLING-1992, pp. 454-460.
[18] Park, S.B., Zhang, B.T., & Kim, Y.T. 2000. Word Sense Disambiguation by Learning from Unlabeled Data. ACL-2000.
[19] Niu, Z.-Y., Ji, D.-H., & Tan, C.L. Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning.
[20] Zhu, X., & Ghahramani, Z.
2002. Learning from Labeled and Unlabeled Data with Label Propagation. CMU CALD tech report CMU-CALD-02-107.