ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011




    Effective Approach for Disambiguating Chinese
                Polyphonic Ambiguity
                                                        Feng-Long Huang
                                   Department of Computer Science and Information Engineering
                                                   National United University
                                           No. 1, Lienda, Miaoli, Taiwan, 36003
                                                       flhuang@nuu.edu.tw

Abstract: One of the difficult tasks in Natural Language Processing (NLP) is resolving the sense ambiguity of characters or words in text, such as polyphones, homonyms, and homographs. This paper addresses the ambiguity of Chinese polyphonic characters and approaches for disambiguating them. Three methods, dictionary matching, language models and a voting scheme, are used to disambiguate the prediction of polyphones. Compared with the well-known MS Word 2007 and language models (LMs), our approach is superior to both for this issue; the final precision rate is enhanced up to 92.72%. Based on the proposed approaches, we have constructed an e-learning system in which several related functions of Chinese transliteration are integrated.

Keywords: Natural Language Processing, Sense Disambiguation, Language Model, Voting Scheme

               I.        INTRODUCTION

    In recent years, natural language processing (NLP) has been studied and applied in many fields, such as machine translation, speech processing, lexical analysis, information retrieval, spelling prediction, handwriting recognition, and so on [1][2]. Among the computational models, syntax parsing, word segmentation and the generation of statistical language models have been the focal tasks.
    In general, no matter the natural language, there is always a phenomenon of ambiguity among characters or words in text, such as polyphones, homonyms, homographs, and combinations of them. Resolving this ambiguity is necessary for accomplishing most natural language processing applications. One of the difficult tasks in NLP is to resolve a word's sense ambiguity, the so-called word sense disambiguation (WSD) [3, 4].
    Disambiguating sense ambiguity can alleviate many problems in NLP. This paper applies dictionary matching, statistical N-gram language models (LMs) and a voting scheme, the last of which includes two methods (preference and winner-take-all scoring), to retrieve Chinese lexical knowledge and perform WSD on Chinese polyphonic characters. There are nearly 5700 frequent unique Chinese characters, and among them more than 1300 characters have two or more different pronunciations; these are called polyphonic characters. The problem of predicting the correct polyphonic category can be regarded as an instance of WSD.
    The paper is organized as follows: related works on WSD are presented in Section 2. Three methods are described in Section 3, and experimental results are shown and analyzed in Section 4. Conclusions and future work are listed in the last section.

               II.       RELATED WORKS

    Automatically resolving word sense ambiguity can enhance language understanding, which is useful in several fields, such as information retrieval, document categorization, grammar analysis, speech processing and text preprocessing. In past decades, ambiguity issues have often been considered AI-complete, that is, problems which can be solved only by first resolving all the difficult problems in artificial intelligence (AI), such as the representation of common sense and encyclopedic knowledge. Sense disambiguation is required for the correct phonetization of words in speech synthesis [13], and also for word segmentation and homophone discrimination in speech recognition. It is essential for language understanding applications such as message understanding, man-machine communication, etc. WSD can be applied in many fields of natural language processing [10], such as machine translation, information retrieval (IR), speech processing and text processing.
    The approaches to WSD are categorized as follows:
  A. Machine-Readable Dictionaries (MRD):
     Relying on the word information in a dictionary for sense disambiguation, such as WordNet or the Academia Sinica Chinese Electronic Dictionary (ASCED) [17].
  B. Computational Lexicons:
     Employing the lexical information in a thesaurus, such as the well-known WordNet [11, 14], which contains lexical clues of characters and the lattice among related characters.
  C. Corpus-based Methods:
     Depending on statistical results from a corpus, such as a term's occurrences, part-of-speech (POS) and the location of characters and words [12, 15].
  D. Neural Networks:
     Based on the concept codes of a thesaurus or the features of lexical words [16, 17].
    Many works address WSD and several methods have been proposed so far. Because of a unique feature of the Chinese language, Chinese word segmentation, more than two different features will be employed to achieve higher prediction accuracy for WSD. Therefore, several methods are combined in the following.

© 2011 ACEEE
DOI: 01.IJSIP.02.01.211
           III. DESCRIPTION OF PROPOSED METHODS

    In this paper, several methods are proposed to disambiguate the sense category of Chinese polyphones: dictionary matching, N-gram language models and a voting scheme. In the following, each is explained in detail.

 A. Dictionary Matching
     In order to correctly predict the sense category of a polyphone, dictionary matching is exploited first. Within a Chinese sentence, the location p of the polyphonic character wp is set as the centre, and we extract the left and right substrings based on the centre p. The two substrings are denoted as CHL and CHR, as shown in Fig. 1. Within a window size, all possible substrings of CHL and CHR are segmented and then matched against the lexicons in the dictionary.

        Fig. 1: A sentence w1, w2, ..., wn with target polyphonic character wp is divided into two substrings: CHL, ending at wp, and CHR, starting at wp.

      If words exist in both substrings, we can decide the pronunciation of the polyphone based on the priority of longest word and highest word frequency: length of word first, then frequency of word second. In this paper, the window size is 6 Chinese characters; that is, LEN(CHL) = LEN(CHR) = 6.
      The Chinese dictionary is available and contains nearly 130K Chinese words. Each Chinese word is composed of 2 to 12 Chinese characters. Every word in the dictionary carries its frequency, part-of-speech (POS) and transliteration¹, from which the correct pronunciation of a polyphonic character in the word can be decided.
      The algorithm of dictionary matching is described as follows:

step 1. Read in the sentence and find the location p of the polyphone target wp.
step 2. Based on the location of wp, all possible substrings of CHL and CHR within the window (size=6) are segmented and extracted, then compared with the lexicons in the Chinese dictionary.
step 3. If any Chinese word can be found in both substrings,
           go to step 4,
        else
           go to step 5.
step 4. Decide the sense category of pronunciation for the polyphone based on the priority scheme of longest word and highest word frequency. Then the process ends.
step 5. The pronunciation of the polyphone wp will be predicted by the methods in the following phase.

¹ Zhuyin Fuhau (注音符號) can be found in the dictionary.

 B. Language Models (LMs)
     In recent years, statistical language models have been adopted in NLP. Suppose that W = w1, w2, w3, ..., wn, where wi and n denote the ith Chinese character and the number of characters in the sentence (0 < i ≤ n). Using the chain rule:

 P(W) = P(w1, w2, ..., wn)
      = P(w1) P(w2|w1) P(w3|w1^2) ... P(wn|w1^(n-1))
      = Π_{k=1..n} P(wk | w1^(k-1))                    (1)

 where w1^(k-1) denotes the string w1, w2, w3, ..., w(k-1).

      In Eq. (1), the probability P(wk | w1^(k-1)) is calculated, starting at w1, by using the substring w1, w2, ..., w(k-1) to predict the occurrence probability of wk. For longer strings, a large amount of corpus is necessary to train a language model with good performance, which costs much labor and time.
      In general, unigram, bigram and trigram models (N ≤ 3) are generated [5][6]. An N-gram model calculates the probability P(·) of the Nth event from the preceding N-1 events, rather than the whole string w1, w2, ..., w(N-1).
    In short, an N-gram is the so-called (N-1)th-order Markov model, which calculates the conditional probability of successive events: the probability of the Nth event given that the preceding N-1 events occur. Basically, the N-gram language model is expressed as follows:

 P(W) ≈ Π_{k=1..n} P(wk | w_(k-N+1)^(k-1))             (2)

      N=1: unigram, or zero-order Markov model.
      N=2: bigram, or first-order Markov model.
      N=3: trigram, or second-order Markov model.

     In Eq. (2), the relative frequency is used for calculating P(·):

 P(wk | w_(k-N+1)^(k-1)) = C(w_(k-N+1)^k) / C(w_(k-N+1)^(k-1))        (3)

 where C(w) denotes the count of event w occurring in the training corpus.

     The probability P(·) obtained from Eq. (3) is called the Maximum Likelihood Estimate (MLE). While predicting the pronunciation category of a polyphone, we can predict based on the probability of each category t (1 ≤ t ≤ T), where T denotes the number of categories of the polyphone. The category with the maximum probability Pmax(W) with respect to the sentence W is the target, and the correct pronunciation of the polyphone can then be decided.

        C. Voting Scheme
             In contrast to the N-gram models above, we propose a voting scheme similar in concept to an election in human society. Basically, each voter votes for one candidate, and the candidate with the maximum votes is the winner. In the real world more than one candidate may win an election, while in the disambiguation process only one category of the polyphone becomes the final target with respect to the pronunciation.
             The voting scheme can be described as follows: each token in the sentence plays a voter that votes for its favorite candidate based on the probability calculated from the lexical features of the token. The total score S(W) accumulated from all voters for each category is obtained, and the category with the highest score is the final winner. In this paper, there are two voting methods:

        1)    Winner-Take-All:
             In this voting method, the probability is calculated as follows:

         P(wi, t) = C(wi, t) / C(wi)                            (4)

        where C(wi) denotes the occurrences of wi in the training corpus, and C(wi, t) denotes the occurrences of wi for sense category t in the training corpus.
             In Eq. (4), P(wi, t) is regarded as the probability of wi on category t. In winner-take-all scoring, the category with the maximum probability wins the ticket: it wins one ticket (1 score) while all other categories are assigned no ticket (0 score). Therefore, each voter has just one ticket for voting. The winner-take-all score for token wi is defined as follows:

         P(wi) = 1, if P(wi, t) is the maximum among all categories;
         P(wi) = 0, for all other categories.                   (5)

             According to Eq. (5), the total score for each category is accumulated over all tokens in the sentence:

         S(W) = P(w1) + P(w2) + P(w3) + ... + P(wn)
              = Σ_{i=1..n} P(wi)                                (6)

        2) Preference Scoring:
             The other voting method is called preference. For a token in the sentence, the summation of the probabilities over all categories of a polyphonic character equals 1. Let us show a Chinese example (E1) for the two voting methods; sentence (E1') is the translation of example (E1). As presented in Table 1, the polyphonic character 卷 has three different pronunciations: 1. ㄐㄩㄢˋ, 2. ㄐㄩㄢˇ and 3. ㄑㄩㄢˊ. Suppose the occurrences of the token 白卷 (blank examination paper) in these phonetic categories are 26, 11 and 3, for a total of 40. The score for each category under the two scoring methods can then be calculated.

        教育社會方面都繳了白卷                                  (E1)
        The government handed over a blank examination paper in education and society.                                  (E1')

               Table 1: Example of the two scoring schemes of voting.

                category         count      preference     w-t-all
               1 ㄐㄩㄢˋ          26        26/40 = 0.65   40/40 = 1
               2 ㄐㄩㄢˇ          11        11/40 = 0.275  0/40 = 0
               3 ㄑㄩㄢˊ          3         3/40 = 0.075   0/40 = 0
               Total              40        1 score        1 score

            ps. w-t-all denotes winner-take-all scoring.

       D.    Unknown Events: the Zero-Count Issue
            In certain cases, the count C(·) of a novel (unknown) word, which does not occur in the training corpus, may be zero because of limited training data and the infinite nature of language. It is always hard to collect sufficient data. The potential issue of MLE is that the probability of unseen events is exactly zero. This is the so-called zero-count problem, and it degrades the performance of the system.
            It is obvious that a zero count leads to a zero probability P(·) in Eqs. (2), (3) and (4). There are many smoothing works in [7, 8, 9]. This paper adopts additive discounting for calculating the smoothed probability P*, which we reconstruct as follows:

         P*(w) = (C(w) + δ) / (N + δV)                          (7)

       where δ denotes a small value (δ ≤ 0.5) added to all known and unknown events, N is the total count of events in the training corpus and V is the number of distinct events. The smoothing method alleviates the zero-count issue in the language model.

       E. Classifier: Predicting the Categories
            Suppose a polyphone has T categories, 1 ≤ t ≤ T; how can we predict the correct target t̂? As shown in Eq. (8), the category with the maximum probability or score is the most possible target:

         t̂ = argmax_t Pt(W), or
         t̂ = argmax_t St(W)                                    (8)

       where Pt(W) is the probability of W in category t, which can be obtained from Eq. (1) for LMs, and St(W) is the total score based on the voting scheme from Eq. (6).

                         IV. EXPERIMENT RESULTS

        In this paper, 10 Chinese polyphones are selected randomly from the more than 1300 polyphones in Chinese. All the possible pronunciations of the selected polyphones are listed in Table 2; one polyphone, 著, has 5 categories, and 3 polyphones have 2 categories.

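Before turning to the experiments, the two scoring schemes of Table 1 and the classifier of Eq. (8) can be sketched as follows. This is a minimal sketch: the function names are ours, and the counts are the 白卷 figures from Table 1.

```python
def preference_scores(counts):
    # Preference (Eqs. 4 and 6): one ballot split across categories
    # in proportion to C(w, t) / C(w).
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def winner_take_all_scores(counts):
    # Winner-take-all (Eq. 5): the maximal category takes the whole ballot.
    best = max(counts, key=counts.get)
    return {t: 1.0 if t == best else 0.0 for t in counts}

def predict(score_table):
    # Classifier of Eq. (8): the category with the highest total score wins.
    return max(score_table, key=score_table.get)

# Token 白卷 from Table 1: category counts 26, 11 and 3 (total 40).
counts = {"ㄐㄩㄢˋ": 26, "ㄐㄩㄢˇ": 11, "ㄑㄩㄢˊ": 3}
```

For these counts, preference scoring yields 0.65, 0.275 and 0.075, and winner-take-all yields 1, 0 and 0, matching Table 1; per Eq. (6), the sentence-level score S(W) is simply the sum of such per-token scores before the argmax of Eq. (8).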



A. Dictionary and Corpus
      The Academia Sinica Chinese Electronic Dictionary (ASCED) contains more than 130K Chinese words, composed of 2 to 11 characters. Each word in ASCED carries part-of-speech (POS), frequency and the pronunciation of each character.
      The experimental data are collected from the ASBC corpus (Academia Sinica Balanced Corpus) and web news of the China Times. Sentences containing one of the 10 polyphones were collected randomly. There are in total 9070 sentences, which are divided into two parts: 8030 (88.5%) and 1040 (11.5%) sentences for training and outside testing, respectively.

 B. Experiment Results
     Three LM models are generated: unigram, bigram and trigram. The precision rate (PR) is defined as:

 PR = (No. of correctly predicted sentences) / (No. of testing sentences)        (9)

Method 1: Dictionary Matching
     There are 69 sentences processed by the word-matching phase, and 7 sentences are wrongly predicted. The average PR achieves 89.86%.
     In the following, several examples illustrate the matching phase of dictionary matching:

我們回頭看看中國人的歷史。                              (E2)
We look back at the history of the Chinese.             (E2')

   Based on the matching algorithm, the two substrings CHL and CHR of the polyphone target (wp = 中) for sentence (E2) are:

   CHL = "們回頭看看中",
   CHR = "中國人的歷史".

   Upon word segmentation, the matched Chinese words and pronunciations are as follows:

 Chinese words in CHL          Chinese words in CHR
 看中      83    ㄓㄨㄥˋ       中國      3542    ㄓㄨㄥ
                               中國人    487     ㄓㄨㄥ

       According to the priority of word length first, 中國人 (Chinese people) decides the pronunciation of 中 as ㄓㄨㄥ.

看中文再用廣東話來發音。                                (E3)
Read the Chinese and then pronounce it in Cantonese.    (E3')

 Chinese words in CHL          Chinese words in CHR
 看中      83    ㄓㄨㄥˋ       中文      343     ㄓㄨㄥ

峰迴路轉再看中國方面.                                   (E4)
The path winds along mountain ridges, then watch the reflection of China.                                               (E4')

 Chinese words in CHL          Chinese words in CHR
 看中      83    ㄓㄨㄥˋ       中國      3542    ㄓㄨㄥ

中央研究院未來的展望。                                  (E5)
The future forecast of Academia Sinica.                 (E5')

 Chinese words in CHL          Chinese words in CHR
 NULL                          中央        2979  ㄓㄨㄥ
                               中央研究院  50    ㄓㄨㄥ

 In example (E5), only CHR contains segmented words; there are no words at all in CHL.

Method 2: Language Models (LMs)
     The experimental results of the three models, unigram, bigram and trigram, are listed in Table 3. The bigram LM achieves 92.58%, the highest rate among the three models.

Method 3: Voting Scheme
1) Winner-take-all: Three models, unitoken, bitoken and tritoken, are generated. As shown in Table 4, bitoken achieves the highest PR of 90.17%.
2) Preference: Three models, unitoken, bitoken and tritoken, are generated. As shown in Table 5, bitoken preference scoring achieves the highest PR of 92.72% on average.

C. Word 2007 Precision Rate
    MS Office is a famous and well-known editing package around the world. In our experiments, MS Word 2007 is used to process the transcription of the same testing sentences. Its PR achieves 89.8% on average, as shown in Table 6.

D. Results Analysis
     In this paper, the voting scheme with preference and winner-take-all scoring, together with statistical language models, has been proposed and employed to resolve the issue of polyphone ambiguity. We compare these methods with MS Word 2007. The preference bitoken scheme achieves the highest PR among these models, 92.72%. It is apparent that all our proposed methods are superior to MS Word 2007.
     In the following, two examples are shown for correct and wrong predictions by Word 2007:

ㄐㄧㄠˋ ㄩˋ ㄕㄜˋ ㄏㄨㄟˋ ㄈㄤ ㄇㄧㄢˋ ㄉㄡ ㄐㄧㄠˇ ˙ㄌㄜ ㄅㄞˊ ㄐㄩㄢˋ
 教    育    社    會     方     面     都     繳     了    白     卷
The government handed over a blank examination paper in education and society. (correct prediction)

ㄅㄤˋ ㄖㄨㄛˋ ㄨˊ ㄖㄣˊ ㄅㄢ ㄗˋ ㄧㄢˊ ㄗˋ ㄩˇ
傍    若 無 人 般 自 言 自 語
Talking to oneself as if nobody is around. (wrong prediction)

    We have constructed an intelligent e-learning system [18] based on the unified approach proposed in this paper. The system provides Chinese synthesized speech and displays several useful pieces of lexical information, such as transliteration, Zhuyin and two pinyins, for learning Chinese. All the functions addressed in this paper, such as Chinese polyphone prediction, transliteration and transcription, are integrated in the e-learning website to provide online searching and translation through the Internet.
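The window-based matching of Method 1 (steps 1-5 with the longest-word-then-frequency priority) can be sketched as follows. This is a minimal sketch: `match_polyphone` is our own name, and the toy lexicon holds only the (E2) entries with the frequencies shown in the table above.

```python
def match_polyphone(sentence, p, lexicon, window=6):
    """Steps 1-4 of the dictionary-matching algorithm: extract CHL/CHR
    around the polyphone at index p, collect every dictionary word that
    covers the polyphone, then prefer the longest word and, on ties,
    the most frequent one.  `lexicon` maps word -> (frequency, sound)."""
    left = sentence[max(0, p - window + 1):p + 1]    # CHL ends at wp
    right = sentence[p:p + window]                   # CHR starts at wp
    candidates = []
    for chunk, polyphone_last in ((left, True), (right, False)):
        n = len(chunk)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # keep only substrings that contain the polyphone itself
                covers = (j == n) if polyphone_last else (i == 0)
                word = chunk[i:j]
                if covers and word in lexicon:
                    freq, sound = lexicon[word]
                    candidates.append((len(word), freq, sound))
    if not candidates:
        return None    # step 5: fall back to the LM / voting phase
    return max(candidates)[2]    # longest word first, then frequency

# Toy ASCED-like entries for example (E2): word -> (frequency, Zhuyin).
lexicon = {"看中": (83, "ㄓㄨㄥˋ"), "中國": (3542, "ㄓㄨㄥ"), "中國人": (487, "ㄓㄨㄥ")}
```

For (E2), 我們回頭看看中國人的歷史 with p = 6, the longest match 中國人 wins over 看中 and 中國, so the sketch returns ㄓㄨㄥ, matching the table above; the 89.86% of Method 1 is then just Eq. (9) applied to 62 correct sentences out of the 69 matched.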



If the predicted category is wrong, the user may feed back the right category of the polyphone online to gradually adapt the system's prediction for Chinese polyphones.

                         V. CONCLUSION

      In this paper, we used several methods to address the ambiguity of Chinese polyphones. First, three methods are employed to predict the category of a polyphone: dictionary matching, language models and a voting scheme; the last method has two different scoring schemes, winner-take-all and preference scoring. Furthermore, we propose an effective unified approach, which unifies the several methods and adopts the better alternative triggered by a threshold, to improve the prediction.
      Our approach outperforms MS Word 2007 and statistical language models, and the best result of the final outside testing achieves 92.72%. The proposed approach can be applied to related issues in other languages.
      Based on the proposed unified approach, we have constructed an e-learning system in which several related functions of Chinese text transliteration are integrated to provide online searching and translation through the Internet. In the future, several related issues should be studied further:
      1.     Collecting more corpora and extending the proposed methods to other Chinese polyphones.
      2.     Using more lexical features, such as location and semantic information, to enhance the precision rate of prediction.
      3.     Improving the smoothing techniques for unknown words.
      4.     Bilingual translation for English and Chinese.

                       ACKNOWLEDGEMENT

  The paper is supported partially under a project of the NSC, Taiwan.

                        REFERENCES

[1] Yan Wu, Xiukun Li and Caesar Lun, 2006, A Structural Based
[5] Jurafsky D. and Martin J. H., 2000, Speech and Language Processing, Prentice Hall.
[6] Jui-Feng Yeh, Chung-Hsien Wu and Mao-Zhu Yang, 2006, Stochastic Discourse Modeling in Spoken Dialogue Systems Using Semantic Dependency Graphs, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 937-944.
[7] Stanley F. Chen and Ronald Rosenfeld, Jan. 2000, A Survey of Smoothing Techniques for ME Models, IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, pp. 37-50.
[8] Church K. W. and Gale W. A., 1991, A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams, Computer Speech and Language, Vol. 5, pp. 19-54.
[9] Stanley F. Chen and Joshua Goodman, 1999, An Empirical Study of Smoothing Techniques for Language Modeling, Computer Speech and Language, Vol. 13, pp. 359-394.
[10] Nancy Ide and Jean Véronis, 1998, Word Sense Disambiguation: The State of the Art, Computational Linguistics, Vol. 24, No. 1, pp. 1-41.
[11] Miller, George A.; Beckwith, Richard T.; Fellbaum, Christiane D.; Gross, Derek; and Miller, Katherine J., 1990, WordNet: An On-line Lexical Database, International Journal of Lexicography, 3(4), 235-244.
[12] Church, Kenneth W. and Mercer, Robert L., 1993, Introduction to the Special Issue on Computational Linguistics Using Large Corpora, Computational Linguistics, 19(1), 1-24.
[13] Yarowsky D., 1997, Homograph Disambiguation in Speech Synthesis, in J. van Santen, R. Sproat, J. Olive and J. Hirschberg (eds.), Progress in Speech Synthesis, Springer-Verlag, pp. 159-175.
[14] Feng-Long Huang, Shu-Yu Ke and Qiong-Wen Fan, 2008, Predicting Effectively the Pronunciation of Chinese Polyphones by Extracting the Lexical Information, Advances in Computer and Information Sciences and Engineering, Springer Science, pp. 159-165.
[15] Francisco Joao Pinto, Antonio Farina Martinez, Carme Fernandez Perez-Sanjulian, 2008, Joining Automatic Query Expansion Based on Thesaurus and Word Sense Disambiguation Using WordNet, International Journal of Computer Applications in Technology, Vol. 33, No. 4, pp. 271-279.
[16] You-Jin Chung et al., 2002, Word sense disambiguation in a
   Approach to Cantonese-English Machine Translation,                           Korean-to-Japanese MT system using neural networks,
   Computational Linguistics and Chinese Language Processing,                   International Conference On Computational Linguistics
   Vol. 11, No. 2, June 2006, pp. 137-158.                                      archive COLING-02 on Machine translation in Asia – Vol. 16,
[2] Brian D. Davison, Marc Najork, Tim Converse, 2006, SIGIR                    pp.1-7.
   Workshop Report, Vol. 40 No. 2.                                          [17] Jean Veronis and Nancy M. Ide, 1990, Word sense
[3] Oliveira, F.; Wong, F.; Li, Y.-P., 2005, Machine Learning and                disambiguation with very large neural networks extracted
    Cybernetics, Proceedings of 2005 International Conference on                 from machine readable dictionaries, International Conference
    Volume 6, Issue , 18-21 Aug. 2005 Vol. 6, An unsupervised &                  On Computational Linguistics Proceedings of the 13th
    statistical word sense tagging using bilingual sources, Page(s):
                                                                                 conference on Computational linguistics – Vol. 2, pp. 389
    3749 - 3754
[4] Agirre E., Edmonds P., 2006, Word Sense Disambiguation                       -394.
    Algorithms and Applications, Springer.                                  [18] http://203.64.183.226/public2/word1.php
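As a rough illustration of the threshold-triggered unified approach described above, the sketch below assumes each component method (e.g. dictionary matching, a language model, voting) is a callable returning a pronunciation together with a confidence score; these interfaces and the 0.9 threshold are illustrative assumptions, not the paper's exact procedure.

```python
def unified_predict(context, methods, threshold=0.9):
    """Try each prediction method in priority order.

    `methods` is a list of callables returning (pronunciation, confidence).
    The first prediction whose confidence reaches the threshold wins;
    otherwise we fall back to the most confident answer seen overall.
    """
    best = None
    for predict in methods:
        pron, conf = predict(context)
        if conf >= threshold:
            return pron                    # confident enough: stop here
        if best is None or conf > best[1]:
            best = (pron, conf)            # remember the best fallback
    return best[0] if best else None
```

A caller would invoke it as, say, `unified_predict(sentence, [dict_match, bigram_lm, voting])`, where the three method names are hypothetical stand-ins for the techniques compared in the paper.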




                                                                       24
© 2011 ACEEE
DOI: 01.IJSIP.02.01.211
Table 2: Ten Chinese polyphonic characters, their pronunciation categories and meanings.

    target   Zhuyin Fuhau   Chinese word   hanyu pinyin        English
    中       ㄓㄨㄥ          中心           zhong xin           center
             ㄓㄨㄥˋ         中毒           zhong du            poison
    乘       ㄔㄥˊ           乘法           cheng fa            multiplication
             ㄕㄥˋ           大乘           da sheng
    乾       ㄍㄢ            乾淨           gan jing            clean
             ㄑㄧㄢˊ         乾坤           qian kun            the universe
    了       ㄌㄜ˙           為了           wei le              in order to
             ㄌㄧㄠˇ         了解           liao jie            understand
    傍       ㄆㄤˊ           傍邊           pang bian           beside
             ㄅㄤ            傍晚           bang wan            nightfall
             ㄅㄤˋ           依山傍水       yi shan bang shui   near the mountain and by the river
    作       ㄗㄨㄛˋ         工作           gong zuo            work
             ㄗㄨㄛ          作揖           zuo yi
             ㄗㄨㄛˊ         作興           zuo xing
    著       ㄓㄜ˙           忙著           mang zhe            busy
             ㄓㄠ            著急           zhao ji             anxious
             ㄓㄠˊ           著想           zhao xiang          to bear in mind the interest of
             ㄓㄨˋ           著名           zhu ming            famous
             ㄓㄨㄛˊ         執著           zhi zhuo            inflexible
    卷       ㄐㄩㄢˋ         考卷           kao juan            a test paper
             ㄐㄩㄢˇ         卷髮           juan fa             curly hair
             ㄑㄩㄢˊ         卷曲           quan qu             curl
    咽       ㄧㄢ            咽喉           yan hou             the throat
             ㄧㄢˋ           吞咽           tun yan             swallow
             ㄧㄝˋ           哽咽           geng ye             to choke
    從       ㄘㄨㄥˊ         從事           cong shi            to devote oneself
             ㄗㄨㄥˋ         僕從           pu zong             servant
             ㄘㄨㄥ          從容           cong rong           calm; unhurried
             ㄗㄨㄥ          從橫           zong heng           in length and breadth

Table 3: PR of outside testing on language models.

    token      中      乘      乾      了      傍      作      著      卷      咽      從      avg.
    unigram    95.88   86.84   92.31   70.21   85.71   96.23   75.32   100     98      91.67   89.98
    bigram     96.75   84.21   96.15   85.11   92.86   94.34   81.17   96.30   100     93.52   92.58*
    trigram    80.04   57.89   61.54   58.51   78.57   52.83   60.39   62.96   88      71.30   70.50

    ps: * denotes the best PR among the three n-gram models.
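For reference, a bigram model such as the one scored in Table 3 ranks candidate pronunciations by the probability of their surrounding word sequence. A minimal sketch follows; the toy counting and add-k smoothing are illustrative assumptions and stand in for the paper's actual smoothing techniques.

```python
from collections import defaultdict

def train_bigram(corpus):
    """Collect unigram and bigram counts from tokenized sentences."""
    uni, bi = defaultdict(int), defaultdict(int)
    for sent in corpus:
        for w1, w2 in zip(sent, sent[1:]):
            uni[w1] += 1
            bi[(w1, w2)] += 1
    return uni, bi

def bigram_prob(uni, bi, w1, w2, vocab_size, k=1):
    """Add-k smoothed estimate of P(w2 | w1)."""
    return (bi[(w1, w2)] + k) / (uni[w1] + k * vocab_size)
```

Smoothing matters here because unseen context words would otherwise receive zero probability, which is exactly the unknown-word problem listed among the future-work items.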
Table 4: PR of outside testing on winner-take-all scoring.

    token      中      乘      乾      了      傍      作      著      卷      咽      從      avg.
    unitoken   96.96   84.21   80.77   57.45   71.43   94.34   58.44   85.19   84      87.04   84.69
    bitoken    96.75   86.84   96.15   79.79   85.71   92.45   68.83   100     98      93.52   90.17*
    tritoken   79.83   60.53   61.54   60.64   78.57   52.83   59.74   66.67   88      71.3    70.69

    ps: * denotes the best PR among the three n-gram models.




Table 5: PR of outside testing on preference scoring.

    token      中      乘      乾      了      傍      作      著      卷      咽      從      avg.
    unitoken   96.96   84.21   80.77   70.21   71.43   94.34   70.13   85.19   88      87.96   87.76
    bitoken    96.75   86.84   96.15   87.23   85.71   93.40   81.17   100     98      93.52   92.72*
    tritoken   80.04   60.53   61.54   60.64   78.57   52.83   59.74   66.67   88      71.30   70.78

    ps: * denotes the best PR among the three n-gram models.
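The two voting schemes compared in Tables 4 and 5 can be contrasted roughly as follows: winner-take-all lets each model vote only for its single best candidate, while preference scoring lets each model grade every candidate it ranks. The rank-based point weights below are an illustrative assumption, not the paper's exact scoring formula.

```python
def winner_take_all(rankings):
    """Each model contributes one vote, for its single best candidate."""
    tally = {}
    for ranking in rankings:            # ranking: candidates, best first
        tally[ranking[0]] = tally.get(ranking[0], 0) + 1
    return max(tally, key=tally.get)

def preference(rankings):
    """Each model grades all candidates: n points for 1st, n-1 for 2nd, ..."""
    tally = {}
    for ranking in rankings:
        n = len(ranking)
        for rank, cand in enumerate(ranking):
            tally[cand] = tally.get(cand, 0) + (n - rank)
    return max(tally, key=tally.get)
```

Preference scoring retains information from lower-ranked candidates, which is consistent with its higher average PR (92.72 vs. 90.17) in the tables above.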




Table 6: PR of Word 2007 on the same testing sentences.

    token       中      乘      乾      了      傍      作      著      卷      咽      從      avg.
    word 2007   93.37   76.47   76.67   83.65   78.57   93.70   78.33   82.76   100     91.51   89.80





Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

Plus de IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Dernier

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Dernier (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Effective Approach for Disambiguating Chinese Polyphonic Ambiguity

  • 1. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011

    Effective Approach for Disambiguating Chinese Polyphonic Ambiguity

    Feng-Long Huang
    Department of Computer Science and Information Engineering
    National United University
    No. 1, Lienda, Miaoli, Taiwan, 36003
    flhuang@nuu.edu.tw

    Abstract: One of the difficult tasks in Natural Language Processing (NLP) is resolving the sense ambiguity of characters or words in text, caused by polyphones, homonyms, and homographs. This paper addresses the ambiguity of Chinese polyphonic characters and approaches for disambiguating it. Three methods, dictionary matching, language models and a voting scheme, are used to disambiguate the prediction of polyphones. Compared with the well-known MS Word 2007 and language models (LMs) alone, our approach is superior on this task; the final precision rate reaches 92.75%. Based on the proposed approaches, we have constructed an e-learning system in which several related functions of Chinese transliteration are integrated.

    Keywords: Natural Language Processing, Sense Disambiguation, Language Model, Voting Scheme

    I. INTRODUCTION

    In recent years, natural language processing (NLP) has been studied and discussed in many fields, such as machine translation, speech processing, lexical analysis, information retrieval, spelling prediction, handwriting recognition, and so on [1][2]. Among the computational models, syntax parsing, word segmentation and the generation of statistical language models have been the focal tasks.

    In general, whatever the natural language, there is always a phenomenon of ambiguity among the characters or words of a text: polyphones, homonyms, homographs, and combinations of them. Resolving it is necessary for most natural language processing applications, and one of the difficult tasks in NLP is to resolve a word's sense ambiguity, the problem known as word sense disambiguation (WSD) [3, 4].

    Disambiguating sense ambiguity can alleviate many problems in NLP. This paper applies dictionary matching, statistical N-gram language models (LMs) and a voting scheme with two scoring methods, preference and winner-take-all, to retrieve Chinese lexical knowledge and perform WSD on Chinese polyphonic characters. There are nearly 5,700 frequent unique Chinese characters, and more than 1,300 of them have two or more different pronunciations; these are called polyphonic characters. Predicting the correct category of a polyphone can therefore be regarded as an instance of WSD.

    The paper is organized as follows: related works on WSD are presented in Section 2. The three methods are described in Section 3, and experimental results are shown and analyzed in Section 4. Conclusions and future work are listed in the last section.

    II. RELATED WORKS

    Automatically resolving word sense ambiguity enhances language understanding, which is useful in several fields, such as information retrieval, document categorization, grammar analysis, speech processing and text preprocessing. In past decades, ambiguity issues have been considered AI-complete, that is, problems which can be solved only by first resolving all the difficult problems in artificial intelligence (AI), such as the representation of common sense and encyclopedic knowledge. Sense disambiguation is required for the correct phonetization of words in speech synthesis [13], and also for word segmentation and homophone discrimination in speech recognition. It is essential for language understanding applications such as message understanding, man-machine communication, etc. WSD can be applied in many fields of natural language processing [10], such as machine translation, information retrieval (IR), speech processing and text processing.

    The approaches to WSD are categorized as follows:

    A. Machine-Readable Dictionaries (MRD): relying on the word information in a dictionary, such as WordNet or the Academia Sinica Chinese Electronic Dictionary (ASCED) [17].

    B. Computational Lexicons: employing the lexical information in a thesaurus, such as the well-known WordNet [11, 14], which contains lexical clues for characters and the lattice among related characters.

    C. Corpus-based Methods: depending on statistical results from a corpus, such as term occurrences, part-of-speech (POS) tags and the locations of characters and words [12, 15].

    D. Neural Networks: based on the concept codes of a thesaurus or the features of lexical words [16, 17].

    Many works have addressed WSD and several methods have been proposed so far. Because of a unique feature of the Chinese language, word segmentation, more than two different features are employed here to achieve higher prediction accuracy on WSD; accordingly, two of the methods below are further combined.

    20 © 2011 ACEEE DOI: 01.IJSIP.02.01.211
  • 2. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011

    III. DESCRIPTION OF PROPOSED METHODS

    In this paper, several methods are proposed to disambiguate the sense category of Chinese polyphones: dictionary matching, N-gram language models and a voting scheme. Each is explained in detail below.

    A. Dictionary Matching

    To predict the sense category of a polyphone correctly, dictionary matching is exploited first. Within a Chinese sentence, the location p of the polyphonic character wp is set as the centre, and the left and right substrings around p are extracted. The two substrings are denoted CHL and CHR, as shown in Fig. 1. Within a window, all possible substrings of CHL and CHR are segmented and then matched against the lexicons of a dictionary.

    Fig. 1: A sentence with target polyphonic character wp, divided into two substrings: CHL = w1,w2,...,wp and CHR = wp,wp+1,...,wn.

    If matching words exist in both substrings, the pronunciation of the polyphone is decided by the priority of longest word and highest word frequency: word length first, then word frequency. In this paper the window size is 6 Chinese characters, i.e. LEN(CHL) = LEN(CHR) = 6.

    The Chinese dictionary used here contains nearly 130K Chinese words, each composed of 2 to 12 Chinese characters. Every word in the dictionary carries its frequency, part-of-speech (POS) and transliteration (Zhuyin Fuhau, 注音符號), from which the correct pronunciation of a polyphonic character inside the word can be decided.

    The algorithm of dictionary matching is as follows:

    step 1. Read in the sentence and find the location p of the polyphone target wp.
    step 2. Based on p, segment and extract all possible substrings of CHL and CHR within the window (size = 6), and compare them with the lexicons in the Chinese dictionary.
    step 3. If any Chinese word can be found in both substrings, go to step 4; else go to step 5.
    step 4. Decide the sense category of the pronunciation of the polyphone based on the priority scheme of longest word and highest word frequency. The process then ends.
    step 5. Otherwise, the pronunciation of the polyphone wp is predicted by the methods in the following phases.
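The dictionary-matching steps above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy dictionary, its frequencies and the romanized pronunciation labels are invented, and the two substrings CHL/CHR are approximated by enumerating every window substring that covers the polyphone.

```python
# Sketch of dictionary matching (Section III.A). TOY_DICT is a stand-in for
# a real lexicon: word -> (frequency, pronunciation of the polyphone).
# All entries and labels below are invented for illustration.
TOY_DICT = {
    "白卷": (40, "jyuan4"),
    "卷子": (25, "jyuan4"),
    "春卷": (12, "jyuan3"),
}

def match_polyphone(sentence, p, window=6):
    """Pick the pronunciation of the polyphone at index p from the longest,
    then most frequent, dictionary word covering it; None if nothing matches."""
    candidates = []
    for start in range(max(0, p - window + 1), p + 1):
        for end in range(p + 1, min(len(sentence), p + window) + 1):
            word = sentence[start:end]
            if len(word) >= 2 and word in TOY_DICT:
                freq, pron = TOY_DICT[word]
                # priority: longest word first, highest frequency second
                candidates.append((len(word), freq, pron))
    if not candidates:
        return None  # fall through to the LM / voting phase
    return max(candidates)[2]

# 卷 is at index 10 of the example sentence (E1) used later in the paper.
print(match_polyphone("教育社會方面都繳了白卷", 10))  # jyuan4
```

When no dictionary word covers the polyphone, the function returns None, mirroring step 5: the decision is deferred to the language-model and voting phases.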
    B. Language Models (LMs)

    In recent years, statistical language models have been widely adopted in NLP. Suppose that W = w1,w2,w3,...,wn, where wi and n denote the ith Chinese character and the number of characters in the sentence (0 < i <= n). Using the chain rule:

    P(W) = P(w1,w2,...,wn)
         = P(w1) P(w2|w1) P(w3|w1w2) ... P(wn|w1...wn-1)
         = ∏k P(wk | w1...wk-1)                                   (1)

    where w1...wk-1 denotes the string preceding wk. In Eq. (1) the probability is calculated starting at w1, using the substring w1,...,wk-1 to predict the occurrence probability of wk. For long histories, a large training corpus is required to obtain a language model with good performance, which costs extensive labor and time.

    In general, unigram, bigram and trigram models (N <= 3) [5][6] are generated. An N-gram model calculates the probability P(.) of the Nth event from only the preceding N-1 events, rather than from the whole string w1,w2,...,wN-1. In short, an N-gram is the so-called (N-1)th-order Markov model, which calculates the conditional probability of successive events: the probability of the Nth event given the preceding N-1 events. Basically, the N-gram language model is expressed as follows:

    P(W) ≈ ∏i P(wi | wi-N+1 ... wi-1)                             (2)

    N=1: unigram, or zero-order Markov model.
    N=2: bigram, or first-order Markov model.
    N=3: trigram, or second-order Markov model.

    In Eq. (2), the relative frequency is used for calculating P(.):

    P(wi | wi-N+1 ... wi-1) = C(wi-N+1 ... wi) / C(wi-N+1 ... wi-1)   (3)

    where C(w) denotes the count of event w in the training corpus.

    The probability obtained in Eq. (3) is called the Maximum Likelihood Estimate (MLE). When predicting the pronunciation category of a polyphone, we compute the probability of the sentence under each category t (1 <= t <= T), where T denotes the number of categories of the polyphone. The category with the maximum probability Pmax(W) with respect to the sentence W is taken as the target, which decides the correct pronunciation of the polyphone.
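Eqs. (2) and (3) in the bigram case (N = 2) can be sketched as below; the three-sentence training corpus is invented for illustration, and unigram counts stand in for the history counts of Eq. (3).

```python
from collections import Counter

# Bigram MLE, Eqs. (2)-(3) with N = 2: P(wi | wi-1) = C(wi-1 wi) / C(wi-1).
# The tiny training corpus here is invented for illustration.
corpus = ["白卷可恥", "交白卷", "白卷重考"]

unigrams = Counter(ch for s in corpus for ch in s)
bigrams = Counter(s[i:i + 2] for s in corpus for i in range(len(s) - 1))

def p_bigram(prev, cur):
    """Relative-frequency estimate of P(cur | prev); zero if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[prev + cur] / unigrams[prev]

def p_sentence(sentence):
    """Eq. (2): product of bigram probabilities over the sentence."""
    p = 1.0
    for i in range(1, len(sentence)):
        p *= p_bigram(sentence[i - 1], sentence[i])
    return p

print(p_bigram("白", "卷"))  # C(白卷)/C(白) = 3/3 = 1.0
```

Note that any bigram absent from the corpus receives probability zero here, which is exactly the zero-count problem that the smoothing of Section III.D addresses.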
    C. Voting Scheme

    In contrast to the N-gram models above, we propose a voting scheme whose concept is similar to an election in human society. Basically, voters vote for their favorite candidates, and the candidate with the maximum number of votes is the winner. In the real world more than one candidate may win an election, whereas in the disambiguation process exactly one category of the polyphone is selected as the final target with respect to the pronunciation.
  • 3. ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011

    The voting scheme can be described as follows: each token in the sentence plays the role of a voter and votes for its favorite candidate based on the probability calculated from the lexical features of the tokens. The total score S(W) accumulated from all voters for each category is obtained, and the category with the highest score is the final winner. In this paper, two voting methods are used:

    1) Winner-Take-All: in this voting method, the probability is calculated as follows:

    P(wi, t) = C(wi, t) / C(wi)                                    (4)

    where C(wi) denotes the occurrences of wi in the training corpus, and C(wi, t) denotes the occurrences of wi under sense category t in the training corpus.

    In Eq. (4), C(wi, t)/C(wi) is regarded as the probability of wi on category t. In winner-take-all scoring, the category with the maximum probability wins the ticket: it receives one ticket (score 1) while all the other categories receive none (score 0). Each voter therefore has exactly one ticket. The winner-take-all score for token wi is defined as follows:

    V(wi, t) = 1, if P(wi, t) is the maximum among all categories;
    V(wi, t) = 0, for all other categories.                        (5)

    According to Eq. (5), the total score of each category is accumulated over all tokens in the sentence:

    S(W) = P(w1) + P(w2) + P(w3) + ... + P(wn) = ∑i P(wi)          (6)

    2) Preference Scoring: the other voting method is called preference. For each token in the sentence, the probabilities assigned to all categories of the polyphonic character sum to 1.

    Let us show a Chinese example (E1) of the two voting methods; sentence (E1') is the translation of (E1).

    教育社會方面都繳了白卷 (E1)
    Government handed over a blank examination paper in education and society. (E1')

    As presented in Table 1, the polyphonic character 卷 has three different pronunciations: 1. ㄐㄩㄢˋ, 2. ㄐㄩㄢˇ and 3. ㄑㄩㄢˊ. Suppose that the occurrences of the token 白卷 (blank examination paper) in these phonetic categories are 26, 11 and 3, so the total occurrence is 40. The scores of each category under the two scoring methods are then calculated as follows.

    Table 1: Example of the two voting scoring schemes (w-t-all denotes winner-take-all scoring).

    category      count   preference       w-t-all
    1 ㄐㄩㄢˋ       26     26/40 = 0.65     40/40 = 1
    2 ㄐㄩㄢˇ       11     11/40 = 0.275     0/40 = 0
    3 ㄑㄩㄢˊ        3      3/40 = 0.075     0/40 = 0
    Total          40     1 score           1 score
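The Table 1 scores and the final category decision can be reproduced numerically; the category labels below are romanized stand-ins for the Zhuyin symbols, and only the counts 26/11/3 come from the example above.

```python
# Reproduce the Table 1 example: preference vs. winner-take-all scoring for
# the token 白卷. Labels are invented romanizations of the Zhuyin categories.
counts = {"jyuan4": 26, "jyuan3": 11, "cyuan2": 3}  # C(wi, t)
total = sum(counts.values())                         # C(wi) = 40

# Preference: every category keeps its share C(wi, t) / C(wi) (Eq. 4).
preference = {t: c / total for t, c in counts.items()}

# Winner-take-all: the top category takes the whole ticket (Eq. 5).
top = max(counts, key=counts.get)
winner_take_all = {t: 1.0 if t == top else 0.0 for t in counts}

# Final decision: the category with the highest accumulated score wins.
print(preference)       # {'jyuan4': 0.65, 'jyuan3': 0.275, 'cyuan2': 0.075}
print(winner_take_all)  # {'jyuan4': 1.0, 'jyuan3': 0.0, 'cyuan2': 0.0}
print(max(preference, key=preference.get))  # jyuan4
```

Both rules agree on the winner here; they differ in how much evidence weaker categories retain when scores from several tokens are summed over the sentence as in Eq. (6).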
The (7) winner-take-all scoring for tolen wi can be defined as where δ denotes a small value ( δ <=0.5); which will be follows: added into all the known and unknown events. The 1 if max . among all categories smoothing method will alleviate the zero count issue in (5) 0 all other categories language model. According to Eq(5), the total score for each categories E. Classifier-Predicting the Categories can be accumulated for all tokens in sentence: Supposed that polyphone has T categories, 1 , S(W) =P(w1)+P(w2)+P(w3)+……+P(wn) how can we predict the correct target ̂ ? As shown in Eq(8), the category with maximum probability or score will =∑ (6) be the most possible target: 2) Preference Scoring: ̂= Pt (W), or Another voting method is called as preference. For a token in sentence, the summation of the probability for all ̂= St (W), (8) the categories of a polyphone character will be equal to 1. where Pt (W) is the probability of W in category t, which Let us show an Chinese’s example (E1) for two voting can be obtained from Eq(1) for LMs and St (W) is the total methods. Note that sentence (E1’) is the translation for score based on the voting scheme from Eq(6). example (E1). As presented in Table 1, the polyphone IV. EXPERIMENT RESULTS character 卷 has three different pronunciations, 1. ㄐㄩㄢˋ, In the paper, 10 Chinese polyphones are selected 2. ㄐㄩㄢˇ and 3. ㄑㄩㄢˊ. Supposed that the occurrence randomly from more than 1300 polyphones in Chinese. All of token 白 卷 (blank examination) in these phonetic the promising pronunciations of these selected polyphones categories are 26, 11 and 3, total occurrence is 40. are list in Table 2; one polyphone “著” has 5 categories, 3 Therefore, the score for each category by two scoring polyphone have 2 categories. methods can be calculated. A. 
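Taken together, Eqs. (4)-(8) and the additive smoothing of Eq. (7) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and data layout are ours, and the counts for the token 白卷 follow Table 1.

```python
def smoothed_prob(count, total, vocab_size, delta=0.5):
    """Additive discounting (Eq. 7): add a small delta to every event so
    that unseen events never receive probability exactly zero."""
    return (count + delta) / (total + delta * vocab_size)

def token_probs(counts):
    """Eq. (4): P(w, t) = C(w, t) / C(w) for each sense category t."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def vote(tokens, scheme="preference"):
    """Accumulate the category scores S_t(W) over all voting tokens
    (Eqs. 5-6) and return the argmax category (Eq. 8)."""
    scores = {}
    for counts in tokens:
        probs = token_probs(counts)
        if scheme == "winner-take-all":
            # The best category gets the token's single ticket (score 1).
            best = max(probs, key=probs.get)
            probs = {t: (1.0 if t == best else 0.0) for t in probs}
        for t, p in probs.items():
            scores[t] = scores.get(t, 0.0) + p
    return max(scores, key=scores.get)

# Table 1: occurrences of 白卷 in the three categories of 卷.
tokens = [{"ㄐㄩㄢˋ": 26, "ㄐㄩㄢˇ": 11, "ㄑㄩㄢˊ": 3}]
print(vote(tokens))                     # ㄐㄩㄢˋ (0.65 vs 0.275 vs 0.075)
print(vote(tokens, "winner-take-all"))  # ㄐㄩㄢˋ (takes the one ticket)
```

With a single voting token the two schemes agree; they can differ once several tokens vote, since preference spreads each token's unit of score over all categories while winner-take-all concentrates it on one.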
IV. EXPERIMENT RESULTS
In the paper, 10 Chinese polyphones are selected randomly from the more than 1300 polyphones in Chinese. All the possible pronunciations of these selected polyphones are listed in Table 2; one polyphone, 著, has 5 categories, and 3 polyphones have 2 categories.

A. Dictionary and Corpus
The Academia Sinica Chinese Electronic Dictionary (ASCED) contains more than 130K Chinese words composed of 2 to 11 characters. Each word in ASCED is annotated with its part of speech (POS), frequency, and the pronunciation of each character. The experimental data are collected from the ASBC corpus (Academia Sinica Balanced Corpus) and web news of the China Times. Sentences containing one of the 10 polyphones are collected randomly. There are in total 9070 sentences, which are divided into two parts: 8030 (88.5%) sentences for training and 1040 (11.5%) for outside testing.

B. Experiment Results
Three LM models are generated: unigram, bigram and trigram. The precision rate (PR) is defined as:

    PR = (number of correctly predicted sentences / total number of testing sentences) × 100%.   (9)

Method 1: Dictionary Matching. There are 69 sentences processed by the word-matching phase, and 7 of them are wrongly predicted; the average PR achieves 89.86%. In the following, several examples illustrate the matching phase of dictionary matching:

    我們回頭看看中國人的歷史。 (E2)
    We look back on the history of the Chinese people. (E2')

Based on the matching algorithm, the two substrings CHL and CHR of the polyphone target (wp = 中) for sentence (E2) are:

    CHL = "們回頭看看中", CHR = "中國人的歷史".

Upon word segmentation, the Chinese words and their pronunciations are as follows:

    Chinese words in CHL        Chinese words in CHR
    看中    83    ㄓㄨㄥˋ       中國      3542   ㄓㄨㄥ
                                中國人    487    ㄓㄨㄥ

According to the priority of longest word first, 中國人 (Chinese people) decides the pronunciation of 中 as ㄓㄨㄥ.

    看中文再用廣東話來發音。 (E3)
    Read the Chinese and then pronounce it in Cantonese. (E3')

    Chinese words in CHL        Chinese words in CHR
    看中    83    ㄓㄨㄥˋ       中文      343    ㄓㄨㄥ

    峰迴路轉再看中國方面. (E4)
    The path winds along the mountain ridges; then look again at the China side. (E4')

    Chinese words in CHL        Chinese words in CHR
    看中    83    ㄓㄨㄥˋ       中國      3542   ㄓㄨㄥ

    中央研究院未來的展望。 (E5)
    The future prospects of Academia Sinica. (E5')

    Chinese words in CHL        Chinese words in CHR
    NULL                        中央        2979   ㄓㄨㄥ
                                中央研究院   50     ㄓㄨㄥ

In example (E5), only CHR contains segmented words; there is no word at all in CHL.

Method 2: Language Models (LMs). The experimental results of the three models (unigram, bigram and trigram) are listed in Table 3. The bigram LM achieves 92.58%, the highest rate among the three models.

Method 3: Voting Scheme.
1) Winner-take-all: three models (unitoken, bitoken and tritoken) are generated, as shown in Table 4. Bitoken achieves the highest PR, 90.17%.
2) Preference: three models (unitoken, bitoken and tritoken) are generated, as shown in Table 5. Bitoken preference scoring achieves the highest average PR, 92.72%.

C. Word 2007 Precision Rate
MS Office is a famous and widely used editing package around the world. In our experiments, MS Word 2007 is used to transcribe the same testing sentences. Its PR achieves 89.8% on average, as shown in Table 6.

D. Results Analysis
In the paper, the voting schemes of preference and winner-take-all scoring, together with statistical language models, have been proposed and employed to resolve the issue of polyphone ambiguity, and these methods are compared with MS Word 2007. The preference bitoken scheme achieves the highest PR among these models, 92.72%. It is apparent that all our proposed methods are superior to MS Word 2007.

In the following, one correct and one wrong prediction by Word 2007 are shown:

    ㄐㄧㄠˋ ㄩˋ ㄕㄜˋ ㄏㄨㄟˋ ㄈㄤ ㄇㄧㄢˋ ㄉㄡ ㄐㄧㄠˇ ㄌㄜ˙ ㄅㄞˊ ㄐㄩㄢˋ
    教 育 社 會 方 面 都 繳 了 白 卷
    Government handed over a blank examination paper in education and society. (correct prediction)

    ㄅㄤˋ ㄖㄨㄛˋ ㄨˊ ㄖㄣˊ ㄅㄢ ㄗˋ ㄧㄢˊ ㄗˋ ㄩˇ
    傍 若 無 人 般 自 言 自 語
    Talking to oneself as if nobody is around. (wrong prediction)

We have constructed an intelligent e-learning system [18] based on the unified approach proposed in the paper. The system provides Chinese synthesized speech and displays several useful pieces of lexical information, such as the transliteration, Zhuyin and two pinyins, for learning Chinese. All the functions addressed in the paper, such as Chinese polyphone prediction, transliteration and transcription, are integrated in the e-learning website to provide online searching and translation through the Internet.
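The longest-word-first, highest-frequency priority rule behind the dictionary-matching examples (E2)-(E5) above can be sketched as follows. This is our own illustration: the tuple layout (word, frequency, pronunciation) is assumed, not ASCED's actual record format, and the candidate lists mirror examples (E2) and (E3).

```python
def predict_pronunciation(candidates):
    """Pick the pronunciation of the polyphone from the dictionary words
    matched in CHL and CHR: longest word first, then highest frequency.

    candidates: list of (word, frequency, pronunciation) tuples, each a
    dictionary word containing the target polyphone."""
    if not candidates:
        return None  # no match; fall through to the LM / voting stage
    best = max(candidates, key=lambda c: (len(c[0]), c[1]))
    return best[2]

# (E2): 中國人 (length 3) outranks 看中 and 中國 (length 2), so 中 -> ㄓㄨㄥ.
e2 = [("看中", 83, "ㄓㄨㄥˋ"), ("中國", 3542, "ㄓㄨㄥ"), ("中國人", 487, "ㄓㄨㄥ")]
print(predict_pronunciation(e2))  # ㄓㄨㄥ

# (E3): equal lengths, so the higher-frequency 中文 (343) beats 看中 (83).
e3 = [("看中", 83, "ㄓㄨㄥˋ"), ("中文", 343, "ㄓㄨㄥ")]
print(predict_pronunciation(e3))  # ㄓㄨㄥ
```

Returning None on an empty candidate list corresponds to example (E5)'s situation generalized: when the dictionary phase yields nothing decisive, the sentence is handed to the language-model or voting stage.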
If the predicted category is wrong, the user may feed the right category of the polyphone back online to gradually adapt the system's prediction for Chinese polyphones.

V. CONCLUSION
In the paper, we used several methods to address the ambiguity of Chinese polyphones. Three methods are employed to predict the category of a polyphone: dictionary matching, language models and a voting scheme, the last of which has two scoring schemes: winner-take-all and preference scoring. Furthermore, we propose an effective unified approach, which combines these methods and adopts the better alternative, triggered by a threshold, to improve the prediction.

Our approach outperforms MS Word 2007 and the statistical language models, and the best result of the final outside testing achieves 92.72%. The proposed approach can be applied to related issues in other languages. Based on the proposed unified approach, we have constructed an e-learning system in which several related functions of Chinese text transliteration are integrated to provide online searching and translation through the Internet.

In the future, several related issues should be studied further:
1. Collecting more corpus data and extending the proposed methods to other Chinese polyphones.
2. Exploiting more lexical features, such as location and semantic information, to enhance the precision rate of prediction.
3. Improving the smoothing techniques for unknown words.
4. Bilingual translation between English and Chinese.

ACKNOWLEDGEMENT
The paper is supported partially under a project of the NSC, Taiwan.

REFERENCES
[1] Yan Wu, Xiukun Li and Caesar Lun, 2006, A Structural Based Approach to Cantonese-English Machine Translation, Computational Linguistics and Chinese Language Processing, Vol. 11, No. 2, June 2006, pp. 137-158.
[2] Brian D. Davison, Marc Najork and Tim Converse, 2006, SIGIR Workshop Report, Vol. 40, No. 2.
[3] Oliveira F., Wong F. and Li Y.-P., 2005, An Unsupervised & Statistical Word Sense Tagging Using Bilingual Sources, Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Vol. 6, 18-21 Aug. 2005, pp. 3749-3754.
[4] Agirre E. and Edmonds P., 2006, Word Sense Disambiguation: Algorithms and Applications, Springer.
[5] Jurafsky D. and Martin J. H., 2000, Speech and Language Processing, Prentice Hall.
[6] Jui-Feng Yeh, Chung-Hsien Wu and Mao-Zhu Yang, 2006, Stochastic Discourse Modeling in Spoken Dialogue Systems Using Semantic Dependency Graphs, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pp. 937-944.
[7] Stanley F. Chen and Ronald Rosenfeld, Jan. 2000, A Survey of Smoothing Techniques for ME Models, IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, pp. 37-50.
[8] Church K. W. and Gale W. A., 1991, A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams, Computer Speech and Language, Vol. 5, pp. 19-54.
[9] Chen Stanley F. and Goodman Joshua, 1999, An Empirical Study of Smoothing Techniques for Language Modeling, Computer Speech and Language, Vol. 13, pp. 359-394.
[10] Nancy Ide and Jean Véronis, 1998, Word Sense Disambiguation: The State of the Art, Computational Linguistics, Vol. 24, No. 1, pp. 1-41.
[11] Miller George A., Beckwith Richard T., Fellbaum Christiane D., Gross Derek and Miller Katherine J., 1990, WordNet: An On-line Lexical Database, International Journal of Lexicography, 3(4), pp. 235-244.
[12] Church Kenneth W. and Mercer Robert L., 1993, Introduction to the Special Issue on Computational Linguistics Using Large Corpora, Computational Linguistics, 19(1), pp. 1-24.
[13] Yarowsky D., 1997, Homograph Disambiguation in Speech Synthesis, in J. van Santen, R. Sproat, J. Olive and J. Hirschberg (eds.), Progress in Speech Synthesis, Springer-Verlag, pp. 159-175.
[14] Feng-Long Huang, Shu-Yu Ke and Qiong-Wen Fan, 2008, Predicting Effectively the Pronunciation of Chinese Polyphones by Extracting the Lexical Information, Advances in Computer and Information Sciences and Engineering, Springer Science, pp. 159-165.
[15] Francisco Joao Pinto, Antonio Farina Martinez and Carme Fernandez Perez-Sanjulian, 2008, Joining Automatic Query Expansion Based on Thesaurus and Word Sense Disambiguation Using WordNet, International Journal of Computer Applications in Technology, Vol. 33, No. 4, pp. 271-279.
[16] You-Jin Chung et al., 2002, Word Sense Disambiguation in a Korean-to-Japanese MT System Using Neural Networks, COLING-02 Workshop on Machine Translation in Asia, Vol. 16, pp. 1-7.
[17] Jean Véronis and Nancy M. Ide, 1990, Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries, Proceedings of the 13th Conference on Computational Linguistics (COLING-90), Vol. 2, pp. 389-394.
[18] http://203.64.183.226/public2/word1.php
Table 2: The 10 Chinese polyphonic characters, their categories and meanings.

    target   Zhuyin Fuhau   Chinese word   Hanyu pinyin        English
    中       ㄓㄨㄥ          中心           zhong xin           center
             ㄓㄨㄥˋ         中毒           zhong du            poison
    乘       ㄔㄥˊ           乘法           cheng fa            multiplication
             ㄕㄥˋ           大乘           da sheng
    乾       ㄍㄢ            乾淨           gan jing            clean
             ㄑㄧㄢˊ         乾坤           qian kun            the universe
    了       ㄌㄜ˙           為了           wei le              in order to
             ㄌㄧㄠˇ         了解           liao jie            understand
    傍       ㄆㄤˊ           傍邊           pang bian           beside
             ㄅㄤ            傍晚           bang wan            nightfall
             ㄅㄤˋ           依山傍水       yi shan bang shui   near the mountain and by the river
    作       ㄗㄨㄛˋ         工作           gong zuo            work
             ㄗㄨㄛ          作揖           zuo yi
             ㄗㄨㄛˊ         作興           zuo xing
    著       ㄓㄜ˙           忙著           mang zhe            busy
             ㄓㄠ            著急           zhao ji             anxious
             ㄓㄠˊ           著想           zhao xiang          to bear in mind the interest of
             ㄓㄨˋ           著名           zhu                 famous
             ㄓㄨㄛˊ         執著           zhuo                inflexible
    卷       ㄐㄩㄢˋ         考卷           kao juan            a test paper
             ㄐㄩㄢˇ         卷髮           juan fa             curly hair
             ㄑㄩㄢˊ         卷曲           quan qu             curl
    咽       ㄧㄢ            咽喉           yan hou             the throat
             ㄧㄢˋ           吞咽           tun yan             swallow
             ㄧㄝˋ           哽咽           geng ye             to choke
    從       ㄘㄨㄥˊ         從事           cong shi            to devote oneself
             ㄗㄨㄥˋ         僕從           pu zong             servant
             ㄘㄨㄥ          從容           cong rong           calm; unhurried
             ㄗㄨㄥ          從橫           zong heng           in length and breadth

Table 3: PR (%) of outside testing with the language models.

    token     中     乘     乾     了     傍     作     著     卷     咽    從     avg.
    unigram   95.88  86.84  92.31  70.21  85.71  96.23  75.32  100    98    91.67  89.98
    bigram    96.75  84.21  96.15  85.11  92.86  94.34  81.17  96.30  100   93.52  92.58*
    trigram   80.04  57.89  61.54  58.51  78.57  52.83  60.39  62.96  88    71.30  70.50

    ps. * denotes the best PR among the three n-gram models.

Table 4: PR (%) of outside testing with winner-take-all scoring.

    token     中     乘     乾     了     傍     作     著     卷     咽    從     avg.
    unitoken  96.96  84.21  80.77  57.45  71.43  94.34  58.44  85.19  84    87.04  84.69
    bitoken   96.75  86.84  96.15  79.79  85.71  92.45  68.83  100    98    93.52  90.17*
    tritoken  79.83  60.53  61.54  60.64  78.57  52.83  59.74  66.67  88    71.30  70.69

    ps. * denotes the best PR among the three n-gram models.
Table 5: PR (%) of outside testing with preference scoring.

    token     中     乘     乾     了     傍     作     著     卷     咽    從     avg.
    unitoken  96.96  84.21  80.77  70.21  71.43  94.34  70.13  85.19  88    87.96  87.76
    bitoken   96.75  86.84  96.15  87.23  85.71  93.40  81.17  100    98    93.52  92.72*
    tritoken  80.04  60.53  61.54  60.64  78.57  52.83  59.74  66.67  88    71.30  70.78

    ps. * denotes the best PR among the three n-gram models.

Table 6: PR (%) of Word 2007 on the same testing sentences.

    token      中     乘     乾     了     傍     作     著     卷     咽    從     avg.
    Word 2007  93.37  76.47  76.67  83.65  78.57  93.70  78.33  82.76  100   91.51  89.80