Noun Paraphrasing Based on a Variety of Contexts


Tomoyuki Kajiwara and Kazuhide Yamamoto.
Noun Paraphrasing Based on a Variety of Contexts.
In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation (PACLIC 28), pp. 644-649, Phuket, Thailand, December 2014.


  1. Noun Paraphrasing Based on a Variety of Contexts
     Tomoyuki Kajiwara and Kazuhide Yamamoto, Nagaoka University of Technology, Japan
  2. Abstract
     We propose a method that paraphrases nouns in consideration of their contexts.
     Characteristics of the proposed method:
     – It paraphrases robustly, without relying on word frequency.
       • Our Number of Differences (NoD) based method outperforms the Co-occurrence Frequency (CoF) based method.
     – It paraphrases depending on the context.
       • e.g. "Reduce the burdens on the back."
       • NoD: load, stress, damage, exhaustion, tense, etc.
       • CoF: cost, expense, actual cost, etc. (all money-related)
  3. Lexical Paraphrasing, Lexical Substitution
     A different linguistic expression conveying the same meaning, e.g. teacher → instructor.
  4. Applications of Lexical Paraphrasing
     • Reading assistance (lexical simplification)
       – Never judge people by external appearance. → Never judge people by outside appearance.
     • Machine translation (pre-editing)
       – その本なら書類の下にある → "It is under the papers if it is the book."
       – その本は書類の下にある → "The book is under the papers." ✔
  5. Difficulty of Lexical Paraphrasing
     • "Force someone to shoulder a huge increase in his financial burdens."
       – Force someone to shoulder a huge increase in his financial costs. ✔
       – Force someone to shoulder a huge increase in his financial loads.
     • "Reduce the burdens on the back."
       – Reduce the costs on the back.
       – Reduce the loads on the back. ✔
     Whether a paraphrase is possible or not changes depending on the context.
  6. Approach
     Input: "Look for the access to the airport." → Output: "Look for the way to the airport."
     [Figure: the pre-context "look for the ***" yields {restaurant, market, purpose, transfer, fee, way}; the post-context "*** to the airport" yields {transfer, bus, fee, transportation, way, delivery}; the shared candidates are sorted by context similarity: 1. way, 2. transfer, 3. fee]
  7. Approach (cont.)
     [Same figure as slide 6] The two key requirements: generate a proper sentence, and select a suitable paraphrase.
  8. Proposed Method
     We propose a method that paraphrases nouns in consideration of their contexts.
     1. Extract candidate words used in the same context as the input sentence.
     2. Calculate the similarity between the original word and each candidate word, using:
        • the number of distinct contexts of the candidate word;
        • the number of distinct contexts shared by the original and the candidate word.
     3. Select the candidate word with the maximum similarity as the paraphrase (original → paraphrase).
  9. Proposed Method (outline repeated from slide 8)
  10. Extracting Candidate Words
      • We want candidate words used in the same context as the input sentence.
      • However, words used in exactly the same full context are rarely found.
      • Therefore, based on the target word "access", the input sentence is split into a pre-context and a post-context.
        – Input: "Look for the access to the airport."
        – Pre-context "look for the ***": restaurant, transfer, market, fee, purpose, way
        – Post-context "*** to the airport": transfer, bus, fee, transportation, way, delivery
  11. Extracting Candidate Words (cont.)
      • Words appearing in both the pre-context list and the post-context list (transfer, fee, way) can be used in the input sentence.
      • Taking this intersection lets us generate a proper sentence.
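
The intersection idea on slides 10-11 can be written down concretely. Below is a minimal Python sketch, assuming two precomputed lookup tables that map a pre-context and a post-context to the sets of words observed in the slot; the table format, function name, and toy data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of candidate extraction: split the input around the target
# noun and intersect the words attested with each half-context.
from typing import Dict, Set


def extract_candidates(pre_context: str,
                       post_context: str,
                       pre_index: Dict[str, Set[str]],
                       post_index: Dict[str, Set[str]]) -> Set[str]:
    """Return words seen both after `pre_context` and before `post_context`.

    pre_index maps a pre-context (e.g. "look for the ***") to the words
    observed in the slot; post_index does the same for post-contexts.
    Both are assumed to be built offline from a large corpus.
    """
    pre_words = pre_index.get(pre_context, set())
    post_words = post_index.get(post_context, set())
    return pre_words & post_words  # only these yield a proper full sentence


if __name__ == "__main__":
    # Toy indices reproducing the example "Look for the access to the airport."
    pre_index = {"look for the ***": {"restaurant", "market", "purpose",
                                      "transfer", "fee", "way"}}
    post_index = {"*** to the airport": {"transfer", "bus", "fee",
                                         "transportation", "way", "delivery"}}
    print(extract_candidates("look for the ***", "*** to the airport",
                             pre_index, post_index))
    # -> {'way', 'transfer', 'fee'}
```
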
  12. Proposed Method (outline repeated from slide 8)
  13. Calculating Similarity Between Words
      • (1) The more distinct contexts the original and the candidate word share, the higher the paraphrasability.
      • (2) The more distinct contexts the candidate word appears in overall, the lower the paraphrasability.
      Definitions:
      – common(A, B): the number of distinct contexts shared by A and B
      – difference(A): the number of distinct contexts in which A appears
      – TNC: the total number of distinct contexts
      Formula:
      similarity(original, candidate) = common(original, candidate) × log(TNC / difference(candidate))
      The first factor implements (1); the log factor implements (2).
  14. New Statistic: Number of Occurrences → Number of Differences
      • TF-IDF: tf(word) × log(TND / df(word))
        – tf(w): the number of occurrences of the word
        – df(w): the number of documents containing the word
        – TND: the total number of documents
      • Proposed: common(original, candidate) × log(TNC / difference(candidate))
        – common(A, B): the number of distinct contexts shared by A and B
        – difference(A): the number of distinct contexts in which A appears
        – TNC: the total number of distinct contexts
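
To make the parallel with TF-IDF concrete, here is a minimal Python sketch of the score on slides 13-14, assuming each word's set of distinct contexts has already been collected (e.g. the predicates it occurs with in the case-frame data); the data structures and the final argmax helper are illustrative assumptions rather than the authors' code.

```python
import math
from typing import Dict, Iterable, Set


def similarity(original: str, candidate: str,
               contexts: Dict[str, Set[str]], tnc: int) -> float:
    """common(original, candidate) * log(TNC / difference(candidate)).

    contexts[w] is the set of distinct contexts word w appears in;
    tnc is the total number of distinct contexts (TNC).
    """
    common = len(contexts[original] & contexts[candidate])
    difference = len(contexts[candidate])
    if common == 0 or difference == 0:
        return 0.0
    return common * math.log(tnc / difference)


def tf_idf(tf: int, df: int, tnd: int) -> float:
    """The analogous tf(word) * log(TND / df(word)), shown only for comparison."""
    return tf * math.log(tnd / df)


def best_paraphrase(original: str, candidates: Iterable[str],
                    contexts: Dict[str, Set[str]], tnc: int) -> str:
    """Step 3 of the method: pick the candidate with the maximum similarity."""
    return max(candidates, key=lambda c: similarity(original, c, contexts, tnc))
```
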
  15. Proposed Method (outline repeated from slide 8)
  16. Characteristics of the Proposed Method
      • Extraction: we can generate a proper sentence based on the shared contexts.
      • Selection: we can select a suitable paraphrase based on the number of distinct contexts.
      We compare this experimentally with co-occurrence frequency and pointwise mutual information.
  17. Comparative Methods
      • Marton et al. (2009): Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases.
      • Bhagat and Ravichandran (2008): Large Scale Acquisition of Paraphrases for Learning Surface Patterns.
      Both methods:
      1. Generate a feature vector from the contexts of the target word (original).
      2. Calculate the cosine similarity between feature vectors.
      3. Select the word with the maximum similarity as the paraphrase.
  18. Comparative Methods (cont.)
      • [Marton 09]: co-occurrence frequency based method
      • [Bhagat 08]: pointwise mutual information (PMI) based method
      Both build a context feature vector for the original word, compute cosine similarity between vectors, and select the most similar word as the paraphrase.
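
For contrast, the following sketch shows how such baselines score candidates: build a context feature vector for a word, weighted either by raw co-occurrence counts or by pointwise mutual information, and rank candidates by cosine similarity. The count tables and names are illustrative assumptions, not a reimplementation of either cited paper.

```python
import math
from collections import Counter
from typing import Dict


def pmi_vector(word: str, cooc: Dict[str, Counter],
               word_total: Dict[str, int], ctx_total: Dict[str, int],
               grand_total: int) -> Dict[str, float]:
    """Turn the raw co-occurrence counts of `word` into PMI weights."""
    vec = {}
    for ctx, count in cooc[word].items():
        p_joint = count / grand_total
        p_indep = (word_total[word] / grand_total) * (ctx_total[ctx] / grand_total)
        vec[ctx] = math.log(p_joint / p_indep)
    return vec


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

The co-occurrence frequency baseline would use `cooc[word]` directly as the feature vector; the PMI baseline would pass it through `pmi_vector` first; in either case the candidate with the highest cosine to the original word is selected.
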
  19. Experimental Setup
      • Language: Japanese
        – In this experiment we paraphrase Japanese nouns, but the approach itself is language-independent.
      • Definition of a context
        – We define as context the content words in the phrases that stand in a dependency relation with the noun, e.g. in "Look for the access to the airport."
  20. Experimental Setup (cont.)
      • Web Japanese N-gram (for extracting candidate words)
        – Japanese word N-grams (N = 1-7); we treat each 7-gram as a sentence.
        – Each N-gram appears more than 20 times on the Web.
        – We use 200 sentences out of the 1.3M sentences matching the pattern Noun … Noun (paraphrase target) … Verb (base form). (* Japanese is an SOV language.)
      • Kyoto University case frames (for calculating similarity)
        – Japanese predicate-noun pairs extracted from the Web.
        – They contain 34k predicates and 824k nouns; we use all of them.
        – We define the predicates as contexts and calculate the similarity between the nouns.
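
As a rough illustration of how evaluation sentences could be selected from the 7-gram data, the sketch below keeps 7-grams whose POS sequence matches Noun … Noun (target) … Verb. The `pos_tag` callback stands in for a Japanese morphological analyzer (e.g. MeCab), and its output format here is an assumption, not the authors' pipeline.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (surface form, coarse POS tag such as "NOUN"/"VERB")


def matches_pattern(tokens: List[Token]) -> bool:
    """True if some noun is preceded by another noun and followed by a verb."""
    pos = [p for _, p in tokens]
    for i, p in enumerate(pos):
        if p != "NOUN":
            continue
        if "NOUN" in pos[:i] and "VERB" in pos[i + 1:]:
            return True
    return False


def select_sentences(seven_grams: List[str],
                     pos_tag: Callable[[str], List[Token]],
                     limit: int = 200) -> List[str]:
    """Keep 7-grams matching the Noun ... Noun(target) ... Verb pattern."""
    selected = [s for s in seven_grams if matches_pattern(pos_tag(s))]
    return selected[:limit]
```
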
  21. Number of paraphrasable nouns at rank 1 of similarity [results chart]
  22. Number of paraphrasable nouns at rank 1 of similarity (analysis)
      • Highly frequent words (e.g. こと, "thing") have a negative influence.
      • Suffix words (e.g. counters that describe the number of items) have a negative influence.
      • The proposed method is robust because it does not depend on word frequency.
  23. Relationship between the rank of similarity and the number of paraphrasable nouns [results chart]
  24. Relationship between the rank of similarity and the number of paraphrasable nouns (analysis)
      • The differences are small.
      • Many paraphrases appear at rank 1.
  25. Examples of Paraphrasing in Consideration of Context
      • "Assign a maximum penalty of N$."
        – Comparative method: imprisonment, pecuniary penalty, etc.
        – Our method: paying penalty, administrative penalty, etc.; "imprisonment" does not even appear as a candidate.
      • "Reduce the burdens on the back."
        – Comparative method: cost, expenses, actual cost, etc.; all money-related, and none of the top 10 words are appropriate.
        – Our method: load, stress, damage, exhaustion, tense, etc.; all appropriate paraphrases in this context.
  26. Conclusion
      We proposed a method that paraphrases nouns in consideration of their contexts.
      Characteristics of the proposed method:
      – It paraphrases robustly, without relying on word frequency.
        • Our Number of Differences (NoD) based method outperforms the Co-occurrence Frequency (CoF) based method.
      – It paraphrases depending on the context.
        • e.g. "Reduce the burdens on the back."
        • NoD: load, stress, damage, exhaustion, tense, etc.
        • CoF: cost, expense, actual cost, etc. (money-related)
