Tomoyuki Kajiwara, Kazuhide Yamamoto.
Noun Paraphrasing Based on a Variety of Contexts.
In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation (PACLIC 28), pp.644-649. Phuket, Thailand, December 2014.
1.
Noun Paraphrasing Based on a Variety of Contexts
Tomoyuki Kajiwara and Kazuhide Yamamoto
Nagaoka University of Technology, Japan
2.
Abstract
We propose a method for paraphrasing nouns
that takes the context into account.
Characteristics of the proposed method
– It paraphrases robustly without relying on word frequency.
• Our method based on the number of differences (NoD)
outperforms the method based on co-occurrence frequency (CoF).
– It selects paraphrases that fit the context.
• e.g. "Reduce the burdens on the back."
• NoD: load, stress, damage, exhaustion, tension, etc.
• CoF: cost, expense, actual cost, etc. (money-related)
3.
Lexical Paraphrasing, Lexical Substitution
Different linguistic expressions
that convey the same meaning.
e.g. Teacher → Instructor
4.
Applications of Lexical Paraphrasing
• For Reading Assistance (Lexical Simplification)
– Never judge people by external appearance.
– Never judge people by outside appearance.
• For Machine Translation (pre-editing)
– その本なら書類の下にある
It is under the papers if it is the book.
– その本 は 書類の下にある
The book is under the papers. ✔
5.
Difficulty of Lexical Paraphrasing
• Force someone to shoulder a huge increase in his
financial burdens.
– Force someone to shoulder a huge increase in
his financial costs. ✔
– Force someone to shoulder a huge increase in
his financial loads.
• Reduce the burdens on the back.
– Reduce the costs on the back.
– Reduce the loads on the back. ✔
Whether a paraphrase is acceptable
depends on the context.
6.
Input: Look for the access to the airport.
Output: Look for the way to the airport.
Approach
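Candidates for "look for the ***": restaurant, market, purpose, transfer, fee, way
Candidates for "*** to the airport": transfer, fee, way, bus, transportation, delivery
Common candidates: transfer, fee, way
Ranking: 1. way 2. transfer 3. fee
To sort by context similarity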
To generate a proper sentence
To select a suitable paraphrase
8.
Proposed Method
We propose a method for paraphrasing nouns
that takes the context into account.
1. To extract candidate words
used in the same context as the input sentence
2. To calculate the similarity
between the original and candidate words
• The number of differences (distinct types) of the contexts
of the candidate word.
• The number of differences of the contexts shared
by the original and the candidate word.
3. To select a candidate word
with the maximum similarity as the paraphrase
original → paraphrase
10.
To extract candidate words
• We extract candidate words that are used in the same context as the input.
• However, words used in exactly the same context are rarely found.
↓
• The input sentence is divided at the target word "access"
into a pre-context and a post-context.
Look for the access to the airport.
pre-context: "look for the ***"
restaurant, market, purpose, transfer, fee, way
post-context: "*** to the airport"
transfer, fee, way, bus, transportation, delivery
• Words that appear in both candidate lists
can fill the slot in the input sentence.
• Thus, we can generate a well-formed sentence.
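A minimal Python sketch of the extraction step, assuming pre-tokenized English word lists and a tiny in-memory corpus standing in for the Web N-grams; the function names split_at_target, fillers and extract_candidates are hypothetical. It collects words that follow the pre-context and words that precede the post-context, and keeps only the words found in both.

```python
def split_at_target(tokens, target_index):
    """Divide the input into a pre-context and a post-context around the target word."""
    return tuple(tokens[:target_index]), tuple(tokens[target_index + 1:])


def fillers(corpus, context, side):
    """Words seen right after the pre-context ("pre") or right before the post-context ("post")."""
    n = len(context)
    found = set()
    for sent in corpus:
        for i in range(len(sent) - n):
            if side == "pre" and tuple(sent[i:i + n]) == context:
                found.add(sent[i + n])
            elif side == "post" and tuple(sent[i + 1:i + 1 + n]) == context:
                found.add(sent[i])
    return found


def extract_candidates(corpus, tokens, target_index):
    """Keep only words that fill the slot on both sides, so the rewritten sentence stays well formed."""
    pre, post = split_at_target(tokens, target_index)
    common = fillers(corpus, pre, "pre") & fillers(corpus, post, "post")
    common.discard(tokens[target_index])  # never propose the original word itself
    return common


# Hypothetical toy corpus (not the Web N-gram data).
corpus = [s.split() for s in [
    "look for the restaurant", "look for the way", "look for the fee",
    "the way to the airport", "the bus to the airport", "the fee to the airport",
]]
sentence = "look for the access to the airport".split()
print(sorted(extract_candidates(corpus, sentence, 3)))  # ['fee', 'way']
```

Matching each half separately, rather than the full sentence, is what lets candidates be found even when the exact context never occurs.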
13.
To calculate similarity between words
(1) The more distinct contexts the original and the candidate word share,
the greater the paraphrasability.
(2) The more distinct contexts the candidate word has,
the smaller the paraphrasability.
common(A, B): the number of differences (distinct contexts) shared by A and B
difference(A): the number of differences (distinct contexts) of A
TNC: the total number of differences of the context
$$\mathrm{similarity}(\mathit{original}, \mathit{candidate}) = \underbrace{\mathrm{common}(\mathit{original}, \mathit{candidate})}_{(1)} \times \underbrace{\log \frac{\mathrm{TNC}}{\mathrm{difference}(\mathit{candidate})}}_{(2)}$$
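The formula can be illustrated with a small Python sketch. It assumes each noun maps to a set of distinct contexts, uses the natural logarithm (the slides do not fix a base), and takes TNC to be the number of distinct contexts in the whole table; the table contents and the names similarity and select_paraphrase are hypothetical.

```python
import math

# Hypothetical context table: noun -> set of distinct contexts (e.g. predicates).
contexts = {
    "access": {"look_for", "improve", "convenient"},
    "way": {"look_for", "improve", "convenient", "lose"},
    "fee": {"look_for", "pay", "raise", "convenient"},
    "thing": {"look_for", "improve", "convenient", "pay", "raise", "lose", "do", "say"},
}


def similarity(table, original, candidate, tnc):
    """common(original, candidate) * log(TNC / difference(candidate))."""
    common = len(table[original] & table[candidate])  # factor (1): shared distinct contexts
    difference = len(table[candidate])                # distinct contexts of the candidate
    if difference == 0:
        return 0.0
    return common * math.log(tnc / difference)        # factor (2): penalizes words seen in many contexts


def select_paraphrase(table, original, candidates):
    """Step 3: return the candidate with the maximum similarity."""
    tnc = len(set().union(*table.values()))  # assumption: TNC = distinct contexts in the whole table
    return max(candidates, key=lambda c: similarity(table, original, c, tnc))


print(select_paraphrase(contexts, "access", ["way", "fee", "thing"]))  # way
```

In this toy table, "thing" shares as many contexts with "access" as "way" does, but it occurs with every context, so factor (2) is log(8/8) = 0 and it is never selected.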
14.
tf(w): the number of occurrences of the word w
df(w): the number of documents in which w occurs
TND: the total number of documents
common(A, B): the number of differences of the common context between A and B
difference(A): the number of differences of the context of A
TNC: the total number of differences of the context
TF-IDF:
$$\mathrm{tf}(w) \times \log \frac{\mathrm{TND}}{\mathrm{df}(w)}$$
Proposed similarity:
$$\mathrm{common}(\mathit{original}, \mathit{candidate}) \times \log \frac{\mathrm{TNC}}{\mathrm{difference}(\mathit{candidate})}$$
New statistic:
number of occurrences → number of differences
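The switch from occurrences to differences can be seen on hypothetical (noun, context, count) records: a TF-like statistic sums the counts, while the number of differences counts distinct contexts. The records and variable names are illustrative assumptions, not data from the experiments.

```python
from collections import Counter, defaultdict

# Hypothetical (noun, context, count) records, e.g. predicate-noun pairs with frequencies.
records = [
    ("thing", "do", 120), ("thing", "say", 40),
    ("burden", "reduce", 3), ("burden", "shoulder", 2), ("burden", "ease", 1),
]

occurrences = Counter()      # TF-like statistic: how often each noun is seen
contexts = defaultdict(set)  # NoD statistic: which distinct contexts each noun is seen in
for noun, context, count in records:
    occurrences[noun] += count
    contexts[noun].add(context)

differences = {noun: len(ctxs) for noun, ctxs in contexts.items()}

print(dict(occurrences))  # {'thing': 160, 'burden': 6}
print(differences)        # {'thing': 2, 'burden': 3}
```

Here "thing" dominates by number of occurrences (160 vs. 6) but not by number of differences (2 vs. 3).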
16.
Characteristics of the proposed method
• Extraction
– We can generate a well-formed sentence
based on the common contexts.
• Selection
– We can select a suitable paraphrase
based on the number of differences of the context.
We compare it experimentally with methods based on
co-occurrence frequency and pointwise mutual information.
17.
Comparative Methods
• Marton et al. (2009). Improved Statistical Machine Translation
Using Monolingually-Derived Paraphrases.
• Bhagat and Ravichandran (2008). Large Scale Acquisition of
Paraphrases for Learning Surface Patterns.
1. Both methods build a feature vector
from the contexts of the target word (original).
2. They compute the cosine similarity
between the feature vectors.
3. They select the word with the maximum similarity
as the paraphrase.
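A generic Python sketch of this pipeline, assuming a table of (word, context) co-occurrence counts; it is not the exact formulation of either cited paper. cof_vector weights each context by its raw co-occurrence count, and pmi_vector uses pointwise mutual information, log(n(w, c) * N / (n(w) * n(c))); all names and counts are hypothetical.

```python
import math
from collections import Counter


def cof_vector(cooc, word):
    """Feature vector weighted by raw co-occurrence frequency."""
    return {c: n for (w, c), n in cooc.items() if w == word}


def pmi_vector(cooc, word):
    """Feature vector weighted by pointwise mutual information."""
    total = sum(cooc.values())
    word_count, ctx_count = Counter(), Counter()
    for (w, c), n in cooc.items():
        word_count[w] += n
        ctx_count[c] += n
    return {c: math.log(n * total / (word_count[w] * ctx_count[c]))
            for (w, c), n in cooc.items() if w == word}


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def best_by_cosine(cooc, original, candidates, vectorize):
    """Pick the candidate whose context vector is closest to the original's."""
    target = vectorize(cooc, original)
    return max(candidates, key=lambda c: cosine(target, vectorize(cooc, c)))


# Hypothetical (word, context) co-occurrence counts.
cooc = Counter({
    ("access", "look_for"): 2, ("access", "improve"): 2,
    ("way", "look_for"): 4, ("way", "improve"): 4,
    ("fee", "look_for"): 1, ("fee", "pay"): 5,
})
print(best_by_cosine(cooc, "access", ["way", "fee"], cof_vector))  # way
print(best_by_cosine(cooc, "access", ["way", "fee"], pmi_vector))  # way
```

Both weightings still start from how often each context co-occurs with a word, which is the dependence on frequency that the number of differences avoids.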
18.
Comparative Methods
• [Marton 09]: co-occurrence frequency based method
• [Bhagat 08]: pointwise mutual information based method
19.
Experimental setup
• Japanese
– In this experiment, we paraphrase Japanese nouns.
– This approach is language-independent.
• Definition of a context
– We define as context the content words of the phrases
that have a dependency relation with the noun.
e.g. Look for the access to the airport.
20.
Experimental setup
• Web Japanese N-gram: To extract candidate words
– Japanese word N-grams (N = 1 to 7). We use 7-grams as sentences.
– Each N-gram appears more than 20 times on the Web.
– We use 200 of the 1.3M sentences that match the following pattern:
• Noun … Noun (paraphrase target) … Verb (base form)
* Japanese is an SOV language.
• Kyoto University case frames: to calculate similarity
– Japanese predicate and noun pairs from the Web.
– They contain 34k predicates and 824k nouns (we use all of them).
– We treat these predicates as contexts
and calculate the similarity between these nouns.
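The sentence-selection pattern for the 7-grams can be approximated as a filter over POS-tagged tokens. The tag names (NOUN, PARTICLE, VERB_BASE, SYMBOL), the helper names, and the example 7-gram (空港へのアクセスを探す, "look for the access to the airport") are hypothetical; a real pipeline would obtain tags and base forms from a morphological analyzer.

```python
# Hypothetical tagged 7-gram: (surface form, POS tag).
gram = [
    ("空港", "NOUN"), ("へ", "PARTICLE"), ("の", "PARTICLE"),
    ("アクセス", "NOUN"), ("を", "PARTICLE"), ("探す", "VERB_BASE"), ("。", "SYMBOL"),
]


def target_positions(tagged):
    """Indices j where a noun precedes j, j is a noun, and a base-form verb follows j."""
    positions = []
    for j, (_, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue
        noun_before = any(p == "NOUN" for _, p in tagged[:j])
        verb_after = any(p == "VERB_BASE" for _, p in tagged[j + 1:])
        if noun_before and verb_after:
            positions.append(j)
    return positions


def matches_pattern(tagged):
    """True for a 7-gram containing at least one paraphrase target."""
    return len(tagged) == 7 and bool(target_positions(tagged))


print(target_positions(gram))  # [3]
```

Because Japanese is verb-final, requiring the base-form verb after the target keeps 7-grams that end in a predicate the noun can attach to.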
21.
Number of paraphrasable nouns
ranked first by similarity
High-frequency words (e.g. こと (thing)) have a negative influence.
Suffix words also have a negative influence
(e.g. words that describe the number of items).
The proposed method is robust
because it does not depend on word frequency.
23.
Relationship between similarity rank
and number of paraphrasable nouns
There are few differences.
Many paraphrases appear at rank 1.
25.
Examples of context-aware paraphrasing
• Assign a maximum penalty of N$.
– Comparative method: imprisonment, pecuniary penalty, etc.
– Our method: paying penalty, administrative penalty, etc.
• "Imprisonment" does not appear as a candidate.
• Reduce the burdens on the back.
– Comparative method: cost, expenses, actual cost, etc.
• All of them are money-related.
• None of the words in the top 10 is appropriate.
– Our method: load, stress, damage, exhaustion, tension, etc.
• All of them are appropriate paraphrases in this context.
26.
Conclusion
We propose a method for paraphrasing nouns
that takes the context into account.
Characteristics of the proposed method
– It paraphrases robustly without relying on word frequency.
• Our method based on the number of differences (NoD)
outperforms the method based on co-occurrence frequency (CoF).
– It selects paraphrases that fit the context.
• e.g. "Reduce the burdens on the back."
• NoD: load, stress, damage, exhaustion, tension, etc.
• CoF: cost, expense, actual cost, etc. (money-related)