SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
What do Neural Machine
Translation Models Learn
about Morphology?
Yonatan Belinkov, Nadir Durrani, Fahim Dalvi,
Hassan Sajjad and James Glass
@ 8/11 ACL2017 Reading
M1 Hayahide Yamagishi
Introduction
● “Little is known about what and how much NMT models learn
about each language and its features.”
● They try to answer the following questions
1. Which parts of the NMT architecture capture word structure?
2. What is the division of labor between defferent components?
3. How do different word representation help learn better morphology and
modeling of infrequent words?
4. How does the target language affect the learning of word structure?
● Task: Part-of-Speech tagging and morphological tagging
2
Task
● Part-of-Speech (POS) tagging
○ computer → NN
○ computers → NNS
● Morphological tagging
○ he → 3, single, male, subject
○ him → 3, single, male, object
● Task: hidden states → tag
○ They would like to test each hidden state.
○ If the accuracy is high, hidden states learn about the word representation.
3
Methodology
1. Training the NMT models (Bahdanau attention, LSTM)
2. Using the trained models as a feature extractor.
3. Training the feedforward NN using the state-tag pairs
○ 1layer: input layer, hidden layer, output layer
4. Test
● “Our goal is not to beat the state-of-the-art on a given task.”
● “We also experimented with a linear classifier and observed
similar trends to the non-linear case.”
4
Data
● Language Pair:
○ {Arabic, German, French, Czech} - English
○ Arabic - Hebrew (Both languages are morphologically-rich and similar.)
○ Arabic - German (Both languages are morphologically-rich but different.)
● Parallel corpus: TED
● POS annotated data
○ Gold: included in some datasets
○ Predict: from the free taggers
5
Char-base Encoder
● Character-aware Neural Language
Model [Kim+, AAAI2016]
● Character-based Neural Machine
Translation [Costa-jussa and Fonollosa,
ACL2016]
● Character embedding
→ word embedding
● Obtained word embeddings are inputted
into the word-based RNN-LM.
6
Effect of word representation (Encoder)
● Word-based vs. Char-based model
● Char-based models are stronger.
7
Impact of word frequency
● Frequent words don’t need the character information.
● “The char-based model is able to learn character n-gram
patterns that are important for identifying word structure.”
8
Confusion matrices
9
Analyzing specific tags
● Arabic → Determiner “Al-” becomes a prefix.
● Char-based model can distinguish “DT+NNS” from “NNS”.
10
Effect of encoder depth
● LSTM carries the context information → Layer 0 is worse.
● States from layer 1 is more effective than states from layer 2.
11
Effect of encoder depth
● Char-based models have the similar tendencies.
12
Effect of encoder depth
● BLEU: 2-layer NMT > 1-layer NMT
○ word / char : +1.11 / +0.56
● Layer 1 learns the word representation
● Layer 2 learns the word meaning
● Word representation < word representation + word meaning
13
Effect of target language
● Translating into morphologically-rich language is harder.
○ Arabic-English: 24.69
○ English-Arabic: 13.37
● “How does the target language affect the learned source
language representations?”
○ “Does translating into a morphologically-rich language require more
knowledge about source language morphology?”
● Experiment: Arabic - {Arabic, Hebrew, German, English}
○ Arabic-Arabic: Autoencoder
14
Result
15
Effect of target languages
● They expected translating into morph-rich languages would
make the model learn more about morphology. → No
● The accuracy doesn’t correlate with the BLEU score
○ Autoencoder couldn’t learn the morphological representation.
○ If the model only works as a recreator, it doesn’t have to learn it.
○ “A better translation model learns more informative representation.”
● Possible explanation
○ Arabic-English is simply better than -Hebrew and -German.
○ These models may not be able to afford to understand the representations of
word structure.
16
Decoder Analysis
● Similar experiments
○ Decoder’s input is the correct previous word.
○ Char-based decoder’s input is the char-based representation.
○ Char-based decoder’s output is the word-level.
● Arabic-English or English-Arabic
17
Effect of decoder states
● Decoder states doesn’t have a morphological information.
● BLEU doesn’t correlate the accuracy
○ French-English: 37.8 BLEU / 54.26% accuracy
18
Effect of attention
● Encoder states
○ Task: creating a generic, close to language-independent representation of
source sentence .
○ When the attention is attached, these are treated as a memo.
○ When the model translates the noun, the attention sees the noun words.
● Decoder states
○ Task: using encoder’s representation to generate the target sentence in a
specific language.
○ “Without the attention mechanism, the decoder is forced to learn more
informative representations of the target language.”
19
Effect of word representation (Decoder)
● Char-based representations don’t hep the decoder
○ The decoder’s predictions are still done at word level.
○ “In Arabic-English the char-based model reduces the number of generated
unknown words in the MT test set by 25%.”
○ “In English-Arabic the number of unknown words remains roughly the same
between word-based and char-based models.”
20
Conclusion
● Their results lead to the following conclusions
○ Char-based representations are better than word-based ones
○ Lower layers captures morphology, while deeper layers improve translation
performance.
○ Translating into morphologically-poorer languages leads to better source
representations.
○ The attentional decoder learns impoverished representations that do not
carry much information about morphology.
● “Jointly learning translation and morphology can possibly
lead to better representations and improved translation.”
21

Contenu connexe

Tendances

PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesSchwannden Kuo
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: ParsingRushdi Shams
 
PL Lecture 02 - Binding and Scope
PL Lecture 02 - Binding and ScopePL Lecture 02 - Binding and Scope
PL Lecture 02 - Binding and ScopeSchwannden Kuo
 
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
 A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP TasksMasahiro Kaneko
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryRoelof Pieters
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextBayu Aldi Yansyah
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language modelc sharada
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing Rajnish Raj
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
Automatic text simplification evaluation aspects
Automatic text simplification  evaluation aspectsAutomatic text simplification  evaluation aspects
Automatic text simplification evaluation aspectsiwan_rg
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Family Tree on PROLOG
Family Tree on PROLOGFamily Tree on PROLOG
Family Tree on PROLOGAbdul Rafay
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopiwan_rg
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesFelipe Moraes
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translationMarcis Pinnis
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...Francisco Manuel Rangel Pardo
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 

Tendances (20)

PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminaries
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
PL Lecture 02 - Binding and Scope
PL Lecture 02 - Binding and ScopePL Lecture 02 - Binding and Scope
PL Lecture 02 - Binding and Scope
 
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
 A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionary
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language model
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Automatic text simplification evaluation aspects
Automatic text simplification  evaluation aspectsAutomatic text simplification  evaluation aspects
Automatic text simplification evaluation aspects
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Language models
Language modelsLanguage models
Language models
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Family Tree on PROLOG
Family Tree on PROLOGFamily Tree on PROLOG
Family Tree on PROLOG
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translation
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 

Similaire à [ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?

Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageRoelof Pieters
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101Jaemin Cho
 
MixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMariYam371004
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022Kwanghee Choi
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)kevig
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)kevig
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)kevig
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesZoltan Varju
 
Shedding Light on Software Engineering-specific Metaphors and Idioms
Shedding Light on Software Engineering-specific Metaphors and IdiomsShedding Light on Software Engineering-specific Metaphors and Idioms
Shedding Light on Software Engineering-specific Metaphors and IdiomsMia Mohammad Imran
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingParrotAI
 
Merghani-SACNAS Poster
Merghani-SACNAS PosterMerghani-SACNAS Poster
Merghani-SACNAS PosterTaha Merghani
 
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxShivaprasad787526
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...Isabelle Augenstein
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddingsRoelof Pieters
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueJinho Choi
 

Similaire à [ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology? (20)

Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101
 
MixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptxMixedLanguageProcessingTutorialEMNLP2019.pptx
MixedLanguageProcessingTutorialEMNLP2019.pptx
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
Shedding Light on Software Engineering-specific Metaphors and Idioms
Shedding Light on Software Engineering-specific Metaphors and IdiomsShedding Light on Software Engineering-specific Metaphors and Idioms
Shedding Light on Software Engineering-specific Metaphors and Idioms
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Merghani-SACNAS Poster
Merghani-SACNAS PosterMerghani-SACNAS Poster
Merghani-SACNAS Poster
 
Esa act
Esa actEsa act
Esa act
 
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptxARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
ARTIFICIAL INTELLEGENCE AND MACHINE LEARNING.pptx
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 

Plus de Hayahide Yamagishi

[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...Hayahide Yamagishi
 
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳Hayahide Yamagishi
 
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...Hayahide Yamagishi
 
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...Hayahide Yamagishi
 
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive SummarizationHayahide Yamagishi
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine TranslationHayahide Yamagishi
 
[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and ComprehendHayahide Yamagishi
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory RepresentationHayahide Yamagishi
 
[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNsHayahide Yamagishi
 
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析Hayahide Yamagishi
 
Why neural translations are the right length
Why neural translations are  the right lengthWhy neural translations are  the right length
Why neural translations are the right lengthHayahide Yamagishi
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsHayahide Yamagishi
 
ニューラル論文を読む前に
ニューラル論文を読む前にニューラル論文を読む前に
ニューラル論文を読む前にHayahide Yamagishi
 
ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御Hayahide Yamagishi
 
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine TranslationHayahide Yamagishi
 
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...Hayahide Yamagishi
 

Plus de Hayahide Yamagishi (16)

[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
 
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
 
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
 
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
 
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
 
[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
 
[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs
 
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
 
Why neural translations are the right length
Why neural translations are  the right lengthWhy neural translations are  the right length
Why neural translations are the right length
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documents
 
ニューラル論文を読む前に
ニューラル論文を読む前にニューラル論文を読む前に
ニューラル論文を読む前に
 
ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御
 
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
 
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
 

Dernier

Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 

Dernier (20)

Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 

[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?

  • 1. What do Neural Machine Translation Models Learn about Morphology? Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad and James Glass @ 8/11 ACL2017 Reading M1 Hayahide Yamagishi
  • 2. Introduction ● “Little is known about what and how much NMT models learn about each language and its features.” ● They try to answer the following questions 1. Which parts of the NMT architecture capture word structure? 2. What is the division of labor between defferent components? 3. How do different word representation help learn better morphology and modeling of infrequent words? 4. How does the target language affect the learning of word structure? ● Task: Part-of-Speech tagging and morphological tagging 2
  • 3. Task ● Part-of-Speech (POS) tagging ○ computer → NN ○ computers → NNS ● Morphological tagging ○ he → 3, single, male, subject ○ him → 3, single, male, object ● Task: hidden states → tag ○ They would like to test each hidden state. ○ If the accuracy is high, hidden states learn about the word representation. 3
  • 4. Methodology 1. Training the NMT models (Bahdanau attention, LSTM) 2. Using the trained models as a feature extractor. 3. Training the feedforward NN using the state-tag pairs ○ 1layer: input layer, hidden layer, output layer 4. Test ● “Our goal is not to beat the state-of-the-art on a given task.” ● “We also experimented with a linear classifier and observed similar trends to the non-linear case.” 4
  • 5. Data ● Language Pair: ○ {Arabic, German, French, Czech} - English ○ Arabic - Hebrew (Both languages are morphologically-rich and similar.) ○ Arabic - German (Both languages are morphologically-rich but different.) ● Parallel corpus: TED ● POS annotated data ○ Gold: included in some datasets ○ Predict: from the free taggers 5
  • 6. Char-base Encoder ● Character-aware Neural Language Model [Kim+, AAAI2016] ● Character-based Neural Machine Translation [Costa-jussa and Fonollosa, ACL2016] ● Character embedding → word embedding ● Obtained word embeddings are inputted into the word-based RNN-LM. 6
  • 7. Effect of word representation (Encoder) ● Word-based vs. Char-based model ● Char-based models are stronger. 7
  • 8. Impact of word frequency ● Frequent words don’t need the character information. ● “The char-based model is able to learn character n-gram patterns that are important for identifying word structure.” 8
  • 10. Analyzing specific tags ● Arabic → Determiner “Al-” becomes a prefix. ● Char-based model can distinguish “DT+NNS” from “NNS”. 10
  • 11. Effect of encoder depth ● LSTM carries the context information → Layer 0 is worse. ● States from layer 1 is more effective than states from layer 2. 11
  • 12. Effect of encoder depth ● Char-based models have the similar tendencies. 12
  • 13. Effect of encoder depth ● BLEU: 2-layer NMT > 1-layer NMT ○ word / char : +1.11 / +0.56 ● Layer 1 learns the word representation ● Layer 2 learns the word meaning ● Word representation < word representation + word meaning 13
  • 14. Effect of target language ● Translating into morphologically-rich language is harder. ○ Arabic-English: 24.69 ○ English-Arabic: 13.37 ● “How does the target language affect the learned source language representations?” ○ “Does translating into a morphologically-rich language require more knowledge about source language morphology?” ● Experiment: Arabic - {Arabic, Hebrew, German, English} ○ Arabic-Arabic: Autoencoder 14
  • 16. Effect of target languages ● They expected translating into morph-rich languages would make the model learn more about morphology. → No ● The accuracy doesn’t correlate with the BLEU score ○ Autoencoder couldn’t learn the morphological representation. ○ If the model only works as a recreator, it doesn’t have to learn it. ○ “A better translation model learns more informative representation.” ● Possible explanation ○ Arabic-English is simply better than -Hebrew and -German. ○ These models may not be able to afford to understand the representations of word structure. 16
  • 17. Decoder Analysis ● Similar experiments ○ Decoder’s input is the correct previous word. ○ Char-based decoder’s input is the char-based representation. ○ Char-based decoder’s output is the word-level. ● Arabic-English or English-Arabic 17
  • 18. Effect of decoder states ● Decoder states doesn’t have a morphological information. ● BLEU doesn’t correlate the accuracy ○ French-English: 37.8 BLEU / 54.26% accuracy 18
  • 19. Effect of attention ● Encoder states ○ Task: creating a generic, close to language-independent representation of source sentence . ○ When the attention is attached, these are treated as a memo. ○ When the model translates the noun, the attention sees the noun words. ● Decoder states ○ Task: using encoder’s representation to generate the target sentence in a specific language. ○ “Without the attention mechanism, the decoder is forced to learn more informative representations of the target language.” 19
  • 20. Effect of word representation (Decoder) ● Char-based representations don’t hep the decoder ○ The decoder’s predictions are still done at word level. ○ “In Arabic-English the char-based model reduces the number of generated unknown words in the MT test set by 25%.” ○ “In English-Arabic the number of unknown words remains roughly the same between word-based and char-based models.” 20
  • 21. Conclusion ● Their results lead to the following conclusions ○ Char-based representations are better than word-based ones ○ Lower layers captures morphology, while deeper layers improve translation performance. ○ Translating into morphologically-poorer languages leads to better source representations. ○ The attentional decoder learns impoverished representations that do not carry much information about morphology. ● “Jointly learning translation and morphology can possibly lead to better representations and improved translation.” 21