“Dual Learning for Machine Translation”
Di He et al.
January 2016
Toru Fujino
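University of Tokyo, Graduate School of Frontier Sciences, Department of Human and Engineered Environmental Studies, Chen Laboratory, D1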
Paper information
• Authors: Di He et al. (Microsoft Research Asia)
• Conference: NIPS 2016
• Date: 11/01/2016 (arXiv)
• Times cited: 1
Overview
• What
• Introduce an autoencoder-like mechanism, “Dual learning”,
to utilize monolingual datasets
• Results
• Dual Learning with 10% data ≈ Baseline model with 100% data
1) “Dual Learning: A New Learning Paradigm”, https://www.youtube.com/watch?v=HzokNo3g63E&feature=youtu.be
Neural machine translation
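• Learn the conditional probability $P(y \mid x; \Theta)$ from an input
$x = \{x_1, x_2, \ldots, x_{T_x}\}$ to an output $y = \{y_1, y_2, \ldots, y_{T_y}\}$
• Maximize the log probability
$\Theta^* = \operatorname{argmax}_{\Theta} \sum_{(x,y) \in D} \sum_{t=1}^{T_y} \log P(y_t \mid y_{<t}, x; \Theta)$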
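The objective above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code: the function names and the toy per-token probabilities are hypothetical, and a real NMT model would produce these probabilities from the source sentence and the previous target tokens.

```python
import math

def sequence_log_prob(step_probs):
    """Sum of log P(y_t | y_<t, x; Theta) over target positions t = 1..T_y.

    step_probs[t] is the probability the model assigned to the reference
    token y_t, given the source x and the preceding reference tokens.
    """
    return sum(math.log(p) for p in step_probs)

def corpus_objective(corpus_step_probs):
    """Training objective: sum of sequence log-probabilities over the corpus D."""
    return sum(sequence_log_prob(sp) for sp in corpus_step_probs)

# Hypothetical per-token probabilities for two sentence pairs in D.
toy_corpus = [[0.5, 0.25], [0.5]]
objective = corpus_objective(toy_corpus)
```

Training searches for the parameters that make this sum as large as possible, which is the same as minimizing the token-level cross-entropy.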
Difficulty in getting large bilingual data
• Solution: utilization of monolingual data
• Train a language model of the target language, and then
integrate it with the MT model 1) 2)
<- This does not fundamentally address the shortage of
parallel data.
• Generate pseudo-bilingual data from monolingual data 3) 4)
<- There is no guarantee on the quality of the pseudo-bilingual data.
1) T. Brants et al., “Large language models in machine translation”, EMNLP 2007
2) C. Gulcehre et al., “On using monolingual corpora in neural machine translation”, arXiv 2015
3) R. Sennrich et al., “Improving neural machine translation models with monolingual data”, ACL 2016
4) N. Ueffing et al., “Semi-supervised model adaptation for statistical machine translation”, Machine Translation Journal 2008
Dual learning algorithm
• Use monolingual datasets to train translation
models through dual learning
• Things required
$D_A$: corpus of language A
$D_B$: corpus of language B (not necessarily aligned with $D_A$)
$P(\cdot \mid s; \Theta_{AB})$: translation model from A to B
$P(\cdot \mid s; \Theta_{BA})$: translation model from B to A
$LM_A(\cdot)$: learned language model of A
$LM_B(\cdot)$: learned language model of B
Dual learning algorithm
1. Generate $K$ translated sentences
$s_{mid,1}, s_{mid,2}, \ldots, s_{mid,K}$
from $P(\cdot \mid s; \Theta_{AB})$ based on beam search
2. Compute intermediate rewards
$r_{1,1}, r_{1,2}, \ldots, r_{1,K}$
from $LM_B(s_{mid,k})$ for each sentence as
$r_{1,k} = LM_B(s_{mid,k})$
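Step 1 relies on beam search to pick the K candidates. The sketch below shows a plain beam search in Python over a hypothetical next-token function; `toy_model`, its vocabulary, and its probabilities are invented for illustration and stand in for the translation model from A to B. Hypotheses that have not emitted the end-of-sentence token by `max_len` are simply dropped, which is a simplification.

```python
import math

def beam_search(next_log_probs, beam_width, max_len, eos="</s>"):
    """Return up to beam_width finished (tokens, log_prob) hypotheses, best first."""
    beams = [((), 0.0)]          # unfinished hypotheses: (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Extend every unfinished hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, log_p in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]

def toy_model(prefix):
    # Hypothetical next-token distribution; a real model would condition on s.
    if len(prefix) == 0:
        return {"le": math.log(0.6), "un": math.log(0.4)}
    if len(prefix) == 1:
        return {"chat": math.log(0.7), "chien": math.log(0.3)}
    return {"</s>": 0.0}

hypotheses = beam_search(toy_model, beam_width=2, max_len=3)
```

With `beam_width=2`, the two returned hypotheses play the role of the K intermediate translations that are scored in step 2.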
Dual learning algorithm
3. Get communication rewards
$r_{2,1}, r_{2,2}, \ldots, r_{2,K}$
for each sentence as $r_{2,k} = \ln P(s \mid s_{mid,k}; \Theta_{BA})$
4. Set the total reward of the $k$-th sentence as
$r_k = \alpha r_{1,k} + (1 - \alpha) r_{2,k}$
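As a small worked example of steps 2 to 4, the following Python sketch combines the two rewards for each candidate. The function name and the numeric scores are hypothetical, and the language model score is assumed here to be a log-probability, which the slides do not specify.

```python
def total_rewards(lm_scores, reconstruction_log_probs, alpha):
    """Compute r_k = alpha * r_{1,k} + (1 - alpha) * r_{2,k} for every candidate k."""
    if len(lm_scores) != len(reconstruction_log_probs):
        raise ValueError("need one language model score and one reconstruction score per candidate")
    return [alpha * r1 + (1.0 - alpha) * r2
            for r1, r2 in zip(lm_scores, reconstruction_log_probs)]

# Hypothetical scores for K = 2 intermediate translations.
r1 = [-2.0, -4.0]   # r_{1,k}: language model score of s_mid,k in language B
r2 = [-1.0, -3.0]   # r_{2,k}: ln P(s | s_mid,k; Theta_BA)
rewards = total_rewards(r1, r2, alpha=0.5)   # [-1.5, -3.5]
```

A larger alpha puts more weight on fluency in language B, while a smaller alpha puts more weight on how well the original sentence can be reconstructed.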
Dual learning algorithm
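5. Compute the stochastic gradients of $\Theta_{AB}$ and $\Theta_{BA}$
$\nabla_{\Theta_{AB}} E[r] = \frac{1}{K} \sum_{k=1}^{K} \left[ r_k \nabla_{\Theta_{AB}} \ln P(s_{mid,k} \mid s; \Theta_{AB}) \right]$
$\nabla_{\Theta_{BA}} E[r] = \frac{1}{K} \sum_{k=1}^{K} \left[ (1 - \alpha) \nabla_{\Theta_{BA}} \ln P(s \mid s_{mid,k}; \Theta_{BA}) \right]$
6. Update model parameters
$\Theta_{AB} \leftarrow \Theta_{AB} + \gamma_1 \nabla_{\Theta_{AB}} E[r]$
$\Theta_{BA} \leftarrow \Theta_{BA} + \gamma_2 \nabla_{\Theta_{BA}} E[r]$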
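Steps 1 to 6 can be traced end to end on a deliberately tiny toy problem. In the Python sketch below, every "sentence" is a single symbol and each translation model is just a table of softmax logits, so beam search reduces to taking the top-K entries. All names (`dual_learning_step`, `lm_b`, the logit tables) are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_softmax(logits, index):
    """Gradient of ln softmax(logits)[index] w.r.t. the logits: one_hot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == index else 0.0) - p for i, p in enumerate(probs)]

def dual_learning_step(theta_ab, theta_ba, s, lm_b, alpha, gamma1, gamma2, k):
    """One update for source sentence id s.

    theta_ab[s] holds logits over candidate B sentences,
    theta_ba[mid] holds logits over A sentences, lm_b[mid] is r_1 for mid.
    """
    probs_ab = softmax(theta_ab[s])
    # Step 1: top-k candidates (beam search for one-symbol sentences).
    order = sorted(range(len(probs_ab)), key=lambda j: probs_ab[j], reverse=True)
    candidates = order[:k]
    n = len(candidates)
    grad_ab = [0.0] * len(theta_ab[s])
    grad_ba = {}
    for mid in candidates:
        r1 = lm_b[mid]                              # step 2: language model reward
        r2 = math.log(softmax(theta_ba[mid])[s])    # step 3: reconstruction reward
        r = alpha * r1 + (1.0 - alpha) * r2         # step 4: total reward
        # Step 5: gradient estimates, averaged over the candidates.
        g_ab = grad_log_softmax(theta_ab[s], mid)
        grad_ab = [a + r * g / n for a, g in zip(grad_ab, g_ab)]
        g_ba = grad_log_softmax(theta_ba[mid], s)
        grad_ba[mid] = [(1.0 - alpha) * g / n for g in g_ba]
    # Step 6: gradient ascent on copies of the parameters.
    new_ab = [row[:] for row in theta_ab]
    new_ab[s] = [t + gamma1 * g for t, g in zip(theta_ab[s], grad_ab)]
    new_ba = [row[:] for row in theta_ba]
    for mid, g in grad_ba.items():
        new_ba[mid] = [t + gamma2 * gi for t, gi in zip(theta_ba[mid], g)]
    return new_ab, new_ba

# Two sentences per language, uniform initial models, hypothetical LM scores.
theta_ab = [[0.0, 0.0], [0.0, 0.0]]
theta_ba = [[0.0, 0.0], [0.0, 0.0]]
new_ab, new_ba = dual_learning_step(theta_ab, theta_ba, s=0, lm_b=[-1.0, -2.0],
                                    alpha=0.5, gamma1=0.1, gamma2=0.1, k=2)
```

In this toy run the A-to-B model shifts probability toward the candidate with the higher language model score, and the B-to-A model becomes more likely to reconstruct the original sentence from either candidate.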
Experiment settings
• Baseline models
• Bahdanau et al., “Neural Machine Translation by Jointly
Learning to Align and Translate”
• Sennrich et al., “Improving Neural Machine Translation
Models with Monolingual Data”
Dataset
• WMTʼ14
• 12M sentence pairs
• English -> French, French -> English
• Data usage (for dual learning)
• Small
1. Train translation models with 10% bilingual data.
2. Train translation models with 10% bilingual data and
monolingual data through dual learning algorithm.
3. Train translation models only with monolingual data through dual
learning algorithm.
• Large
1. Train translation models with 100% bilingual data.
2. Train translation models with 100% bilingual data and
monolingual data through dual learning algorithm.
3. Train translation models only with monolingual data through dual
learning algorithm.
Evaluation
• BLEU: geometric mean of modified n-gram precisions, multiplied by a brevity penalty
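To make the metric concrete, here is a minimal Python sketch of unsmoothed sentence-level BLEU against a single reference. It is an illustration only: reported BLEU scores are normally computed at corpus level with a standard tool, and the function names and example sentences here are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    geo_mean = math.exp(sum(log_precisions) / max_n)
    c, r = len(candidate), len(reference)
    # Brevity penalty: punish candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * geo_mean

sentence = "the cat sat on the mat".split()
perfect = bleu(sentence, sentence)
short = bleu("the cat sat".split(), "the cat sat down".split(), max_n=2)
```

A candidate identical to the reference scores 1.0, while the shorter candidate keeps perfect n-gram precision but is scaled down by the brevity penalty.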
Results
• Dual learning outperforms the baseline models
• In Fr->En, dual learning with 10% data ≈ baseline
models with 100% data.
• Dual learning is especially effective on a small dataset.
Results
• Results by source sentence length
• The improvement is significant for long sentences.
Results
• Reconstruction performance (BLEU)
• Huge improvement over the baseline models, especially in
En->Fr->En (S)
Results
• Reconstruction examples
Future extensions & work
• Application in other domains
• Generalization of dual learning
• Dual -> Triple -> … -> n-loop
• Learn from scratch
• only with monolingual data
• maybe plus lexical dictionary
Application          Primal task         Dual task
Speech processing    Speech recognition  Text to speech
Image understanding  Image captioning    Image generation
Conversation engine  Question            Response
Search engine        Search              Query/Keyword suggestion
Summary
• What
• Introduce “Dual learning algorithm” to utilize
monolingual data
• Results
• With 100% data, the model outperforms the baseline
models
• With 10% data, the model achieves results comparable
to the baseline models trained with 100% data
• Future
• Dual learning mechanism can be applied to other
domains
• Learn from scratch
Some notes
• Dual Learning does not learn word-to-word
correspondences?
• Is training from bilingual data a must?
• Or a lexical dictionary?
Appendix: Stochastic gradient of models

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Dual Learning for Machine Translation (NIPS 2016)

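  • 1. “Dual Learning for Machine Translation” Di He et al. January 2016 Toru Fujino, University of Tokyo, Graduate School of Frontier Sciences, Department of Human and Engineered Environmental Studies, Chen Laboratory, D1 (first-year doctoral student)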
  • 2. Paper information • Authors: Di He et al. (Microsoft Research Asia) • Conference: NIPS 2016 • Date: 11/01/2016 (arxiv) • Times cited: 1
  • 3. Overview • What • Introduce an autoencoder-like mechanism, “Dual learning”, to utilize monolingual datasets • Results • Dual Learning with 10% data ≈ Baseline model with 100% data 1) “Dual Learning: A New Learning Paradigm”, https://www.youtube.com/watch?v=HzokNo3g63E&feature=youtu.be
  • 4. Neural machine translation • Learn the conditional probability $P(y \mid x; \Theta)$ from an input $x = \{x_1, x_2, \dots, x_{T_x}\}$ to an output $y = \{y_1, y_2, \dots, y_{T_y}\}$ • Maximize the log probability $\Theta^{*} = \arg\max_{\Theta} \sum_{(x,y) \in D} \sum_{t=1}^{T_y} \log P(y_t \mid y_{<t}, x; \Theta)$
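The objective on slide 4 can be illustrated with a minimal sketch. It assumes the model has already produced, for every target position, the probability it assigns to the correct token; the function names and the toy numbers are hypothetical and not taken from the paper.

```python
import math


def sequence_log_prob(step_probs):
    """Sum of log P(y_t | y_<t, x; Theta) over the target positions of one pair.

    step_probs[t] is the probability the model assigns to the correct
    target token at position t (an assumed input, not computed here).
    """
    return sum(math.log(p) for p in step_probs)


def corpus_log_likelihood(corpus_step_probs):
    """Outer sum over all sentence pairs (x, y) in the bilingual corpus D."""
    return sum(sequence_log_prob(step_probs) for step_probs in corpus_step_probs)


# Toy corpus of two sentence pairs with made-up per-token probabilities.
toy_corpus = [[0.9, 0.8, 0.7], [0.6, 0.5]]
print(corpus_log_likelihood(toy_corpus))
```

Training searches for the parameters that make this sum as large as possible; the sketch only evaluates the sum for fixed probabilities.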
  • 5. Difficulty in getting large bilingual data • Solution: utilization of monolingual data • Train a language model of the target language, and then integrate it with the MT model 1)2) <- does not fundamentally address the shortage of parallel data. • Generate pseudo bilingual data from monolingual data 3)4) <- no guarantee on the quality of the pseudo bilingual data 1) T. Brants et al., “Large language models in machine translation”, EMNLP 2007 2) C. Gulcehre et al., “On using monolingual corpora in neural machine translation”, arXiv 2015 3) R. Sennrich et al., “Improving neural machine translation models with monolingual data”, ACL 2016 4) N. Ueffing et al., “Semi-supervised model adaptation for statistical machine translation”, Machine Translation Journal 2008
  • 6. Dual learning algorithm • Use monolingual datasets to train translation models through dual learning • Things required • $D_A$: corpus of language A • $D_B$: corpus of language B (not necessarily aligned with $D_A$) • $P(\cdot \mid s; \Theta_{AB})$: translation model from A to B • $P(\cdot \mid s; \Theta_{BA})$: translation model from B to A • $LM_A(\cdot)$: learned language model of A • $LM_B(\cdot)$: learned language model of B
  • 7. Dual learning algorithm 1. Generate $K$ translated sentences $s_{mid,1}, s_{mid,2}, \dots, s_{mid,K}$ from $P(\cdot \mid s; \Theta_{AB})$ based on beam search
  • 8. Dual learning algorithm 1. Generate $K$ translated sentences $s_{mid,1}, s_{mid,2}, \dots, s_{mid,K}$ from $P(\cdot \mid s; \Theta_{AB})$ based on beam search 2. Compute intermediate rewards $r_{1,1}, r_{1,2}, \dots, r_{1,K}$ for each sentence as $r_{1,k} = LM_B(s_{mid,k})$
  • 9. Dual learning algorithm 3. Get communication rewards $r_{2,1}, r_{2,2}, \dots, r_{2,K}$ for each sentence as $r_{2,k} = \ln P(s \mid s_{mid,k}; \Theta_{BA})$
  • 10. Dual learning algorithm 3. Get communication rewards $r_{2,1}, r_{2,2}, \dots, r_{2,K}$ for each sentence as $r_{2,k} = \ln P(s \mid s_{mid,k}; \Theta_{BA})$ 4. Set the total reward of the k-th sentence as $r_k = \alpha r_{1,k} + (1 - \alpha) r_{2,k}$
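  • 11. Dual learning algorithm 5. Compute the stochastic gradients with respect to $\Theta_{AB}$ and $\Theta_{BA}$: $\nabla_{\Theta_{AB}} E[r] = \frac{1}{K} \sum_{k=1}^{K} [r_k \nabla_{\Theta_{AB}} \ln P(s_{mid,k} \mid s; \Theta_{AB})]$, $\nabla_{\Theta_{BA}} E[r] = \frac{1}{K} \sum_{k=1}^{K} [(1 - \alpha) \nabla_{\Theta_{BA}} \ln P(s \mid s_{mid,k}; \Theta_{BA})]$
  • 12. Dual learning algorithm 5. Compute the stochastic gradients with respect to $\Theta_{AB}$ and $\Theta_{BA}$: $\nabla_{\Theta_{AB}} E[r] = \frac{1}{K} \sum_{k=1}^{K} [r_k \nabla_{\Theta_{AB}} \ln P(s_{mid,k} \mid s; \Theta_{AB})]$, $\nabla_{\Theta_{BA}} E[r] = \frac{1}{K} \sum_{k=1}^{K} [(1 - \alpha) \nabla_{\Theta_{BA}} \ln P(s \mid s_{mid,k}; \Theta_{BA})]$ 6. Update model parameters $\Theta_{AB} \leftarrow \Theta_{AB} + \gamma_1 \nabla_{\Theta_{AB}} E[r]$, $\Theta_{BA} \leftarrow \Theta_{BA} + \gamma_2 \nabla_{\Theta_{BA}} E[r]$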
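Steps 1 to 6 can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the authors' implementation: each "sentence" is a single index, both translation models are tables of softmax logits (`theta_ab`, `theta_ba`), `lm_b` holds made-up log-probabilities, and beam search reduces to taking the K most probable outputs. All names are hypothetical. The gradient for the B-to-A model is taken of ln P(s | s_mid,k), the same quantity that defines the communication reward.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def dual_learning_step(s, theta_ab, theta_ba, lm_b, K=2, alpha=0.5,
                       gamma_1=0.1, gamma_2=0.1):
    """One dual-learning update for sentence index s of language A.

    Updates theta_ab and theta_ba in place and returns the mean total reward.
    """
    p_ab = softmax(theta_ab[s])
    # Step 1: with one-token "sentences", beam search is just top-K.
    candidates = sorted(range(len(p_ab)), key=lambda j: -p_ab[j])[:K]

    grad_ab = [0.0] * len(p_ab)
    grad_ba = {}
    total_reward = 0.0
    for mid in candidates:
        p_ba = softmax(theta_ba[mid])
        r1 = lm_b[mid]                      # Step 2: language-model reward
        r2 = math.log(p_ba[s])              # Step 3: communication reward
        r = alpha * r1 + (1 - alpha) * r2   # Step 4: total reward
        total_reward += r
        # Step 5: d/dlogit_j of ln softmax(logits)[i] is (1 if j == i else 0) - p_j.
        for j in range(len(p_ab)):
            indicator = 1.0 if j == mid else 0.0
            grad_ab[j] += r * (indicator - p_ab[j]) / K
        grad_ba[mid] = [
            (1 - alpha) * ((1.0 if i == s else 0.0) - p_ba[i]) / K
            for i in range(len(p_ba))
        ]

    # Step 6: gradient ascent on both models.
    for j, g in enumerate(grad_ab):
        theta_ab[s][j] += gamma_1 * g
    for mid, row in grad_ba.items():
        for i, g in enumerate(row):
            theta_ba[mid][i] += gamma_2 * g
    return total_reward / K


# Toy setup: three "sentences" per language, uniform initial models.
theta_ab = [[0.0] * 3 for _ in range(3)]
theta_ba = [[0.0] * 3 for _ in range(3)]
lm_b = [math.log(0.5), math.log(0.3), math.log(0.2)]
print(dual_learning_step(0, theta_ab, theta_ba, lm_b))
```

A real system would replace the logit tables with sequence-to-sequence models and would run the symmetric game starting from language B as well; the sketch covers only the A-to-B-to-A direction described on the slides.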
  • 14. Experiment settings • Baseline models • Bahdanau et al., “Neural Machine Translation by Jointly Learning to Align and Translate” • Sennrich et al., “Improving Neural Machine Translation Models with Monolingual Data”
  • 15. Dataset • WMTʼ14 • 12M sentence pairs • English -> French, French -> English • Data usage (for dual learning) • Small 1. Train translation models with 10% bilingual data. 2. Train translation models with 10% bilingual data and monolingual data through dual learning algorithm. 3. Train translation models only with monolingual data through dual learning algorithm. • Large 1. Train translation models with 100% bilingual data. 2. Train translation models with 100% bilingual data and monolingual data through dual learning algorithm. 3. Train translation models only with monolingual data through dual learning algorithm.
  • 16. Evaluation • BLEU: geometric mean of modified n-gram precisions, multiplied by a brevity penalty for translations shorter than the reference
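As a rough illustration of the metric, here is a minimal sentence-level BLEU sketch with a single reference and no smoothing. These are simplifying assumptions: reported BLEU scores are normally computed at corpus level with a standard tool, and the function names below are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference (token lists)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


candidate = "the cat sat on the mat".split()
reference = "the cat sat on the red mat".split()
print(sentence_bleu(candidate, reference))
```

The score is 1.0 only when the candidate matches the reference exactly, and it drops to 0 as soon as one n-gram order has no overlap, which is why practical implementations add smoothing for short sentences.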
  • 17. Results • Dual learning outperforms the baseline models • In Fr->En, dual learning with 10% data ≈ baseline models with 100% data. • Dual learning is especially effective on a small dataset.
  • 18. Results • For different source sentence length • Improvement is significant for long sentences.
  • 19. Results • Reconstruction performance (BLEU) • Huge improvement over the baseline models, especially in En->Fr->En (S)
  • 21. Future extensions & words • Application in other domains • Generalization of dual learning • Dual -> Triple -> … -> n-loop • Learn from scratch • only with monolingual data • maybe plus lexical dictionary
    Application | Primal task | Dual task
    Speech processing | Speech recognition | Text to speech
    Image understanding | Image captioning | Image generation
    Conversation engine | Question | Response
    Search engine | Search | Query/Keyword suggestion
  • 22. Summary • What • Introduce “Dual learning algorithm” to utilize monolingual data • Results • With 100% data, the model outperforms the baseline models • With 10% data, the model achieves results comparable to the baseline models trained with 100% data (Fr->En) • Future • Dual learning mechanism can be applied to other domains • Learn from scratch
  • 23. Some notes • Does dual learning not learn word-to-word correspondences? • Is training from bilingual data a must? • Or a lexical dictionary?