SlideShare une entreprise Scribd logo
1  sur  62
Télécharger pour lire hors ligne
Confidential & ProprietaryConfidential & Proprietary
Moving to Neural Machine
Translation at Google
Xiaobing Liu, Google Brain Team
11/28/2017
Growing Use of Deep Learning at Google
Android
Apps
GMail
Image Understanding
Maps
NLP
Photos
Speech
Translation
many research uses..
YouTube
… many others ...
Across many products/areas:
# of directories containing model description files
Confidential & Proprietary
Why we care about translations
● 50% of Internet content
is in English.
● Only 20% of the world’s
population speaks
English.
To make the world’s information accessible, we need
translations
Confidential & Proprietary
Google Translate, a truly global product...
1B+
103
Monthly active users
Translations every single day, that is 140 Billion Words
Google Translate Languages cover 99% of online
population
1B+
Confidential & Proprietary
Agenda
● Quick History
● From Sequence to Sequence-to-Sequence Models
● BNMT (Brain Neural Machine Translation)
○ Architecture & Training
○ TPU and Quantization
● Multilingual Models
● What’s next?
Confidential & Proprietary
Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
Confidential & Proprietary
Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
● Sequence-To-Sequence models (NIPS 2014)
○ Based on many earlier approaches to estimate P(Y|X) directly
○ State-of-the-art on WMT En->Fr using custom software, very long training
○ Translation could be learned without explicit alignment!
○ Drawback: all information needs to be carried in internal state
■ Translation breaks down for long sentences!
Confidential & Proprietary
Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
● Sequence-To-Sequence models (NIPS 2014)
○ Based on many earlier approaches to estimate P(Y|X) directly
○ State-of-the-art on WMT En->Fr using custom software, very long training
○ Translation could be learned without explicit alignment!
○ Drawback: all information needs to be carried in internal state
■ Translation breaks down for long sentences!
● Attention Models (2014)
○ Removes drawback by giving access to all encoder states
■ Translation quality is now independent of sentence length!
Confidential & Proprietary
Preprocess
Neural Network
Old: Phrase-based translation New: Neural machine translation
● Lots of individual pieces
● Optimized somewhat independently
● End-to-end learning
● Simpler architecture
● Plus results are much better!
Confidential & Proprietary
Expected time to launch:
Actual time to launch:
3 years
13.5 months
Sept 2015:
Began project
using
TensorFlow
Feb 2016:
First
production
data results
Sept 2016:
zh->en
launched
Nov 2016:
8 languages
launched
(16 pairs to/from
English)
Mar 2017:
7 more
launched
(Hindi, Russian,
Vietnamese, Thai,
Polish, Arabic,
Hebrew)
Apr 2017:
26 more
launched
(16 European, 8 Indish,
Indonesian, Afrikaans)
Jun/Aug 2017:
36/20 more
launched
97 launched!
Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the
highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,”
the House of God. Close to the western summit there is the dried and frozen
carcass of a leopard. No one has explained what the leopard was seeking at that
altitude.
Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the
highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,”
the House of God. Close to the western summit there is the dried and frozen
carcass of a leopard. No one has explained what the leopard was seeking at that
altitude.
Back translation from Japanese (old)
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that
the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai
language, has been referred to as the house of God. The top close to the west,
there is a dry, frozen carcass of a leopard. Whether the leopard had what the
demand at that altitude, there is no that nobody explained.
Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the
highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,”
the House of God. Close to the western summit there is the dried and frozen
carcass of a leopard. No one has explained what the leopard was seeking at that
altitude.
Back translation from Japanese (old)
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that
the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai
language, has been referred to as the house of God. The top close to the west,
there is a dry, frozen carcass of a leopard. Whether the leopard had what the
demand at that altitude, there is no that nobody explained.
Back translation from Japanese (new)
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be
the highest mountain in Africa. The summit of the west is called “Ngaje Ngai”
God‘s house in Masai language. There is a dried and frozen carcass of a leopard
near the summit of the west. No one can explain what the leopard was seeking
at that altitude.
Confidential & Proprietary
0
△TranslationQuality
+
Significant
change &
launchable
Chinese to
English
Almost all
language pairs
>0.1
>0.5
+0.6
Translation Quality
Zh/Ja/Ko/Tr to
English
0.6-1.5
0 = Worst translation
6 = Perfect translation
TranslationQuality
● Asian languages improved the most
● Some improvements as big as last 10 years of
improvements combined
neural (GNMT)
phrase-based (PBMT)
English
>
Spanish
English
>
French
English
>
Chinese
Spanish
>
English
French
>
English
Chinese
>
English
Translation model
Translationquality
0
1
2
3
4
5
6
human
perfect translation
Neural Machine Translation
Closes gap between old system
and human-quality translation
by 58% to 87%
Enables better communication
across the world
research.googleblog.com/2016/09/a-neural-network-for-machine.html
Confidential & Proprietary
+75%Increase in daily English - Korean
translations on Android over
the past six months
Does quality matter?
Confidential & Proprietary
Neural Recurrent Sequence Models
● Predict next token: P(Y) = P(Y1) * P(Y2|Y1) * P(Y3|Y1,Y2) * ...
○ Language Models, state-of-the-art on public benchmark
■ Exploring the limits of language modeling
EOS Y1 Y2 Y3
Y1 Y2 Y3 EOS
Confidential & Proprietary
Sequence to Sequence
● Learn to map: X1, X2, EOS -> Y1, Y2, Y3, EOS
● Encoder/Decoder framework (decoder by itself just neural LM)
● Theoretically any sequence length for input/output works
X1 X2 EOS Y1 Y2 Y3
Y1 Y2 Y3 EOS
Confidential & Proprietary
Encoder LSTMs
X3 X2 </s>
Decoder LSTMs
<s> Y1 Y3
SoftMax
Y1 Y2 </s>
Deep Sequence to Sequence
Confidential & Proprietary
Attention Mechanism
● Addresses the information bottleneck problem
○ All encoder states accessible instead of only final one
Sj
Ti
eij
Confidential & Proprietary
BNMT Model Architecture
Confidential & Proprietary
Language Translation: Model Details
8 Encoder LSTMs
1 Backwards LSTM
● Runs parallel with
1st Encoder LSTM8x
…
Source
Embeddings
Encoder
LSTMs
Backward
LSTM
Confidential & Proprietary
Language Translation: Model Details
8 Encoder LSTMs
1 Backwards LSTM
● Runs parallel with
1st Encoder LSTM
Unrolled ‘x’ steps
● x: # words in
source language
● Average value: 28
8x
…
Source
Embeddings
Encoder
LSTMs
Backward
LSTM
Unroll
‘x’ Steps
Confidential & Proprietary
Language Translation: Model Details
8 Decoder LSTMs
Attention model
● Runs parallel with 8
decoder LSTMs
● Output: broadcast to
decoder LSTMs
● Input: previous step
of 1st decoder LSTM
8x
…
8x
…
Source
Embeddings
Target
Embeddings
Encoder
LSTMs
Decoder
LSTMs
Attention
Model
Backward
LSTM
Unroll
‘x’ Steps
Confidential & Proprietary
Language Translation: Model Details
8 Decoder LSTMs
Attention model
● Runs parallel with 8
decoder LSTMs
● Output: broadcast to
decoder LSTMs
● Input: previous step
of 1st decoder LSTM
Unrolled ‘x’ steps
8x
…
8x
…
Source
Embeddings
Target
Embeddings
Encoder
LSTMs
Decoder
LSTMs
Attention
Model
Backward
LSTM
Unroll
‘x’ Steps
Unroll
‘x’ Steps
Confidential & Proprietary
Language Translation: Model Details
Softmax layer
● Computes the
target classes
8x
…
8x
…
Source
Embeddings
Target
Embeddings
Encoder
LSTMs
Decoder
LSTMs
Attention
Model
Softmax
Backward
LSTM
Unroll
‘x’ Steps
Unroll
‘x’ Steps
Confidential & Proprietary
Attention Mechanism
● Solves information bottleneck problem
Confidential & Proprietary
Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
Confidential & Proprietary
Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
● Optimization
○ Combination of Adam & SGD with delayed exponential decay
○ 128/256 sentence pairs combined into one batch (run in one ‘step’)
Confidential & Proprietary
Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
● Optimization
○ Combination of Adam & SGD with delayed exponential decay
○ 128/256 sentence pairs combined into one batch (run in one ‘step’)
● Training time
○ ~1 week for 2.5M steps = ~300M sentence pairs
○ For example, on English->French we use only 15% of available data!
Model + Data Parallelism
...
Params
Many
replicas
Parameters
distributed across
many parameter
server machines
Params Params
...
Confidential & Proprietary
Speed matters. A lot.
10 0.2seconds/
sentence
seconds/
sentence
2 months
● Users care about speed
● Better algorithms and hardware (TPUs) made it possible
Confidential & Proprietary
Latency: BNMT versus PBMT (old system)
Confidential & Proprietary
Speed-up
This does not scale!
TRAINING
TIME
2-3
weeks per
model
TRAINING
DATA
100M+
per model
MODELS
~1032
Multilingual models
English->French English->Spanish French->English Spanish->EnglishMultilingual Model
Confidential & Proprietary
Multilingual Model
● Model several language pairs in single model
○ We ran first experiments in 2/2016, surprisingly this worked
Confidential & Proprietary
Multilingual Model
● Model several language pairs in single model
○ We ran first experiments in 2/2016, surprisingly this worked!
● Prepend source with additional token to indicate target language
○ Translate to Spanish:
■ <2es> How are you </s> -> Cómo estás </s>
○ Translate to English:
■ <2en> Como estás </s> -> How are you </s>
Confidential & Proprietary
Multilingual Model
● Model several language pairs in single model
○ We ran first experiments in 2/2016, surprisingly this worked!
● Prepend source with additional token to indicate target language
○ Translate to Spanish:
■ <2es> How are you </s> -> Cómo estás </s>
○ Translate to English:
■ <2en> Cómo estás </s> -> How are you </s>
● No other changes to model architecture!
○ Extremely simple and effective
○ Usually with shared WPM for source/target
Can we do more?
Zero-Shot Translation
Confidential & Proprietary
Multilingual Model and Zero-Shot Translation
En Es
Es En
Single Multi
34.5 35.1
38.0 37.3
Translation:
<2es> How are you </s> Cómo estás </s>
<2en> Cómo estás </s> How are you </s>
1.
Confidential & Proprietary
Multilingual Model and Zero-Shot Translation
En Es
Es En
Single Multi
34.5 35.1
38.0 37.3
34.5 35.0
44.5 43.7
En Es
Pt En
En Es
Pt En
23.0 BLEU
Translation:
<2es> How are you </s> Cómo estás </s>
<2en> Cómo estás </s> How are you </s>
Zero-shot (pt->es):
<2es> Como você está </s> Cómo estás </s>
1.
2.
Confidential & Proprietary
Multilingual Model and Zero-Shot Translation
En Es
Es En
Single Multi
34.5 35.1
38.0 37.3
34.5 35.0
44.5 43.7
34.5 34.9
38.0 37.2
37.1 37.8
44.5 43.7
En Es
Pt En
En Es
Es En
En Pt
Pt En
En Es
Pt En
En Es
Es En
En Pt
Pt En
23.0 BLEU
24.0 BLEU
Translation:
<2es> How are you </s> Cómo estás </s>
<2en> Cómo estás </s> How are you </s>
Zero-shot (pt->es):
<2es> Como você está </s> Cómo estás </s>
1.
2.
3.
Interlingua?
Confidential & Proprietary
Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
Confidential & Proprietary
Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
Confidential & Proprietary
Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
Confidential & Proprietary
Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
● Transliteration of names
○ Eichelbergertown -> アイセルベルクタウン
Confidential & Proprietary
Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
● Transliteration of names
○ Eichelbergertown -> アイセルベルクタウン
● Junk
○ xxx -> 牛津词典 (Oxford dictionary)
○ The cat is a good computer. -> 的英语翻译 (of the English language?)
○ Many sentences containing news started with “Reuters”
Confidential & Proprietary
Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
Confidential & Proprietary
Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
● Better automatic measures & objective functions
○ Current BLEU score weighs all words the same regardless of meaning
■ ‘president’ mostly more important than ‘the’
○ Discriminative training
■ Training with Maximum Likelihood produces mismatched training/test procedure!
● No decoding errors for maximum-likelihood training
■ RL (and similar) already running but no significant enough gains yet
● Humans see no difference in the results
Confidential & Proprietary
Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
● Better automatic measures & objective functions
○ Current BLEU score weighs all words the same regardless of meaning
■ ‘president’ mostly more important than ‘the’
○ Discriminative training
■ Training with Maximum Likelihood produces mismatched training/test procedure!
● No decoding errors for maximum-likelihood training
■ RL (and similar) already running but no significant enough gains yet
● Humans see no difference in the results
● Lots of improvements are boring to do!
○ Because they are incremental (but still have to be done)
○ Data cleaning, new test sets etc.
Confidential & Proprietary
What’s next from research?
● Convolutional sequence-to-sequence models
○ No recurrency, just windows over input with shared parameters
○ Encoder can be computed in parallel => faster
Per-Example Routing
Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-Experts Layer,
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff Dean
Submitted to ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg
Per-Example Routing
Attention Is All You Need
Transformer: autoregressive pure attention model.
Previous-output attention: masked batch matmul
Also great on parsing, try it on your task!
Translation Model Training time BLEU (diff. from MOSES)
Transformer (large) 3 days on 8 GPU 28.4 (+7.8)
Transformer (small) 1 day on 1 GPU 24.9 (+4.3)
GNMT + Mixture of Experts 1 day on 64 GPUs 26.0 (+5.4)
ConvS2S (FB) 18 days on 1 GPU 25.1 (+4.5)
GNMT 1 day on 96 GPUs 24.6 (+4.0)
MOSES (phrase-based) N/A 20.6 (+0.0)
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Lukasz Kaiser, Illia Polosukhin, https://arxiv.org/abs/1706.03762
Confidential & Proprietary
BNMT for other projects
Other projects using same codebase for completely different problems (in search,
Google Assistant, …)
● Question/answering system (chat bots)
● Summarization
● Dialog modeling
● Generate question from query
● …
Confidential & Proprietary
Resources
● TensorFlow (www.tensorflow.org)
○ Code/Bugs on GitHub
○ Help on StackOverflow
○ Discussion on mailing list
● All information about BNMT is in these papers & blog posts
○ Google's Neural Machine Translation System: Bridging the Gap between Human and Machine
Translation
○ Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
● NYT article describes some of the development
○ The Great AI Awakening
● Internship & Residency
○ 3 months internships possible
○ 1-year residency program g.co/brainresidency
Confidential & Proprietary
Questions?
Thank you!
xbing@google.com
g.co/brain
Confidential & Proprietary
Decoding Sequence Models
● Find the N-best highest probability output sequences
○ Take K-best Y1 and feed them one-by-one, generating K hypotheses
○ Take K-best Y2 for each of the hyps, generating K^2 new hyps (tree) etc.
○ At each step, cut hyps to N-best (or by score) until at end
Example N=2, K=2
EOS Y1a,
Y1b
Y2a,
Y2b
Y3a,
Y2b
Y1a,
Y1b
Y2a,
Y2b
Y3a,
Y2b EOS
1. Y1a Y2a …
2. Y1a Y2b …
3. Y1b Y2a …
4. Y1b Y2b ...
Confidential & Proprietary
Sampling from Sequence Models
● Generate samples of sequences
a. Generate probability distribution P(Y1)
b. Sample from P(Y1) according to its probabilities
c. Feed in found sample as input, goto a)
EOS Y1 Y2 Y3
Y1 Y2 Y3 EOS

Contenu connexe

Tendances

Neural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translateNeural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translatesotanemoto
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
[Impl] neural machine translation
[Impl] neural machine translation[Impl] neural machine translation
[Impl] neural machine translationJaeHo Jang
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu
 
Fundamentals of data structures ellis horowitz & sartaj sahni
Fundamentals of data structures   ellis horowitz & sartaj sahniFundamentals of data structures   ellis horowitz & sartaj sahni
Fundamentals of data structures ellis horowitz & sartaj sahniHitesh Wagle
 
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...Hayahide Yamagishi
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMsDaniel Perez
 
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...Vimukthi Wickramasinghe
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMakerSuman Debnath
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Intro to Data Structure & Algorithms
Intro to Data Structure & AlgorithmsIntro to Data Structure & Algorithms
Intro to Data Structure & AlgorithmsAkhil Kaushik
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer ConnectAnuj Gupta
 

Tendances (19)

Neural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translateNeural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translate
 
BERT
BERTBERT
BERT
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
[Impl] neural machine translation
[Impl] neural machine translation[Impl] neural machine translation
[Impl] neural machine translation
 
On using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translationOn using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translation
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
Fundamentals of data structures ellis horowitz & sartaj sahni
Fundamentals of data structures   ellis horowitz & sartaj sahniFundamentals of data structures   ellis horowitz & sartaj sahni
Fundamentals of data structures ellis horowitz & sartaj sahni
 
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
 
Bert
BertBert
Bert
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMs
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machi...
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMaker
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Intro to Data Structure & Algorithms
Intro to Data Structure & AlgorithmsIntro to Data Structure & Algorithms
Intro to Data Structure & Algorithms
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 

Similaire à Moving to neural machine translation at google - gopro-meetup

Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Abhishek Thakur
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPHendrik D'Oosterlinck
 
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn..."Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...Edge AI and Vision Alliance
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingTyrone Systems
 
TDC 2020 - Implementing a Mini-Language
TDC 2020 - Implementing a Mini-LanguageTDC 2020 - Implementing a Mini-Language
TDC 2020 - Implementing a Mini-LanguageLuciano Sabença
 
building intelligent systems with large scale deep learning
building intelligent systems with large scale deep learningbuilding intelligent systems with large scale deep learning
building intelligent systems with large scale deep learningmustafa sarac
 
Deep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRANDeep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRANTAUS - The Language Data Network
 
GOOGLE Translate.pptx
GOOGLE Translate.pptxGOOGLE Translate.pptx
GOOGLE Translate.pptxchriss776676
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMTJulia Kreutzer
 
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebula Project
 
The REAL Angular Keynote
The REAL Angular KeynoteThe REAL Angular Keynote
The REAL Angular KeynoteLukas Ruebbelke
 
Beyond the Hype: 4 Years of Go in Production
Beyond the Hype: 4 Years of Go in ProductionBeyond the Hype: 4 Years of Go in Production
Beyond the Hype: 4 Years of Go in ProductionC4Media
 
The Flow of TensorFlow
The Flow of TensorFlowThe Flow of TensorFlow
The Flow of TensorFlowJeongkyu Shin
 
The Latest Advances in Patent Machine Translation
The Latest Advances in Patent Machine TranslationThe Latest Advances in Patent Machine Translation
The Latest Advances in Patent Machine TranslationIconic Translation Machines
 

Similaire à Moving to neural machine translation at google - gopro-meetup (20)

Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)
 
Python overview
Python overviewPython overview
Python overview
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
 
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn..."Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language Processing
 
TDC 2020 - Implementing a Mini-Language
TDC 2020 - Implementing a Mini-LanguageTDC 2020 - Implementing a Mini-Language
TDC 2020 - Implementing a Mini-Language
 
building intelligent systems with large scale deep learning
building intelligent systems with large scale deep learningbuilding intelligent systems with large scale deep learning
building intelligent systems with large scale deep learning
 
Deep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRANDeep Learning for Machine Translation, by Jean Senellart, SYSTRAN
Deep Learning for Machine Translation, by Jean Senellart, SYSTRAN
 
GOOGLE Translate.pptx
GOOGLE Translate.pptxGOOGLE Translate.pptx
GOOGLE Translate.pptx
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Understanding deep learning
Understanding deep learningUnderstanding deep learning
Understanding deep learning
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMT
 
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
 
The REAL Angular Keynote
The REAL Angular KeynoteThe REAL Angular Keynote
The REAL Angular Keynote
 
Beyond the Hype: 4 Years of Go in Production
Beyond the Hype: 4 Years of Go in ProductionBeyond the Hype: 4 Years of Go in Production
Beyond the Hype: 4 Years of Go in Production
 
Linux Sucks
Linux SucksLinux Sucks
Linux Sucks
 
Linux Sucks
Linux SucksLinux Sucks
Linux Sucks
 
The Flow of TensorFlow
The Flow of TensorFlowThe Flow of TensorFlow
The Flow of TensorFlow
 
The Latest Advances in Patent Machine Translation
The Latest Advances in Patent Machine TranslationThe Latest Advances in Patent Machine Translation
The Latest Advances in Patent Machine Translation
 

Plus de Chester Chen

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfChester Chen
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdfChester Chen
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...Chester Chen
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...Chester Chen
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?Chester Chen
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataChester Chen
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleChester Chen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapChester Chen
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bigheadChester Chen
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformChester Chen
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in sparkChester Chen
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 Chester Chen
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 

Plus de Chester Chen (20)

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 

Dernier

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 

Dernier (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 

Moving to neural machine translation at google - gopro-meetup

  • 1. Confidential & ProprietaryConfidential & Proprietary Moving to Neural Machine Translation at Google Xiaobing Liu, Google Brain Team 11/28/2017
  • 2. Growing Use of Deep Learning at Google Android Apps GMail Image Understanding Maps NLP Photos Speech Translation many research uses.. YouTube … many others ... Across many products/areas: # of directories containing model description files
  • 3. Confidential & Proprietary Why we care about translations ● 50% of Internet content is in English. ● Only 20% of the world’s population speaks English. To make the world’s information accessible, we need translations
  • 4. Confidential & Proprietary Google Translate, a truly global product... 1B+ 103 Monthly active users Translations every single day, that is 140 Billion Words Google Translate Languages cover 99% of online population 1B+
  • 5. Confidential & Proprietary Agenda ● Quick History ● From Sequence to Sequence-to-Sequence Models ● BNMT (Brain Neural Machine Translation) ○ Architecture & Training ○ TPU and Quantization ● Multilingual Models ● What’s next?
  • 6. Confidential & Proprietary Quick Research History ● Various people at Google tried to improve translation with neural networks ○ Brain team, Translate team
  • 7. Confidential & Proprietary Quick Research History ● Various people at Google tried to improve translation with neural networks ○ Brain team, Translate team ● Sequence-To-Sequence models (NIPS 2014) ○ Based on many earlier approaches to estimate P(Y|X) directly ○ State-of-the-art on WMT En->Fr using custom software, very long training ○ Translation could be learned without explicit alignment! ○ Drawback: all information needs to be carried in internal state ■ Translation breaks down for long sentences!
  • 8. Confidential & Proprietary Quick Research History ● Various people at Google tried to improve translation with neural networks ○ Brain team, Translate team ● Sequence-To-Sequence models (NIPS 2014) ○ Based on many earlier approaches to estimate P(Y|X) directly ○ State-of-the-art on WMT En->Fr using custom software, very long training ○ Translation could be learned without explicit alignment! ○ Drawback: all information needs to be carried in internal state ■ Translation breaks down for long sentences! ● Attention Models (2014) ○ Removes drawback by giving access to all encoder states ■ Translation quality is now independent of sentence length!
  • 9. Confidential & Proprietary Preprocess Neural Network Old: Phrase-based translation New: Neural machine translation ● Lots of individual pieces ● Optimized somewhat independently ● End-to-end learning ● Simpler architecture ● Plus results are much better!
  • 10. Confidential & Proprietary Expected time to launch: Actual time to launch: 3 years 13.5 months Sept 2015: Began project using TensorFlow Feb 2016: First production data results Sept 2016: zh->en launched Nov 2016: 8 languages launched (16 pairs to/from English) Mar 2017: 7 more launched (Hindi, Russian, Vietnamese, Thai, Polish, Arabic, Hebrew) Apr 2017: 26 more launched (16 European, 8 Indish, Indonesian, Afrikaans) Jun/Aug 2017: 36/20 more launched 97 launched!
  • 11. Original Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
  • 12. Original Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude. Back translation from Japanese (old) Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
  • 13. Original Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude. Back translation from Japanese (old) Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained. Back translation from Japanese (new) Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.
  • 14. Confidential & Proprietary 0 △TranslationQuality + Significant change & launchable Chinese to English Almost all language pairs >0.1 >0.5 +0.6 Translation Quality Zh/Ja/Ko/Tr to English 0.6-1.5 0 = Worst translation 6 = Perfect translation TranslationQuality ● Asian languages improved the most ● Some improvements as big as last 10 years of improvements combined
  • 15. neural (GNMT) phrase-based (PBMT) English > Spanish English > French English > Chinese Spanish > English French > English Chinese > English Translation model Translationquality 0 1 2 3 4 5 6 human perfect translation Neural Machine Translation Closes gap between old system and human-quality translation by 58% to 87% Enables better communication across the world research.googleblog.com/2016/09/a-neural-network-for-machine.html
  • 16. Confidential & Proprietary +75%Increase in daily English - Korean translations on Android over the past six months Does quality matter?
  • 17. Confidential & Proprietary Neural Recurrent Sequence Models ● Predict next token: P(Y) = P(Y1) * P(Y2|Y1) * P(Y3|Y1,Y2) * ... ○ Language Models, state-of-the-art on public benchmark ■ Exploring the limits of language modeling EOS Y1 Y2 Y3 Y1 Y2 Y3 EOS
  • 18. Confidential & Proprietary Sequence to Sequence ● Learn to map: X1, X2, EOS -> Y1, Y2, Y3, EOS ● Encoder/Decoder framework (decoder by itself just neural LM) ● Theoretically any sequence length for input/output works X1 X2 EOS Y1 Y2 Y3 Y1 Y2 Y3 EOS
  • 19. Confidential & Proprietary Encoder LSTMs X3 X2 </s> Decoder LSTMs <s> Y1 Y3 SoftMax Y1 Y2 </s> Deep Sequence to Sequence
  • 20. Confidential & Proprietary Attention Mechanism ● Addresses the information bottleneck problem ○ All encoder states accessible instead of only final one Sj Ti eij
  • 21. Confidential & Proprietary BNMT Model Architecture
  • 22. Confidential & Proprietary Language Translation: Model Details 8 Encoder LSTMs 1 Backwards LSTM ● Runs parallel with 1st Encoder LSTM8x … Source Embeddings Encoder LSTMs Backward LSTM
  • 23. Confidential & Proprietary Language Translation: Model Details 8 Encoder LSTMs 1 Backwards LSTM ● Runs parallel with 1st Encoder LSTM Unrolled ‘x’ steps ● x: # words in source language ● Average value: 28 8x … Source Embeddings Encoder LSTMs Backward LSTM Unroll ‘x’ Steps
  • 24. Confidential & Proprietary Language Translation: Model Details 8 Decoder LSTMs Attention model ● Runs parallel with 8 decoder LSTMs ● Output: broadcast to decoder LSTMs ● Input: previous step of 1st decoder LSTM 8x … 8x … Source Embeddings Target Embeddings Encoder LSTMs Decoder LSTMs Attention Model Backward LSTM Unroll ‘x’ Steps
  • 25. Confidential & Proprietary Language Translation: Model Details 8 Decoder LSTMs Attention model ● Runs parallel with 8 decoder LSTMs ● Output: broadcast to decoder LSTMs ● Input: previous step of 1st decoder LSTM Unrolled ‘x’ steps 8x … 8x … Source Embeddings Target Embeddings Encoder LSTMs Decoder LSTMs Attention Model Backward LSTM Unroll ‘x’ Steps Unroll ‘x’ Steps
  • 26. Confidential & Proprietary Language Translation: Model Details Softmax layer ● Computes the target classes 8x … 8x … Source Embeddings Target Embeddings Encoder LSTMs Decoder LSTMs Attention Model Softmax Backward LSTM Unroll ‘x’ Steps Unroll ‘x’ Steps
  • 27. Confidential & Proprietary Attention Mechanism ● Solves information bottleneck problem
  • 28. Confidential & Proprietary Model Training ● Runs on ~100 GPUs (12 replicas, 8 GPUs each) ○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
  • 29. Confidential & Proprietary Model Training ● Runs on ~100 GPUs (12 replicas, 8 GPUs each) ○ Because softmax size only 32k, can be fully calculated (no sampling or HSM) ● Optimization ○ Combination of Adam & SGD with delayed exponential decay ○ 128/256 sentence pairs combined into one batch (run in one ‘step’)
  • 30. Confidential & Proprietary Model Training ● Runs on ~100 GPUs (12 replicas, 8 GPUs each) ○ Because softmax size only 32k, can be fully calculated (no sampling or HSM) ● Optimization ○ Combination of Adam & SGD with delayed exponential decay ○ 128/256 sentence pairs combined into one batch (run in one ‘step’) ● Training time ○ ~1 week for 2.5M steps = ~300M sentence pairs ○ For example, on English->French we use only 15% of available data!
  • 31. Model + Data Parallelism ... Params Many replicas Parameters distributed across many parameter server machines Params Params ...
  • 32. Confidential & Proprietary Speed matters. A lot. 10 0.2seconds/ sentence seconds/ sentence 2 months ● Users care about speed ● Better algorithms and hardware (TPUs) made it possible
  • 33. Confidential & Proprietary Latency: BNMT versus PBMT (old system)
  • 35. This does not scale! TRAINING TIME 2-3 weeks per model TRAINING DATA 100M+ per model MODELS ~1032
  • 36. Multilingual models English->French English->Spanish French->English Spanish->EnglishMultilingual Model
  • 37. Confidential & Proprietary Multilingual Model ● Model several language pairs in single model ○ We ran first experiments in 2/2016, surprisingly this worked
  • 38. Confidential & Proprietary Multilingual Model ● Model several language pairs in single model ○ We ran first experiments in 2/2016, surprisingly this worked! ● Prepend source with additional token to indicate target language ○ Translate to Spanish: ■ <2es> How are you </s> -> Cómo estás </s> ○ Translate to English: ■ <2en> Como estás </s> -> How are you </s>
  • 39. Confidential & Proprietary Multilingual Model ● Model several language pairs in single model ○ We ran first experiments in 2/2016, surprisingly this worked! ● Prepend source with additional token to indicate target language ○ Translate to Spanish: ■ <2es> How are you </s> -> Cómo estás </s> ○ Translate to English: ■ <2en> Cómo estás </s> -> How are you </s> ● No other changes to model architecture! ○ Extremely simple and effective ○ Usually with shared WPM for source/target
  • 40. Can we do more?
  • 42. Confidential & Proprietary Multilingual Model and Zero-Shot Translation En Es Es En Single Multi 34.5 35.1 38.0 37.3 Translation: <2es> How are you </s> Cómo estás </s> <2en> Cómo estás </s> How are you </s> 1.
  • 43. Confidential & Proprietary Multilingual Model and Zero-Shot Translation En Es Es En Single Multi 34.5 35.1 38.0 37.3 34.5 35.0 44.5 43.7 En Es Pt En En Es Pt En 23.0 BLEU Translation: <2es> How are you </s> Cómo estás </s> <2en> Cómo estás </s> How are you </s> Zero-shot (pt->es): <2es> Como você está </s> Cómo estás </s> 1. 2.
  • 44. Confidential & Proprietary Multilingual Model and Zero-Shot Translation En Es Es En Single Multi 34.5 35.1 38.0 37.3 34.5 35.0 44.5 43.7 34.5 34.9 38.0 37.2 37.1 37.8 44.5 43.7 En Es Pt En En Es Es En En Pt Pt En En Es Pt En En Es Es En En Pt Pt En 23.0 BLEU 24.0 BLEU Translation: <2es> How are you </s> Cómo estás </s> <2en> Cómo estás </s> How are you </s> Zero-shot (pt->es): <2es> Como você está </s> Cómo estás </s> 1. 2. 3.
  • 46. Confidential & Proprietary Challenges ● Early cutoff ○ Cuts off or drops some words in source sentence
  • 47. Confidential & Proprietary Challenges ● Early cutoff ○ Cuts off or drops some words in source sentence ● Broken dates/numbers ○ 5 days ago -> 6일 전 ○ (but, on average BNMT significantly better than PBMT on number expressions!)
  • 48. Confidential & Proprietary Challenges ● Early cutoff ○ Cuts off or drops some words in source sentence ● Broken dates/numbers ○ 5 days ago -> 6일 전 ○ (but, on average BNMT significantly better than PBMT on number expressions!) ● Short rare queries ○ “deoxyribonucleic acid” in Japanese ? ○ Should be easy but isn’t yet
  • 49. Confidential & Proprietary Challenges ● Early cutoff ○ Cuts off or drops some words in source sentence ● Broken dates/numbers ○ 5 days ago -> 6일 전 ○ (but, on average BNMT significantly better than PBMT on number expressions!) ● Short rare queries ○ “deoxyribonucleic acid” in Japanese ? ○ Should be easy but isn’t yet ● Transliteration of names ○ Eichelbergertown -> アイセルベルクタウン
  • 50. Confidential & Proprietary Challenges ● Early cutoff ○ Cuts off or drops some words in source sentence ● Broken dates/numbers ○ 5 days ago -> 6일 전 ○ (but, on average BNMT significantly better than PBMT on number expressions!) ● Short rare queries ○ “deoxyribonucleic acid” in Japanese ? ○ Should be easy but isn’t yet ● Transliteration of names ○ Eichelbergertown -> アイセルベルクタウン ● Junk ○ xxx -> 牛津词典 (Oxford dictionary) ○ The cat is a good computer. -> 的英语翻译 (of the English language?) ○ Many sentences containing news started with “Reuters”
  • 51. Confidential & Proprietary Open Research Problems ● Use of context ○ Full document translation, streaming translation ○ Use other modalities & features
  • 52. Confidential & Proprietary Open Research Problems ● Use of context ○ Full document translation, streaming translation ○ Use other modalities & features ● Better automatic measures & objective functions ○ Current BLEU score weighs all words the same regardless of meaning ■ ‘president’ mostly more important than ‘the’ ○ Discriminative training ■ Training with Maximum Likelihood produces mismatched training/test procedure! ● No decoding errors for maximum-likelihood training ■ RL (and similar) already running but no significant enough gains yet ● Humans see no difference in the results
  • 53. Confidential & Proprietary Open Research Problems ● Use of context ○ Full document translation, streaming translation ○ Use other modalities & features ● Better automatic measures & objective functions ○ Current BLEU score weighs all words the same regardless of meaning ■ ‘president’ mostly more important than ‘the’ ○ Discriminative training ■ Training with Maximum Likelihood produces mismatched training/test procedure! ● No decoding errors for maximum-likelihood training ■ RL (and similar) already running but no significant enough gains yet ● Humans see no difference in the results ● Lots of improvements are boring to do! ○ Because they are incremental (but still have to be done) ○ Data cleaning, new test sets etc.
  • 54. Confidential & Proprietary What’s next from research? ● Convolutional sequence-to-sequence models ○ No recurrency, just windows over input with shared parameters ○ Encoder can be computed in parallel => faster
  • 56. Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-Experts Layer, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff Dean Submitted to ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg Per-Example Routing
  • 57. Attention Is All You Need Transformer: autoregressive pure attention model. Previous-output attention: masked batch matmul Also great on parsing, try it on your task! Translation Model Training time BLEU (diff. from MOSES) Transformer (large) 3 days on 8 GPU 28.4 (+7.8) Transformer (small) 1 day on 1 GPU 24.9 (+4.3) GNMT + Mixture of Experts 1 day on 64 GPUs 26.0 (+5.4) ConvS2S (FB) 18 days on 1 GPU 25.1 (+4.5) GNMT 1 day on 96 GPUs 24.6 (+4.0) MOSES (phrase-based) N/A 20.6 (+0.0) Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, https://arxiv.org/abs/1706.03762
  • 58. Confidential & Proprietary BNMT for other projects Other projects using same codebase for completely different problems (in search, Google Assistant, …) ● Question/answering system (chat bots) ● Summarization ● Dialog modeling ● Generate question from query ● …
  • 59. Confidential & Proprietary Resources ● TensorFlow (www.tensorflow.org) ○ Code/Bugs on GitHub ○ Help on StackOverflow ○ Discussion on mailing list ● All information about BNMT is in these papers & blog posts ○ Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation ○ Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation ● NYT article describes some of the development ○ The Great AI Awakening ● Internship & Residency ○ 3 months internships possible ○ 1-year residency program g.co/brainresidency
  • 60. Confidential & Proprietary Questions? Thank you! xbing@google.com g.co/brain
  • 61. Confidential & Proprietary Decoding Sequence Models ● Find the N-best highest probability output sequences ○ Take K-best Y1 and feed them one-by-one, generating K hypotheses ○ Take K-best Y2 for each of the hyps, generating K^2 new hyps (tree) etc. ○ At each step, cut hyps to N-best (or by score) until at end Example N=2, K=2 EOS Y1a, Y1b Y2a, Y2b Y3a, Y2b Y1a, Y1b Y2a, Y2b Y3a, Y2b EOS 1. Y1a Y2a … 2. Y1a Y2b … 3. Y1b Y2a … 4. Y1b Y2b ...
  • 62. Confidential & Proprietary Sampling from Sequence Models ● Generate samples of sequences a. Generate probability distribution P(Y1) b. Sample from P(Y1) according to its probabilities c. Feed in found sample as input, goto a) EOS Y1 Y2 Y3 Y1 Y2 Y3 EOS