Seq2seq
...and beyond
Hello! I am Roberto Silveira
EE engineer, ML enthusiast
rsilveira79@gmail.com
@rsilveira79
Sequence
Is a matter of time
RNN
Is what you need!
Basic Recurrent cells (RNN)
Source: http://colah.github.io/
Issues
× Difficulty dealing with long-term
dependencies
× Difficult to train - vanishing gradient issues
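To make the recurrence concrete, here is a minimal vanilla RNN step in numpy (an illustrative sketch, not from the slides; all dimensions are invented). The repeated multiplication by W_hh across time steps is what makes gradients vanish or explode over long sequences.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step of a vanilla RNN: the new state mixes the current input
    # with the previous state, so earlier words influence later ones.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions, assumed purely for illustration
input_dim, hidden_dim = 8, 16
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)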
Long-term issues
Source: http://colah.github.io/,
CS224d notes
Sentence 1
"Jane walked into the room. John walked
in too. Jane said hi to ___"
Sentence 2
"Jane walked into the room. John walked in
too. It was late in the day, and everyone was
walking home after a long day at work. Jane
said hi to ___"
LSTM in 2 min...
Review
× Addresses long-term dependencies
× More complex to train
× Very powerful given lots of data
Key pieces (built up across two slides): cell state, forget gate, input gate, output gate
Source: http://colah.github.io/
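A minimal numpy sketch of one LSTM step (an assumption for illustration: the four gate parameter blocks are stacked, so W has shape (input_dim, 4*hidden_dim), U (hidden_dim, 4*hidden_dim), and b (4*hidden_dim,); this is not the diagram's exact notation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Gate pre-activations, stacked as [forget, input, output, candidate].
    z = x_t @ W + h_prev @ U + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g    # cell state: forget old memory, write new content
    h = o * np.tanh(c)        # output gate decides what the cell exposes
    return h, c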
Gated recurrent unit (GRU) in 2 min ...
Review
× Fewer hyperparameters
× Trains faster
× Better solutions with less data
Key pieces: reset gate, update gate
Source: http://www.wildml.com/, arXiv:1412.3555
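The GRU step in the same numpy style (one common convention; reuses sigmoid from the LSTM sketch above, and the parameter names are invented for illustration):

def gru_step(x_t, h_prev, params):
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x_t @ Wz + h_prev @ Uz + bz)               # update gate
    r = sigmoid(x_t @ Wr + h_prev @ Ur + br)               # reset gate
    h_tilde = np.tanh(x_t @ Wh + (r * h_prev) @ Uh + bh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde                  # interpolate old/new

With two gates instead of three and no separate cell state, there are fewer parameters to fit, which is why GRUs often train faster and do better on smaller datasets.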
Seq2seq
learning
Or encoder-decoder
architectures
Variable size input - output
Source: http://karpathy.github.io/
Basic idea
"Variable" size input (encoder) →
Fixed size vector representation →
"Variable" size output (decoder)
[Diagram: a first RNN (the encoder) is a stateful model that reads the input one word at a time ("Machine", "Learning", "is", "fun"), memory of the previous word influencing the next result; it produces an encoded sequence, a fixed-size vector such as [0.636, 0.122, 0.981]; a second stateful RNN (the decoder) then emits the output one word at a time ("Aprendizado", "de", "Máquina", "é", "divertido"), again with memory of the previous word influencing the next result.]
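A schematic sketch of that data flow (reusing rnn_step and hidden_dim from the vanilla RNN sketch above; embed, W_out, and the greedy loop are hypothetical stand-ins, not the slides' model):

def encode(tokens, embed, params):
    h = np.zeros(hidden_dim)
    for tok in tokens:                       # one word at a time
        h = rnn_step(embed[tok], h, *params)
    return h                                 # the fixed-size encoded sequence

def decode(h, embed, params, W_out, start_tok, eos_tok, max_len=20):
    out, tok = [], start_tok
    for _ in range(max_len):                 # one word at a time
        h = rnn_step(embed[tok], h, *params)
        tok = int(np.argmax(h @ W_out))      # greedy pick of the next word
        if tok == eos_tok:
            break
        out.append(tok)
    return out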
Sequence to Sequence Learning with Neural Networks (2014)
Source: arXiv 1409.3215v3
[Diagram: the same English → Portuguese example; encoder and decoder are each an LSTM with 4 layers and 1000 cells/layer, over 1000d word embeddings, connected through the encoded sequence vector.]
TRAINING → SGD w/o momentum, fixed learning rate of 0.7, 7.5 epochs, batches of 128 sentences, 10 days of training (WMT'14 dataset, English to French)
Recurrent encoder-decoders
[Diagram, built up across three slides: the encoder reads the source sequence "Les chiens aiment les os <EOS>"; the decoder, fed "Dogs love bones" one step behind, produces the target sequence "Dogs love bones <EOS>".]
Source: arXiv 1409.3215v3
Recurrent encoder-decoders - issues
● Difficult to cope with long sentences (longer than those in the training corpus)
● A decoder with an attention mechanism → relieves the encoder from squashing everything into a fixed-length vector
Source: arXiv 1409.3215v3
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (2015)
Source: arXiv 1409.0473v7
Decoder: computes a context vector for each target word from weights over each annotation hj, allowing non-monotonic alignment.
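In the paper's notation, the context vector for target position i is a softmax-weighted sum of the encoder annotations (these are the equations from arXiv 1409.0473, with a(\cdot) the learned alignment model):

e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j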
Attention models for NLP
[Diagram, built up across four slides: at each decoding step the decoder attends over the encoder states for "Les chiens aiment les os <EOS>" and combines them (+) into a context vector; starting from <EOS> it emits "Dogs", then, fed "Dogs", emits "love", then "bones", and so on.]
Source: arXiv 1409.0473v7
Challenges in using the model
● Cannot handle truly variable-size input → PADDING
● Hard to deal with both short and long sentences → BUCKETING
● Need to capture context / semantic meaning → WORD EMBEDDINGS
Source: http://suriyadeepan.github.io/
padding
Source: http://suriyadeepan.github.io/
EOS : End of sentence
PAD : Filler
GO : Start decoding
UNK : Unknown; word not in vocabulary
Q : "What time is it? "
A : "It is seven thirty."
Q : [ PAD, PAD, PAD, PAD, PAD, “?”, “it”,“is”, “time”, “What” ]
A : [ GO, “It”, “is”, “seven”, “thirty”, “.”, EOS, PAD, PAD, PAD ]
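A hedged sketch of how those sequences could be produced (function names invented; it mirrors the slide's example, including reversing the source as in the TF translate tutorial):

def pad_and_reverse(tokens, bucket_len):
    # Encoder input: reversed tokens, left-padded up to the bucket length.
    return ['PAD'] * (bucket_len - len(tokens)) + list(reversed(tokens))

def decoder_sequence(tokens, bucket_len):
    # Decoder input: GO + tokens + EOS, right-padded up to the bucket length.
    seq = ['GO'] + list(tokens) + ['EOS']
    return seq + ['PAD'] * (bucket_len - len(seq))

pad_and_reverse("What time is it ?".split(), 10)
# ['PAD', 'PAD', 'PAD', 'PAD', 'PAD', '?', 'it', 'is', 'time', 'What']
decoder_sequence("It is seven thirty .".split(), 10)
# ['GO', 'It', 'is', 'seven', 'thirty', '.', 'EOS', 'PAD', 'PAD', 'PAD']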
Source: https://www.tensorflow.org/
bucketing
Efficiently handle sentences of different lengths
Ex: 100 tokens is the longest sentence in the corpus
What about short sentences like "How are you?" → lots of PAD
Bucket list: [(5, 10), (10, 15), (20, 25), (40, 50)]
(default in TensorFlow's translate.py)
Q : [ PAD, PAD, ".", "go", "I" ]
A : [ GO, "Je", "vais", ".", EOS, PAD, PAD, PAD, PAD, PAD ]
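A small sketch of bucket selection (the bucket list is the slide's; the helper and the +2 allowance for GO/EOS are my assumptions):

BUCKETS = [(5, 10), (10, 15), (20, 25), (40, 50)]

def pick_bucket(src_len, tgt_len, buckets=BUCKETS):
    # Smallest bucket that fits the source and the target plus GO/EOS.
    for i, (s, t) in enumerate(buckets):
        if src_len <= s and tgt_len + 2 <= t:
            return i
    return None  # too long for every bucket: skip or truncate

pick_bucket(3, 3)  # "I go ." / "Je vais ." → bucket 0, i.e. (5, 10)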
Word embeddings (remember the previous presentation ;-)
Distributed representations → syntactic and semantic information is captured
"Take" = [0.286, 0.792, -0.177, -0.107, 0.109, -0.542, 0.349, 0.271]
Word embeddings (remember the previous presentation ;-)
Linguistic regularities (recap)
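The classic regularity as a toy sketch (the 2-d vectors are invented purely for illustration; real embeddings have hundreds of dimensions):

import numpy as np

emb = {  # toy embeddings, invented for this example
    'king':  np.array([0.8, 0.7]),
    'man':   np.array([0.6, 0.2]),
    'woman': np.array([0.7, 0.3]),
    'queen': np.array([0.9, 0.8]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = emb['king'] - emb['man'] + emb['woman']       # = [0.9, 0.8]
print(max(emb, key=lambda w: cosine(emb[w], target)))  # 'queen' here
# (in practice the query words themselves are excluded from the search)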
Phrase representations (Paper - Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation)
Source: arXiv 1406.1078v3
1000d vector representation
applications
Neural conversational model - chatbots
Source: arXiv 1506.05869v3
Google Smart Reply
Source: arXiv 1606.04870v1
Components: Seq2Seq model, feedforward triggering model, semi-supervised semantic clustering
Interesting facts
● Currently responsible for 10% of Inbox replies
● Training set: 238 million messages
Image captioning (Paper - Show and Tell: A Neural Image Caption Generator)
[Diagram: a CNN is the encoder, an LSTM the decoder.]
Source: arXiv 1411.4555v2
What's next?
And so?
Multi-task sequence to sequence (Paper - MULTI-TASK SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1511.06114v4
One-to-Many
(common encoder)
Many-to-One
(common decoder)
Many-to-Many
Neural programmer (Paper - NEURAL PROGRAMMER: INDUCING LATENT PROGRAMS WITH GRADIENT DESCENT)
Source: arXiv 1511.04834v3
Unsupervised pre-training for seq2seq - 2017 (Paper - UNSUPERVISED PRETRAINING FOR SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1611.02683v1
[Diagram: both the encoder and the decoder start from pre-trained weights.]
THANKS!
rsilveira79@gmail.com
@rsilveira79
A quick example on TensorFlow
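The deck closes with a live demo; as a stand-in, here is a hedged minimal seq2seq sketch in today's tf.keras (layer sizes are arbitrary toy values, not the paper's 4-layer, 1000-cell setup, and this covers training with teacher forcing only):

import tensorflow as tf

SRC_VOCAB, TGT_VOCAB, EMB, UNITS = 8000, 8000, 256, 512  # assumed toy sizes

# Encoder: embed source tokens, keep only the final LSTM states.
enc_in = tf.keras.Input(shape=(None,), name="source_tokens")
enc_emb = tf.keras.layers.Embedding(SRC_VOCAB, EMB)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(UNITS, return_state=True)(enc_emb)

# Decoder: starts from the encoder state, predicts the next target word.
dec_in = tf.keras.Input(shape=(None,), name="target_tokens")
dec_emb = tf.keras.layers.Embedding(TGT_VOCAB, EMB)(dec_in)
dec_out = tf.keras.layers.LSTM(UNITS, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(TGT_VOCAB)(dec_out)

model = tf.keras.Model([enc_in, dec_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))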