A Survey of Current Neural Network Architectures for NLP
Márton Miháltz
Meltwater Group
Hungarian NLP Meetup
Outline
• Introduction
• Short intro to NN concepts
• Recurrent neural networks
  • Long Short-Term Memory, Gated Recurrent Unit
• Recursive neural networks
  • Applications to sentiment analysis: Socher et al. 2013; Tai et al. 2015
• Convolutional neural networks
  • Applications to text classification: Kim 2014
• Some more recent architectures
  • Memory networks, attention models, hybrid architectures
• Tools
  • Theano, Torch, TensorFlow, Caffe, Keras
Very Short Intro to Modern Neural Networks
• Feed-forward neural network
  • Activation fn: tanh, ReLU, Leaky/Parametric ReLU, SoftPlus, …
  • Logistic regression or softmax function for the classification layer
  • Loss functions (objectives): categorical cross-entropy, neg. log likelihood, …
  • Training (optimizers): Gradient Descent, SGD, Mini-batch GD, RMSprop, Ada, Adagrad, Adam, Adamax, Nesterov Momentum, L-BFGS, …
• Input embeddings
  • 1-hot encoding
  • Random vectors
  • Pre-trained vectors, e.g. distributional similarity
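A minimal sketch of how these pieces fit together, in Keras (one of the libraries covered at the end of this deck); the layer sizes, the 300-dim input and the 10-class output are illustrative assumptions, not from the slides:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=300))  # hidden layer, ReLU activation
model.add(Dense(10, activation='softmax'))               # softmax classification layer
model.compile(loss='categorical_crossentropy',           # loss function (objective)
              optimizer='sgd',                           # optimizer: stochastic gradient descent
              metrics=['accuracy'])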
Further Reading (DL for NLP)
● Tutorials, Blogs
  ○ Denny Britz's blog (RNNs, CNNs for NLP, code etc.) -- code in Theano, TensorFlow
  ○ Christopher Olah's blog (architectures, DL for NLP etc.)
  ○ Andrej Karpathy's fun blog post about RNNs: generating Shakespeare, Paul Graham text, LaTeX source, C code etc., plus nice LSTM activity visualizations
  ○ Deeplearning.net Tutorial -- code in Theano (Python)
● Courses
  ○ Richard Socher's Stanford course Deep Learning for Natural Language Processing -- code in TensorFlow
  ○ Stanford Unsupervised Feature Learning and Deep Learning Tutorial -- code in Matlab
  ○ Stanford course Convolutional Neural Networks for Image Recognition (Andrej Karpathy)
● Other sources
  ○ Bengio's Deep Learning book
Why Deep Learning for NLP?
• A powerful apparatus for learning complex functions for ML
• Better at certain NLP tasks than previous methods
• Pre-trained distributed representation vectors
  • word2vec, GloVe, Gensim, doc2vec, skip-thought vectors etc.
  • Vector space properties: similarity, analogies, compositionality etc. (see the sketch after this list)
• Less feature engineering needed
  • The network learns abstract representations
• Transfer learning / domain adaptation
• Joint learning/execution of NLP steps possible
• Easy to go multimodal
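A hedged illustration of the "analogies" property, assuming the Gensim API of the time and a pre-trained word2vec model (the file name is a placeholder; newer Gensim versions load through KeyedVectors instead):

from gensim.models import Word2Vec

# load pre-trained word2vec vectors in binary format
model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
# king - man + woman ~ queen
print(model.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))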
Recurrent Neural Networks
● About RNNs
  ○ The internal state depends on the state of the previous step (see the numpy sketch after this list)
  ○ Good for sequential input
  ○ Trained with Backpropagation Through Time (BPTT)
● Applications
  ○ Language modeling (e.g. in machine translation)
  ○ Sequential labeling
  ○ Text generation (e.g. image description generation, together w/ a CNN)
● Problems with RNNs
  ○ Long sentences, long-term dependencies
  ○ Exponentially shrinking gradients ("vanishing gradients")
  ○ Solutions:
    ■ Initialization of weights; regularization; using the ReLU activation fn.
    ■ RNN variations: bidirectional RNN, deep RNN etc.
    ■ Gated RNNs: LSTM, GRU
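A bare-bones numpy sketch of the recurrence ("internal state depends on the state of the last step"); the dimensions and initialization are illustrative assumptions:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # the new state mixes the current input with the previous state
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

d_in, d_hid = 50, 100
W_xh = np.random.randn(d_in, d_hid) * 0.01
W_hh = np.random.randn(d_hid, d_hid) * 0.01
b_h = np.zeros(d_hid)

h = np.zeros(d_hid)
for x_t in np.random.randn(7, d_in):  # a toy sequence of 7 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

BPTT backpropagates through this unrolled loop, which is exactly where the vanishing-gradient problem above comes from.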
LSTMs and GRUs
• Long Short-Term Memory networks
  • A special recurrent network
  • Has a memory cell (internal memory) (c)
  • 3 gates: input, forget, output -- sigmoid layers with a pointwise multiplication operation (a vector of values in [0, 1])
  • The LSTM is able to remove or add information to the cell state, regulated by the gates, which optionally let information through
• Gated Recurrent Units
  • Another RNN variant
  • No internal memory separate from the internal state
  • 2 gates: reset (r), update (z)
  • Reset gate: how to combine the new input with the previous state; update gate: how much of the previous state to keep (see the step sketch below)

[Figure: LSTM and GRU cell diagrams from Chung et al. 2014, red labels added by the author]
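A numpy sketch of one GRU step following the description in Chung et al. 2014; the weight shapes and names are my own:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, W, U):
    z = sigmoid(x @ Wz + h_prev @ Uz)            # update gate: how much of the old state to keep
    r = sigmoid(x @ Wr + h_prev @ Ur)            # reset gate: how to combine input with old state
    h_tilde = np.tanh(x @ W + (r * h_prev) @ U)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde      # interpolate between old state and candidate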
LSTMs and GRUs
• Overcome RNNs' long-term dependency limitations & the vanishing gradients problem
• Very hip in current NLP applications, e.g. SOTA in MT
• More complex architectures:
  • Bi-directional LSTM
  • Stacked (deep) (B-)LSTM/GRU layers (see the sketch after this list)
  • Another extension: Grid-LSTM (Kalchbrenner et al. 2015)
• Still evolving!
• The jury is still out on whether LSTM or GRU is better
  • GRU has fewer parameters, may be faster to train
  • LSTM may be better with more data
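A minimal sketch of a stacked bidirectional LSTM classifier, assuming Keras; the vocabulary size, layer widths and binary output are illustrative:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional, Dense

model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=128))      # 20k-word vocabulary (illustrative)
model.add(Bidirectional(LSTM(64, return_sequences=True)))  # first BLSTM layer, emits full sequence
model.add(Bidirectional(LSTM(64)))                         # stacked (deep) BLSTM, emits final state
model.add(Dense(1, activation='sigmoid'))                  # e.g. binary classification
model.compile(loss='binary_crossentropy', optimizer='adam')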
Recursive Networks
• About recursive NNs
  • Hierarchical architecture
  • Shared weights
  • A plausible approach for modeling linguistic structures
• Sentiment Analysis with Recursive Networks (Socher et al. 2013)
  • Compositional processing of parsed input (e.g. able to handle negations)
  • Performs sentence-level sentiment classification on the Rotten Tomatoes dataset (Pang & Lee 2005): 11K movie review sentences, positive or negative
  • 85.5% accuracy on the binary-class subset, 45.7% on 5-class
  • Not the SOTA score any more, but it was the first result over 80% in 7 years
  • Trained on the Sentiment Treebank
Recursive Neural Tensor Network
• Sentence words: embedding layer with random initial vectors (d = 25..35)
• Parse nodes: a compositionality function computes each representation, applied recursively (see the sketch below)
• Softmax classifier: pos/neg (or 5-class) label for each word & each parse node
• Weight tensor V -- intuition: each slice of the tensor captures a specific type of composition
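A numpy sketch of the RNTN compositionality function as given in Socher et al. 2013, p = tanh([a;b]^T V [a;b] + W [a;b]); the initialization and the exact d are illustrative:

import numpy as np

def rntn_compose(a, b, V, W):
    # a, b: child vectors of shape (d,); V: tensor (d, 2d, 2d); W: matrix (d, 2d)
    ab = np.concatenate([a, b])                             # stacked children, shape (2d,)
    tensor_term = np.array([ab @ V[k] @ ab for k in range(V.shape[0])])
    return np.tanh(tensor_term + W @ ab)                    # parent node representation

d = 25                                                      # within the paper's 25..35 range
a, b = np.random.randn(d), np.random.randn(d)
V = np.random.randn(d, 2 * d, 2 * d) * 0.01                 # each slice V[k]: one composition type
W = np.random.randn(d, 2 * d) * 0.01
parent = rntn_compose(a, b, V, W)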
Sentiment Analysis with RNTN
Tree-LSTMs for Sentiment Analysis (Tai et al. 2015)
• Tree-LSTM
  • Uses constituency parsing
  • Uses GloVe word vectors, updated during training
  • Idea: sum the hidden states of a tree node's children (see the node sketch after this list)
  • Each child has its own forget gate
  • Polarity softmax classifiers on the tree nodes
• Improves on Socher et al. 2013
  • Fine-grained sentence sentiment: 51.0% vs. 45.7%
  • Binary sentence sentiment: 88.0% vs. 85.4%
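A numpy sketch of one Child-Sum Tree-LSTM node following Tai et al. 2015: child hidden states are summed, and each child gets its own forget gate. The parameter naming is my own; leaf nodes would pass a single all-zero child state:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tree_lstm_node(x, child_h, child_c, W, U, b):
    # x: input vector; child_h, child_c: lists of child hidden/cell states;
    # W, U, b: dicts of parameters keyed by gate name ('i', 'f', 'o', 'u')
    h_sum = np.sum(child_h, axis=0)                      # sum of child hidden states
    i = sigmoid(W['i'] @ x + U['i'] @ h_sum + b['i'])    # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_sum + b['o'])    # output gate
    u = np.tanh(W['u'] @ x + U['u'] @ h_sum + b['u'])    # candidate update
    f = [sigmoid(W['f'] @ x + U['f'] @ hk + b['f'])      # one forget gate per child
         for hk in child_h]
    c = i * u + sum(fk * ck for fk, ck in zip(f, child_c))
    return o * np.tanh(c), c                             # (h, c) for this tree node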
Convolutional Neural Networks
• CNNs (ConvNets) are widely used in image processing
  • Location invariance
  • Compositionality
  • Fast
• Convolution layers
  • A "sliding window" over the input representation: filter/kernel/feature generator (see the sketch after this list)
  • Local connectivity
  • Shared weights
• Hyperparameters
  • Wide vs. narrow convolution (padding)
  • Filter size (width, height, depth)
  • Number of filters per layer
  • Stride size
  • Channels (e.g. R, G, B)
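A numpy sketch of a single filter sliding over a sentence matrix (one word vector per row), narrow convolution with stride 1; all sizes are illustrative:

import numpy as np

def conv1d_narrow(sent, filt, stride=1):
    # sent: (n_words, d); filt: (h, d), a window covering h words
    h = filt.shape[0]
    return np.array([np.sum(sent[i:i + h] * filt)    # one feature per window position
                     for i in range(0, sent.shape[0] - h + 1, stride)])

sent = np.random.randn(10, 50)           # 10 words, 50-dim embeddings
filt = np.random.randn(3, 50)            # filter of width h = 3 words (a trigram detector)
feature_map = conv1d_narrow(sent, filt)  # shape (8,); wide convolution would zero-pad first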
CNNs for Text Classification
● Intuition: filter windows over
sentence words <-> n-grams
● Advantage over Recursive
NN/Tree-LSTM: does not require
parsing
● Becoming a standard baseline for
new text classification architectures
● Easy to parallelize on GPUs
CNN for Sentiment Analysis (Kim 2014)
• Sentence polarity classification (RT dataset / Sentiment Treebank)
• 88.1% on binary sentiment classification
• Uses word2vec vectors
  • Sentences: concatenated word vectors
  • 2 channels: static word2vec vectors & vectors tuned via backprop
• Multiple window sizes (h = 3, 4, 5) and multiple filters per size (e.g. 100)
• Max-pooling applied to each feature map
  • Selects the most important feature from the feature map
• Penultimate layer: the final feature vector
  • Concatenation of all pooled features
• Final layer: softmax classifier (pos/neg sentiment)
• Regularization: dropout on the penultimate layer
  • Randomly sets some of the feature weights to 0
  • Prevents co-adaptation of hidden units during forward propagation (overfitting)
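A hedged single-channel approximation of this architecture in Keras (the paper's second, static channel and some details are omitted; the vocabulary size and sequence length are illustrative assumptions):

from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          Dropout, Dense, concatenate)

seq_len, vocab, d = 60, 20000, 300
inp = Input(shape=(seq_len,))
emb = Embedding(vocab, d)(inp)                     # word2vec-initialized in the paper
pooled = []
for h in (3, 4, 5):                                # multiple window sizes
    conv = Conv1D(100, h, activation='relu')(emb)  # 100 filters per window size
    pooled.append(GlobalMaxPooling1D()(conv))      # max-over-time pooling per feature map
merged = concatenate(pooled)                       # penultimate layer: all pooled features
merged = Dropout(0.5)(merged)                      # dropout regularization
out = Dense(1, activation='sigmoid')(merged)       # pos/neg sentiment
model = Model(inp, out)
model.compile(loss='binary_crossentropy', optimizer='adam')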
Adaptation of Word Vectors
Summary
• Recursive NNs
  • Linguistically plausible, applicable to grammatical structures; needs parsing
• Recurrent NNs
  • Engineered for sequential input; current improvements come from gated RNNs (LSTM, GRU etc.)
• Convolutional NNs
  • Exceptionally good for classification; unclear how to incorporate phrase-level structures; hard to interpret; needs zero padding; good for GPUs
Some Recent Work
• Memory Networks
  • MemN2N (Sukhbaatar et al. 2015): Facebook's bAbI Question Answering tasks, 90-90%
  • Dynamic Memory Networks (Kumar, Irsoy et al. 2015): sentiment on the RT dataset, 88.6%; episodic memory: input sequences, questions, reasoning about answers
• Attention models
  • Parsing (Vinyals & Hinton et al. 2015); Machine Translation (Bahdanau & Bengio et al. 2016)
  • Relation extraction with LSTM + attention (Zhou et al. 2016)
  • Sentence embeddings with an attention model (Wang et al. 2016)
• Hybrid architectures
  • NER with BLSTM-CNN (Chiu & Nichols 2016): 91.62% CoNLL, 86.28% OntoNotes
  • Sequential labeling with BLSTM-CNN-CRF (Ma & Hovy 2016): 97.55% PoS, 91.21% NER
  • Sentiment Analysis using CNN-LSTM (Wang et al. 2016)
• Joint learning of NLP tasks
  • PoS-tagging, chunking and CC-tagging with one network (Søgaard & Goldberg 2016)
  • JEDI: joint learning of NER and RE (Kirschnick et al. 2016)
Tools for Hacking
● CUDA, cuDNN
  ○ You need these (Nvidia) drivers installed to utilize the GPU
● Theano
  ○ Low-level abstraction; you define symbolic variables & functions; Python (see the snippet after this list)
● TensorFlow
  ○ Low-level abstraction; you define data flow graphs; C++, Python
● Torch
  ○ High abstraction level; very easy C interfacing; Lua
● Caffe
  ○ Very high level, simple JSON config, little versatility; most useful with convnets (C++/Python to extend)
● High-level wrappers
  ○ Keras: can bind to either TensorFlow or Theano; Python
  ○ SkFlow: a wrapper around TensorFlow for those familiar with scikit-learn; Python
  ○ Pretty Tensor, TensorFlow Slim: high-level wrapper functions for TensorFlow; Python
  ○ DIGITS: supports Caffe and Torch
● More
  ○ nice overview here
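A tiny illustration of Theano's "symbolic variables & functions" style, using standard Theano calls (not code from the talk):

import theano
import theano.tensor as T

x = T.dvector('x')           # declare a symbolic input vector
y = T.tanh(T.dot(x, x))      # build a symbolic expression graph
f = theano.function([x], y)  # compile the graph into a callable
print(f([0.1, 0.2, 0.3]))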
Thank you!