A Convolutional Neural Network for
Modelling Sentences
Presented By :
Anish Bhanushali
Anjani Jha
Mansi Goel
Authors :
Nal Kalchbrenner (University of Oxford)
Edward Grefenstette (University of Oxford)
Phil Blunsom (University of Oxford)
Objective:
This paper aims to develop an effective sentence model that analyses
and represents the semantic content of a sentence using a dynamic
CNN architecture.
Word representation
The vast majority of rule-based and statistical NLP work regards words as atomic
symbols: hotel, conference, walk
One-hot Representation:
In vector space terms, this is a vector with one 1 and a lot of zeroes
[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
Dimensionality: 20K (speech) – 50K (PTB) – 500K (big vocab) – 13M (Google 1T)
Problem with this representation:
motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] AND hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0
Any two one-hot vectors are orthogonal, so this representation encodes no notion of word similarity.
Distributional similarity based representations
You can get a lot of value by representing a word by means of its
neighbors
“You shall know a word by the company it keeps”
(J. R. Firth 1957: 11)
One of the most successful ideas of modern statistical NLP
… government debt problems turning into  banking  crises as has happened in …
… saying that Europe needs unified  banking  regulation to replace the hodgepodge …
These context words will come to represent banking.
How to make neighbors represent words?
Answer: With a cooccurrence matrix X
Window based co-occurrence matrix
• Window around each word captures both syntactic (POS) and semantic
information
• Window length 1 (more common: 5 - 10)
• Symmetric (irrelevant whether left or right context)
Window based co-occurrence matrix
• Example corpus:
• I like deep learning.
• I like NLP.
• I enjoy flying.
counts     I   like  enjoy  deep  learning  NLP  flying  .
I          0   2     1      0     0         0    0       0
like       2   0     0      1     0         1    0       0
enjoy      1   0     0      0     0         0    1       0
deep       0   1     0      0     1         0    0       0
learning   0   0     0      1     0         0    0       1
NLP        0   1     0      0     0         0    0       1
flying     0   0     1      0     0         0    0       1
.          0   0     0      0     1         1    1       0
Problems with simple cooccurrence vectors
• Increase in size with vocabulary
• Very high dimensional: require a lot of storage
• Subsequent classification models have sparsity issues
• Models are less robust
Solution: Low dimensional vectors
• Idea: store “most” of the important information in a fixed, small
number of dimensions: a dense vector
• Usually around 25 – 1000 dimensions
• How to reduce the dimensionality?
Method 1: Dimensionality Reduction on X
Singular Value Decomposition of cooccurrence matrix X.
(Diagram: the full SVD factorises X = U S Vᵀ, with the singular values S₁ ≥ S₂ ≥ … ≥ S_r on the diagonal of S. Keeping only the k largest singular values gives the reduced decomposition X̂ = U_k S_k V_kᵀ.)
X̂ is the best rank-k approximation to X, in terms of least squares.
Simple SVD word vectors in Python
Corpus: I like deep learning. I like NLP. I enjoy flying.
Printing the first two columns of U corresponding to the 2 biggest singular values
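The slide's snippet is not reproduced here, so the following is a minimal NumPy/matplotlib sketch of the same idea; the co-occurrence matrix X is the one tabulated above, and the variable names are mine:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy corpus: "I like deep learning. I like NLP. I enjoy flying."
words = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]

# Window-1 co-occurrence matrix from the table above
X = np.array([
    [0, 2, 1, 0, 0, 0, 0, 0],
    [2, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
], dtype=float)

# SVD: X = U S V^T, columns of U ordered by decreasing singular value
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Plot each word at its first two coordinates in U
for i, word in enumerate(words):
    plt.text(U[i, 0], U[i, 1], word)
plt.xlim(U[:, 0].min() - 0.5, U[:, 0].max() + 0.5)
plt.ylim(U[:, 1].min() - 0.5, U[:, 1].max() + 0.5)
plt.show()
```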
Interesting syntactic patterns emerge in the vectors
(Figure: a 2-D projection of the word vectors in which morphological variants cluster together, e.g. TAKE/TAKING/TOOK/TAKEN, SHOW/SHOWING/SHOWED/SHOWN, SPEAK/SPEAKING/SPOKE/SPOKEN, GROW/GROWING/GREW/GROWN, CHOOSE/CHOOSING/CHOSE/CHOSEN, THROW/THROWING/THREW/THROWN, STEAL/STEALING/STOLE/STOLEN, EAT/EATING/ATE/EATEN.)
An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence
Rohde et al. 2005
Interesting semantic patterns emerge in the vectors
(Figure: related words cluster by semantic role, e.g. DRIVE–DRIVER, TEACH–TEACHER, SWIM–SWIMMER, MARRY–BRIDE, PRAY–PRIEST, LEARN–STUDENT, TREAT–DOCTOR, CLEAN–JANITOR.)
An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence
Rohde et al. 2005
Problems with SVD
• Computational cost scales quadratically: O(mn²) flops for an m × n matrix (when n < m)
• Bad for millions of words or documents
• Hard to incorporate new words or documents
Instead of capturing co-occurrence counts directly:
• Predict the surrounding words of every word
• Faster, and can easily incorporate a new sentence/document or add a word to the vocabulary
Idea: directly learn low-dimensional word vectors
Main Idea of word2vec
• Instead of capturing cooccurrence counts directly,
• Predict surrounding words of every word
• Faster and can easily incorporate a new sentence/ document or add
a word to the vocabulary
Details of Word2Vec
• Predict surrounding words in a window of radius m around every word.
• Objective function: maximize the log probability of any context word
given the current center word:

J(θ) = (1/T) · Σ_{t=1..T} Σ_{−m ≤ j ≤ m, j ≠ 0} log p(w_{t+j} | w_t)

where θ represents all variables we optimize
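The probability in that objective is the standard word2vec softmax over inner products of "outside" vectors u and "center" vectors v (not spelled out on the slide, but standard):

```latex
p(o \mid c) = \frac{\exp\left(u_o^{\top} v_c\right)}{\sum_{w=1}^{V} \exp\left(u_w^{\top} v_c\right)}
```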
Linear Relationships in word2vec
These representations are very good at encoding dimensions of similarity!
• Analogies testing dimensions of similarity can be solved quite well just by doing
vector subtraction in the embedding space.
Syntactically:
• x_apple − x_apples ≈ x_car − x_cars ≈ x_family − x_families
• Similarly for verb and adjective morphological forms
Semantically (SemEval 2012 Task 2):
• x_shirt − x_clothing ≈ x_chair − x_furniture
• x_king − x_man ≈ x_queen − x_woman
The continuous bag of words model
• Main idea of the continuous bag of words (CBOW) model: predict the center word
from the sum of the surrounding word vectors, instead of predicting each
surrounding word from the center word as in the skip-gram model
• Disregards grammar and word order
• The projection weights are shared across all context words
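A minimal sketch of the CBOW forward pass described above, assuming NumPy; the dimensions and variable names are illustrative, not from the paper:

```python
import numpy as np

V, d = 10000, 100                            # vocab size, embedding dim (illustrative)
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, d))   # input (context) embeddings
W_out = rng.normal(scale=0.01, size=(V, d))  # output (center) embeddings

def cbow_probs(context_ids):
    """Predict the center word from the average of its context vectors."""
    h = W_in[context_ids].mean(axis=0)  # bag of words: order is discarded
    scores = W_out @ h                  # one score per vocabulary word
    scores -= scores.max()              # numerical stability
    e = np.exp(scores)
    return e / e.sum()                  # softmax over the whole vocabulary

probs = cbow_probs(np.array([12, 7, 431, 2]))
print(probs.argmax())                   # index of the predicted center word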
The skip-gram model and negative sampling
• From paper: “Distributed Representations of Words and Phrases and their
Compositionality”(Mikolov et al. 2013)
• Main idea: train binary logistic regressions for a true pair (a center word and a word in its
context window) against a couple of random pairs (the center word with a random word)
• Overall objective function:

J_t(θ) = log σ(u_oᵀ v_c) + Σ_{i=1..k} E_{j∼P(w)} [ log σ(−u_jᵀ v_c) ]

where k is the number of negative samples we use and σ is the logistic sigmoid
• So we maximize the probability of two words co-occurring in the first log term and minimize
the probability that random words appear around the center word in the second
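A small sketch of the per-pair negative-sampling objective above, with illustrative shapes; u and v follow the outside/center vector convention used in the formula:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(v_c, u_o, U_neg):
    """log sigma(u_o . v_c) + sum over k negatives of log sigma(-u_j . v_c)."""
    pos = np.log(sigmoid(u_o @ v_c))             # true (center, context) pair
    neg = np.log(sigmoid(-(U_neg @ v_c))).sum()  # k random "noise" pairs
    return pos + neg                             # maximized during training

d, k = 100, 5
rng = np.random.default_rng(0)
v_c = rng.normal(scale=0.01, size=d)         # center word vector
u_o = rng.normal(scale=0.01, size=d)         # true context ("outside") vector
U_neg = rng.normal(scale=0.01, size=(k, d))  # k negatives drawn from P(w)
print(neg_sampling_objective(v_c, u_o, U_neg))
```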
Convolution (one-dimensional)
• Here we'll introduce two types of convolution:
• 1) Narrow
• 2) Wide

Each output value is a dot product of the filter m with a window of the sequence s:

c_j = mᵀ s_{j−m+1:j}    (here m = 5 and m ≤ j ≤ s)

Narrow convolution: |c| = s − m + 1, so c ∈ ℝ^{s−m+1} (requires s ≥ m)
Wide convolution: |c| = s + m − 1, so c ∈ ℝ^{s+m−1}
Out-of-range input values s_i with i < 1 or i > s are taken to be zero.
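The narrow/wide distinction can be checked directly with NumPy; note that np.convolve follows the signal-processing convention and flips the filter, so we reverse m to recover the dot-product form above:

```python
import numpy as np

s = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # sequence of length s = 5
m = np.array([1.0, 0.5])                   # filter of width m = 2

# Reverse m so that each output is c_j = m . s[j-m+1 : j] as on the slide.
narrow = np.convolve(s, m[::-1], mode="valid")  # length s - m + 1 = 4
wide = np.convolve(s, m[::-1], mode="full")     # length s + m - 1 = 6;
                                                # out-of-range s_i treated as 0
print(narrow)  # [2.  3.5 5.  6.5]
print(wide)    # [0.5 2.  3.5 5.  6.5 5. ]
```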
Time-Delay Neural Networks
• The sequence s is viewed as having a time dimension, and the convolution is
applied over the time dimension
• Each s_j is a column vector of the d × s sentence matrix S, and M is a weight
matrix of size d × m
• Each row of M is convolved with the corresponding row of S, and the
convolution is usually of the narrow type
• To address the problem of varying sentence lengths, the Max-TDNN takes the
maximum of each row in the resulting matrix C, yielding a vector of d values
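A minimal sketch of the Max-TDNN just described: row-wise narrow convolution over the sentence matrix, then a max over time in each row (shapes are the slide's example; the filter reversal matches the earlier convolution sketch):

```python
import numpy as np

d, s, m = 4, 5, 2
rng = np.random.default_rng(0)
S = rng.normal(size=(d, s))   # sentence matrix: one d-dim vector per word/column
M = rng.normal(size=(d, m))   # one width-m filter per row

# Row-wise narrow convolution: row i of M over row i of S
C = np.stack([np.convolve(S[i], M[i, ::-1], mode="valid") for i in range(d)])
print(C.shape)                # (d, s - m + 1) = (4, 4)

# Max-TDNN: max over time per row -> fixed-size vector for any sentence length
c_max = C.max(axis=1)
print(c_max.shape)            # (4,)
```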
Narrow one-dimensional convolution through the time axis
(Figure: a sentence matrix S with d = 4, s = 5 is convolved row-wise with a filter of width m = 2, giving a result matrix C with d = 4 and s − m + 1 = 4 columns; taking the max of every row of C yields a d-dimensional vector.)
This vector is used as input to a fully connected layer for
classification
But DCNN is slightly different
• 1) We use wide, row-wise one-dimensional convolution
• 2) Then a dynamic k-max pooling operation is applied
• 3) We apply a nonlinearity to the pooled output
• 4) Steps 1–3 can be repeated n times
• 5) Folding (usually comes around the last layer; see the sketch below)
• 6) k-max pooling
• 7) Fully connected layer
Image from Kalchbrenner (2014)
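A sketch of the folding step from the list above: every pair of adjacent rows of a feature matrix is summed component-wise, halving d (assuming d is even):

```python
import numpy as np

def fold(X):
    """Sum every pair of adjacent rows, halving d (assumes d is even)."""
    return X[0::2] + X[1::2]

X = np.random.randn(4, 6)     # d = 4 rows of features
print(fold(X).shape)          # (2, 6)
```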
Dynamic k-Max Pooling
• The k-max pooling operation makes it possible to pool the k most active
features in a sequence p, even when they are a number of positions apart
(see the sketch after this list);
• it preserves the order of the features, but is insensitive to their specific
positions.
• It can also discern more finely the number of times a feature is highly
activated in p
• The k-max pooling operator is applied in the network after the topmost
convolutional layer.
• At intermediate convolutional layers the pooling parameter k is not fixed,
but is dynamically selected in order to allow for a smooth extraction of
higher-order and longer-range features.
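A small sketch of plain k-max pooling on a single feature sequence, as referenced in the list above:

```python
import numpy as np

def k_max_pool(p, k):
    """Keep the k largest values of p, preserving their original order."""
    idx = np.argsort(p)[-k:]   # positions of the k most active features
    return p[np.sort(idx)]     # re-sort positions so order is preserved

p = np.array([0.2, 3.1, 0.5, 2.4, 0.1, 2.9])
print(k_max_pool(p, 3))        # [3.1 2.4 2.9]: order kept, gaps ignored
```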
Dynamic k-Max Pooling
The pooling parameter k_l at an intermediate layer is a function of the input sentence length s:

k_l = max( k_top, ⌈((L − l) / L) · s⌉ )

Here,
l is the number of the current convolutional layer to which the pooling is applied,
L is the total number of convolutional layers in the network, and
k_top is the fixed pooling parameter for the topmost convolutional layer.
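The formula can be checked against the paper's own example, where a network with L = 3 convolutional layers, k_top = 3, and an input sentence of length s = 18 gives k_1 = 12 and k_2 = 6:

```python
import math

def dynamic_k(l, L, s, k_top):
    """Pooling parameter for convolutional layer l (1-indexed) out of L."""
    return max(k_top, math.ceil((L - l) / L * s))

print([dynamic_k(l, 3, 18, 3) for l in (1, 2, 3)])  # [12, 6, 3]
```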
Non-Linear Features
• After (dynamic) k-max pooling is applied to the result of a
convolution, a bias b and a nonlinear function g are applied
component-wise to the pooled matrix. There is a single bias value for
each of the d rows of the pooled matrix.
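A one-line sketch of this step; the paper leaves g abstract, so tanh stands in for the nonlinearity here, and the shapes are illustrative:

```python
import numpy as np

d, k = 4, 3
pooled = np.random.randn(d, k)   # matrix after (dynamic) k-max pooling
b = np.random.randn(d, 1)        # one bias per row, broadcast over columns
out = np.tanh(pooled + b)        # g applied component-wise; tanh as an example
```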
So why does this model work?
• This model is sensitive to the word order of the input
• It can discriminate whether a specific n-gram occurs in the input
• To some extent, it can tell the relative position of the most relevant n-grams
The left diagram emphasizes the pooled nodes. The
width of the convolutional filters is 3 and 2
respectively. With dynamic pooling, a filter with small
width at the higher layers can relate phrases far apart
in the input sentence.
What makes the feature graph of a DCNN peculiar
is the global range of the pooling operations.
The (dynamic) k-max pooling operator can draw
together features that correspond to words that are
many positions apart in the sentence
One variation to current approach
• Yoon Kim (2014), "Convolutional Neural Networks for Sentence Classification"
Experiments (Sentiment Prediction in Movie Reviews)
Training:
The top layer of the network is a fully connected layer followed by a softmax nonlinearity
that predicts the probability distribution over classes given the input sentence. The network
is trained to minimise the cross-entropy between the predicted and true distributions; the
objective includes an L2 regularisation term over the parameters.
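Written out, the training objective described above takes roughly this form; this is a reconstruction for clarity, not the paper's exact notation (λ is the regularisation weight, y and ŷ the true and predicted class distributions):

```latex
\mathcal{L}(\theta) \;=\; -\sum_{i} \sum_{c} y_{i,c} \log \hat{y}_{i,c} \;+\; \lambda \lVert \theta \rVert_2^2
```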
In the binary case,
we use the given splits of 6920 training, 872 dev
and 1821 test sentences.
In the fine-grained case, we use the
standard 8544/1101/2210 splits.
The size of the vocabulary is 15448.
Question Type Classification
A question may be classified as
belonging to one of many question
types. The TREC questions dataset
involves six different question
types, e.g. whether the question
is about a location, about a person or
about some numeric information (Li
and Roth, 2002). The training
dataset consists of 5452 labelled
questions whereas the test dataset
consists of 500 questions.
Twitter Sentiment Prediction with
Distant Supervision
Train the models on a large dataset of
tweets, where a tweet is automatically
labelled as positive or negative depending
on the emoticon that occurs in it.
The training set consists of 1.6 million
tweets with emoticon-based labels and
the test set of about 400 hand-annotated
tweets.
This results in a vocabulary of 76643 word
types. The architecture of the DCNN
Thank you!