This is a presentation about skip-gram and CBOW from a seminar of Natural Language Processing Labs.
- how to make word vectors using skip-gram & CBOW.
4. Skip-gram&CBOW
· F = Wx
- x : one-hot vector over the vocabulary
- W : the matrix whose rows are the word vectors we want
[Figure: with a vocabulary of 5 words, x is 1 by 5. If W is 5 by 5, the word vectors have dimension 5; if W is 5 by 7, they have dimension 7. The input dimension is always the vocabulary size, while the second dimension of W is the dimension of word2vec, and the product is the hidden layer of the neural network.]
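Multiplying a one-hot vector by W is just a row lookup. Here is a minimal NumPy sketch of that idea (the random W and the chosen index are illustrative, not trained values); with a 1-by-5 one-hot x and a 5-by-7 W, the product is the 1-by-7 hidden layer:

```python
# A sketch of F = Wx: a one-hot vector times W selects one row of W,
# i.e. the vector of that word. W here is a random stand-in.
import numpy as np

V, N = 5, 7                      # vocabulary size, word2vec dimension
W = np.random.rand(V, N)         # each row is one word's vector

x = np.zeros(V)
x[2] = 1.0                       # one-hot vector of word #3

F = x @ W                        # 1-by-5 times 5-by-7 -> 1-by-7 hidden layer
assert np.allclose(F, W[2])      # identical to just reading row 2 of W
```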
6. Skip-gram&CBOW
· Let me explain the architecture of skip-gram.
[Figure: skip-gram architecture. The input vector is the one-hot coding of the center word. Input vector * W gives the hidden layer; hidden layer * W' gives the output layer, followed by softmax and cross-entropy (the cost function) against the window word. W and W' are different matrices; W' is the Word2Vec we want from skip-gram. Backpropagation minimizes the cost function (cross-entropy here).]
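A rough sketch of this forward pass, with toy sizes and random weights (the names skipgram_forward, W, and W_prime are illustrative, not from any library):

```python
# A sketch of one skip-gram forward pass: one-hot center word -> hidden
# layer -> output layer -> softmax over the vocabulary.
import numpy as np

V, N = 6, 7                        # vocabulary size, hidden-layer size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))        # input -> hidden
W_prime = rng.normal(size=(N, V))  # hidden -> output (a different matrix)

def skipgram_forward(center_idx):
    hidden = W[center_idx]               # input vector * W (one-hot lookup)
    logits = hidden @ W_prime            # hidden layer * W'
    exp = np.exp(logits - logits.max())  # softmax over the vocabulary
    return exp / exp.sum()               # probabilities of each window word

probs = skipgram_forward(1)              # e.g. center word at index 1
```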
7. Skip-gram&CBOW
· Let's say our vocabulary is {I, like, the, natural, language, processing}, from the sentence "I like the natural
language processing", and the window size is 1.
- a training pair consists of {center word, window word}
Sliding the window over the sentence gives one sample per center word:
{I, like}
{like, I}, {like, the}
{the, like}, {the, natural}
{natural, the}, {natural, language}
{language, natural}, {language, processing}
{processing, language}
A sample for an example of skip-gram, as generated in the sketch below.
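The pairs above can be produced with a few lines of Python (the function name skipgram_pairs is illustrative):

```python
# A sketch of generating {center word, window word} pairs with window
# size 1, reproducing the table above.
sentence = "I like the natural language processing".split()

def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(sentence))
# [('I', 'like'), ('like', 'I'), ('like', 'the'), ('the', 'like'), ...]
```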
8. Skip-gram&CBOW
A sample for an example of skip-gram: "I like the natural language processing", with pairs {like, I}, {like, the}.
One-hot vector of "I":    1 0 0 0 0 0
One-hot vector of "like": 0 1 0 0 0 0
One-hot vector of "the":  0 0 1 0 0 0
Step 1: the input vector (the one-hot of "like") * W gives the hidden layer; hidden layer * W' gives the output layer, followed by softmax and cross-entropy (the cost function). W and W' are different matrices. The output is the "I" word that the neural net expects; we compare this expected "I" word vector to the real one-hot vector of "I", and backpropagation minimizes the cost function (cross-entropy here).
9. Skip-gram&CBOW
The same sample, now for the second pair: "I like the natural language processing", with pairs {like, I}, {like, the}.
One-hot vector of "I":    1 0 0 0 0 0
One-hot vector of "like": 0 1 0 0 0 0
One-hot vector of "the":  0 0 1 0 0 0
Step 2: again the input vector (the one-hot of "like") * W gives the hidden layer, and hidden layer * W' gives the output layer, followed by softmax and cross-entropy. This time we compare the "the" word vector that the neural net expects to the real one-hot vector of "the", and backpropagate to minimize the cost function.
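The comparison step in both cases is softmax cross-entropy against a one-hot target, which reduces to the negative log-probability of the true word. A small sketch (the probability values are made up for illustration):

```python
# A sketch of scoring the network's softmax output against the real
# one-hot vector of the window word with cross-entropy.
import numpy as np

probs = np.array([0.45, 0.10, 0.20, 0.10, 0.10, 0.05])  # net's output
target = np.array([1, 0, 0, 0, 0, 0])                   # the real "I" word

cross_entropy = -np.sum(target * np.log(probs))
print(cross_entropy)        # equals -log(0.45); backprop minimizes this
```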
11. Skip-gram&CBOW
· Let me explain the architecture of Continuous Bag-of-Words.
[Figure: CBOW architecture. The input layer holds the one-hot vectors of the window words. Input vector * W gives the hidden layer; hidden layer * W' gives the output layer, followed by softmax and cross-entropy (the cost function) against the center word. W and W' are different matrices; W' is the Word2Vec we want from CBOW. Backpropagation minimizes the cost function (cross-entropy here). *In practice it is normal to use negative sampling as the cost function.]
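A rough sketch of the CBOW forward pass (toy sizes and random weights; averaging the window-word rows is the usual convention, which the slide leaves implicit):

```python
# A sketch of one CBOW forward pass: the one-hot window words are each
# multiplied by W, averaged into the hidden layer, then projected by W'.
import numpy as np

V, N = 5, 7
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))        # input -> hidden
W_prime = rng.normal(size=(N, V))  # hidden -> output (a different matrix)

def cbow_forward(window_idxs):
    hidden = W[window_idxs].mean(axis=0)  # average of the window-word rows
    logits = hidden @ W_prime             # hidden layer * W'
    exp = np.exp(logits - logits.max())   # softmax
    return exp / exp.sum()                # probabilities of the center word

probs = cbow_forward([0, 2])              # e.g. window words "I" and "the"
```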
12. Skip-gram&CBOW
· Let's say our vocabulary is {I, like, the, NLP, programming}, from the sentence "I like the NLP programming",
and the window size is 1.
- a training pair consists of {[window words], center word}
Sliding the window over the sentence gives one sample per center word:
{ [like], I }
{ [I, the], like }
{ [like, NLP], the }
{ [the, programming], NLP }
{ [NLP], programming }
A sample for an example of CBOW, as generated in the sketch below.
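The CBOW pairs differ from skip-gram only in grouping all window words with one center word; a short sketch (cbow_pairs is an illustrative name):

```python
# A sketch of generating {[window words], center word} pairs with
# window size 1, matching the table above.
sentence = "I like the NLP programming".split()

def cbow_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

print(cbow_pairs(sentence))
# [(['like'], 'I'), (['I', 'the'], 'like'), (['like', 'NLP'], 'the'), ...]
```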
13. Skip-gram&CBOW
A sample for an example of CBOW: "I like the NLP programming", with the pair { [I, the], like }.
The input layer holds the one-hot vectors of the "I" word and the "the" word. Input vector * W gives the hidden layer; hidden layer * W' gives the output layer, followed by softmax and cross-entropy (the cost function). W and W' are different matrices; W' is the Word2Vec we want from CBOW. The output is the "like" word that the neural net expects; we compare this expectation to the real one-hot vector of "like", and backpropagation minimizes the cost function (cross-entropy here).
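Putting it all together, here is a sketch of a single CBOW training step on this toy example, assuming plain softmax cross-entropy (real implementations usually prefer negative sampling, as noted above); the learning rate and all names are illustrative:

```python
# A sketch of one CBOW gradient step for the pair { [I, the], like }.
import numpy as np

vocab = ["I", "like", "the", "NLP", "programming"]
idx = {w: i for i, w in enumerate(vocab)}
V, N, lr = len(vocab), 7, 0.1

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # input -> hidden
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden -> output

context, center = [idx["I"], idx["the"]], idx["like"]

# Forward: average context rows -> hidden -> softmax over the vocabulary
h = W[context].mean(axis=0)
u = h @ W_prime
y = np.exp(u - u.max()); y /= y.sum()
loss = -np.log(y[center])                 # cross-entropy cost

# Backward: gradient of the cost, then one gradient-descent update
d_u = y.copy(); d_u[center] -= 1.0        # dL/du = y - one_hot(center)
d_h = W_prime @ d_u                       # dL/dh
W_prime -= lr * np.outer(h, d_u)          # update hidden -> output weights
W[context] -= lr * d_h / len(context)     # update rows of the context words
```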