Instructor: Mat Leonard
Outline
1. Text Processing
Using Python + NLTK
Cleaning
Normalization
Tokenization
Part-of-speech Tagging
Stemming and Lemmatization
2. Feature Extraction
Bag of Words
TF-IDF
Word Embeddings
Word2Vec
GloVe
3. Topic Modeling
Latent Variables
Beta and Dirichlet Distributions
Latent Dirichlet Allocation
4. NLP with Deep Learning
Neural Networks
Recurrent Neural Networks (RNNs)
Word Embeddings
Sentiment Analysis with RNNs
19. Challenges: Context
The old Welshman came home toward daylight, spattered
with candle-grease, smeared with clay, and almost worn
out. He found Huck still in the bed that had been provided
for him, and delirious with fever. The physicians were all at
the cave, so the Widow Douglas came and took charge of
the patient.
—The Adventures of Tom Sawyer, Mark Twain
20. Process
“Mary went back home. …”  →  {<“mary”, “go”, “home”>, …}  →  {<0.4, 0.8, 0.3, 0.1, 0.7>, …}
Pipeline stages: Transform → Analyze → Predict → Present
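The transform step can be sketched in a few lines of NLTK. The following is a minimal, assumed example (not the course's exact code): it lowercases, tokenizes, drops punctuation and stopwords, then stems with the Porter stemmer, so the output uses stemmed forms like those on the later bag-of-words slides (the slide's “go” for “went” suggests lemmatization, which is omitted here).

```python
# A minimal sketch of the Transform step with NLTK (assumed, not the course's code).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def transform(text):
    tokens = word_tokenize(text.lower())                         # lowercase + tokenize
    tokens = [t for t in tokens if t.isalpha()]                  # drop punctuation
    tokens = [t for t in tokens if t not in stopwords.words("english")]
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]                     # normalize by stemming

print(transform("Mary went back home."))  # ['mari', 'went', 'back', 'home']
```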
34. Bag of Words
“Little House on the Prairie” {“littl”, “hous”, “prairi”}
“Mary had a Little Lamb” {“mari”, “littl”, “lamb”}
“The Silence of the Lambs” {“silenc”, “lamb”}
“Twinkle Twinkle Little Star” {“twinkl”, “littl”, “star”} ?
35. Bag of Words
corpus (D): “Little House on the Prairie”, “Mary had a Little Lamb”, “The Silence of the Lambs”, “Twinkle Twinkle Little Star”
vocabulary (V): littl, hous, prairi, mari, lamb, silenc, twinkl, star
36. Bag of Words
One row per document (“Little House on the Prairie”, “Mary had a Little Lamb”, “The Silence of the Lambs”, “Twinkle Twinkle Little Star”), one column per vocabulary term (littl, hous, prairi, mari, lamb, silenc, twinkl, star).
37. Bag of Words
Document-Term Matrix (term frequency):

                                  littl  hous  prairi  mari  lamb  silenc  twinkl  star
“Little House on the Prairie”       1     1      1      0     0      0       0      0
“Mary had a Little Lamb”            1     0      0      1     1      0       0      0
“The Silence of the Lambs”          0     0      0      0     1      1       0      0
“Twinkle Twinkle Little Star”       1     0      0      0     0      0       2      1
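As a sketch of how the matrix above could be built in plain Python (assumed, not the course's code), count each vocabulary term in each stemmed document:

```python
# A minimal bag-of-words sketch; the titles are assumed to be stemmed as on the slides.
docs = {
    "Little House on the Prairie": ["littl", "hous", "prairi"],
    "Mary had a Little Lamb":      ["mari", "littl", "lamb"],
    "The Silence of the Lambs":    ["silenc", "lamb"],
    "Twinkle Twinkle Little Star": ["twinkl", "twinkl", "littl", "star"],
}

vocab = ["littl", "hous", "prairi", "mari", "lamb", "silenc", "twinkl", "star"]

# One row per document, one column per vocabulary term (raw term frequency).
dtm = [[tokens.count(term) for term in vocab] for tokens in docs.values()]

for title, row in zip(docs, dtm):
    print(f"{title:32s} {row}")
# "Twinkle Twinkle Little Star" -> [1, 0, 0, 0, 0, 0, 2, 1]
```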
38. Document Similarity
                                    littl  hous  prairi  mari  lamb  silenc  twinkl  star
a  “Little House on the Prairie”      1     1      1      0     0      0       0      0
b  “Mary had a Little Lamb”           1     0      0      1     1      0       0      0

dot product: a·b = a0b0 + a1b1 + … + anbn = 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 1
39. Document Similarity
a = (1, 1, 1, 0, 0, 0, 0, 0)   “Little House on the Prairie”
b = (1, 0, 0, 1, 1, 0, 0, 0)   “Mary had a Little Lamb”

dot product: a·b = a0b0 + a1b1 + … + anbn = 1
cosine similarity: cos(θ) = a·b / (‖a‖·‖b‖) = 1 / (√3 × √3) = 1/3
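A small NumPy sketch (assumed, not from the slides) reproduces both numbers:

```python
# Dot product and cosine similarity between the two document vectors above.
import numpy as np

a = np.array([1, 1, 1, 0, 0, 0, 0, 0])  # "Little House on the Prairie"
b = np.array([1, 0, 0, 1, 1, 0, 0, 0])  # "Mary had a Little Lamb"

dot = a @ b                                              # 1
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # 1 / (sqrt(3) * sqrt(3)) = 1/3

print(dot, cosine)  # 1 0.333...
```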
40. Term Specificity
                                  littl  hous  prairi  mari  lamb  silenc  twinkl  star
“Little House on the Prairie”       1     1      1      0     0      0       0      0
“Mary had a Little Lamb”            1     0      0      1     1      0       0      0
“The Silence of the Lambs”          0     0      0      0     1      1       0      0
“Twinkle Twinkle Little Star”       1     0      0      0     0      0       2      1
document frequency                  3     1      1      1     2      1       1      1

Each term count is then divided by that term's document frequency (littl by 3, lamb by 2, every other term by 1).
41. Term Specificity
                                  littl  hous  prairi  mari  lamb  silenc  twinkl  star
“Little House on the Prairie”      1/3    1      1      0     0      0       0      0
“Mary had a Little Lamb”           1/3    0      0      1    1/2     0       0      0
“The Silence of the Lambs”          0     0      0      0    1/2     1       0      0
“Twinkle Twinkle Little Star”      1/3    0      0      0     0      0       2      1
42. TF-IDF
tfidf(t, d, D) = tf(t, d) · idf(t, D)
term frequency: tf(t, d) = count(t, d) / |d|
inverse document frequency: idf(t, D) = log(|D| / |{d ∈ D : t ∈ d}|)
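A direct implementation of this formula (assumed, not the course's code) on the stemmed titles from the earlier slides:

```python
# tf(t, d) = count(t, d) / |d|,  idf(t, D) = log(|D| / |{d in D : t in d}|)
import math

def tfidf(term, doc, corpus):
    tf = doc.count(term) / len(doc)               # term frequency
    df = sum(1 for d in corpus if term in d)      # document frequency
    idf = math.log(len(corpus) / df)              # inverse document frequency
    return tf * idf

corpus = [
    ["littl", "hous", "prairi"],
    ["mari", "littl", "lamb"],
    ["silenc", "lamb"],
    ["twinkl", "twinkl", "littl", "star"],
]

print(tfidf("twinkl", corpus[3], corpus))  # (2/4) * log(4/1) ~ 0.69
print(tfidf("littl",  corpus[3], corpus))  # (1/4) * log(4/3) ~ 0.07
```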
62. LDA: Parameter Estimation
• Choose |Z| = k (the number of topics).
• Initialize distribution parameters.
• Expectation: sample <document, words>.
• Maximization: update distribution parameters.
• Repeat the expectation and maximization steps until the parameters converge.
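In practice the estimation is usually handled by a library. Below is a minimal sketch (assumed, not the course's code) using scikit-learn's LatentDirichletAllocation, which fits the model by variational EM and also gives the topic mixture of a new document:

```python
# Fit LDA on a toy corpus and infer topic mixtures.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are friendly pets",
    "the stock market fell sharply today",
    "investors worry about the bond market",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)                 # document-term matrix

lda = LatentDirichletAllocation(n_components=2,  # choose |Z| = k = 2 topics
                                random_state=0)
doc_topics = lda.fit_transform(counts)           # per-document topic mixtures

print(doc_topics.round(2))

# Topic mixture of a new, unseen document: P(z | w)
new_doc = vec.transform(["cats and dogs on the mat"])
print(lda.transform(new_doc).round(2))
```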
63. LDA: Use Cases
• Topic modeling, document categorization.
• Mixture of topics in a new document: P(z | w, α, β)
• Generate collections of words with desired mixture.
64. LDA: Further Reading
David Blei, Andrew Ng, Michael Jordan, 2003. Latent Dirichlet Allocation,
In Journal of Machine Learning Research, vol. 3, pp. 993-1022.
Thomas Boggs, 2014. Visualizing Dirichlet Distributions with matplotlib.
115. One-hot encoding
What a great movie!
One column per vocabulary word (aardvark, …, a, …, great, …, movie, …, rhinoceros, cookie, figurine, …, what, …, zygote); the sentence becomes a binary vector with 1s in the columns for “a”, “great”, “movie”, and “what”:
1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
117. 2. Examining the data
What a great movie!
Each vocabulary word (aardvark, …, a, …, great, …, movie, …, rhinoceros, cookie, figurine, …, what, …, zygote) is assigned an integer index, and the sentence is replaced by the indices of its words (“a”, “great”, “movie”, “what”):
[1, 7, 15, 23]
118. 3. One-hot encoding the input
What a great movie!
The index list [1, 7, 15, 23] is expanded into a binary vector over the vocabulary, with a 1 at each listed index and 0 everywhere else:
1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
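Both steps can be sketched in a few lines (assumed, not the course's code); the toy vocabulary and the resulting indices below are illustrative, not the slide's 25-word vocabulary:

```python
# Map each word to its vocabulary index, then expand indices into a one-hot row.
import numpy as np

vocab = ["aardvark", "a", "great", "movie", "rhinoceros",
         "cookie", "figurine", "what", "zygote"]          # toy vocabulary (assumed)
word_to_index = {word: i for i, word in enumerate(vocab)}

tokens = ["what", "a", "great", "movie"]                  # "What a great movie!" after cleaning
indices = [word_to_index[t] for t in tokens]              # step 2: words -> integer indices

one_hot = np.zeros(len(vocab), dtype=int)                 # step 3: indices -> binary vector
one_hot[indices] = 1

print(indices)   # [7, 1, 2, 3]
print(one_hot)   # [0 1 1 1 0 0 0 1 0]
```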
127. One-hot encoding
“I expected this movie to be much better”
“This movie is much better than I expected”
The two sentences use almost the same words (I, expected, this, movie, to, be, much, better, is, than), so their binary vocabulary vectors are nearly identical even though the sentences say different things; word order is not captured:
1 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 1 0
1 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0
143. Word2Vec: Recap
• Robust, distributed representation.
• Vector size independent of vocabulary.
• Train once, store in lookup table.
• Deep learning ready!
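A minimal gensim sketch (assumed, not the course's code) that trains Word2Vec on a toy corpus and reads vectors back out of the lookup table:

```python
# Train Word2Vec and use the learned vectors as a lookup table.
from gensim.models import Word2Vec

sentences = [
    ["mary", "had", "a", "little", "lamb"],
    ["twinkle", "twinkle", "little", "star"],
    ["little", "house", "on", "the", "prairie"],
]

# vector_size is independent of the vocabulary size
# (gensim >= 4; older versions call this parameter `size`).
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

vec = model.wv["little"]                 # lookup-table access to a word vector
print(vec.shape)                         # (50,)
print(model.wv.most_similar("little"))   # nearest neighbours in vector space
```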
144. Word2Vec: Further Reading
Tomas Mikolov, et al., 2013. Distributed Representations of Words and
Phrases and their Compositionality, In Advances in Neural Information
Processing Systems (NIPS), pp. 3111-3119.
Adrian Colyer, 2016. The amazing power of word vectors.