
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks


Instructor: Mat Leonard
Outline
1. Text Processing
Using Python + NLTK
Cleaning
Normalization
Tokenization
Part-of-speech Tagging
Stemming and Lemmatization
2. Feature Extraction
Bag of Words
TF-IDF
Word Embeddings
Word2Vec
GloVe
3. Topic Modeling
Latent Variables
Beta and Dirichlet Distributions
Latent Dirichlet Allocation
4. NLP with Deep Learning
Neural Networks
Recurrent Neural Networks (RNNs)
Word Embeddings
Sentiment Analysis with RNNs



  1. 1. Enhancing NLP with Deep Neural Networks github.com/adarsh0806/ml_workshop_udacity
  2. 2. Outline Classic NLP • Text Processing • Feature Extraction • Topic Modeling • Lab: Topic modeling using LDA Deep NLP • Neural Networks • Recurrent Neural Networks • Word Embeddings • Lab: Sentiment Analysis using RNNs
  3. 3. Introduction to NLP
  4. 4. Introduction to NLP • Communication and Cognition • Structured Languages • Unstructured Text • Applications and Challenges
  5. 5. Communication and Cognition Language is… • a medium of communication • a vehicle for thinking and reasoning
  6. 6. Structured Languages • Natural language lacks precisely defined structure
  7. 7. Structured Languages • Mathematics: y = 2x + 5
  8. 8. Structured Languages • Formal Logic: Parent(x, y) ⋀ Parent(x, z) → Sibling(y, z)
  9. 9. Structured Languages • SQL: SELECT name, email FROM users WHERE name LIKE 'A%';
  10. 10. Grammar • Arithmetic (single digit): E → E+E | E−E | E×E | E÷E | (E) | D, D → 0 | 1 | 2 | … | 9 [parse tree for the expression 5 + 7 × 2]
  11. 11. Grammar • English sentences (limited): S → NP VP, NP → N | DET NP | ADJ NP, VP → V | V NP … [parse tree for “My dog likes candy”]
  12. 12. “Because he was so small, Stuart was often hard to find around the house.” – Stuart Little, E.B. White [words annotated by part of speech: noun, verb]
  13. 13. Unstructured Text the quick brown fox jumps over the lazy dog
  14. 14. Unstructured Text the quick brown fox jumps over the lazy dog
  15. 15. Unstructured Text the quick brown fox jumps over the lazy dog + -
  16. 16. Applications • Document classification (health, politics, science) • Sentiment analysis (😀 😞 😡) • Machine translation (“what time is it?” ↔ “¿qué hora es?”)
  17. 17. Challenges: Representation “My dog likes candy.” → as a set of words {“my”, “dog”, “likes”, “candy”} or as numeric vectors (e.g., <0.2, 0.8, 0.1, 0.2>, <0.3, 0.4, 0.1, 0.7>)
  18. 18. Challenges: Temporal Sequence “I want to buy a gallon of ___” (milk? water? petrol?)
  19. 19. Challenges: Context The old Welshman came home toward daylight, spattered with candle-grease, smeared with clay, and almost worn out. He found Huck still in the bed that had been provided for him, and delirious with fever. The physicians were all at the cave, so the Widow Douglas came and took charge of the patient. —The Adventures of Tom Sawyer, Mark Twain
  20. 20. Process “Mary went back home. …” Transform Analyze Predict Present {<“mary”, “go”, “home”>, … } {<0.4, 0.8, 0.3, 0.1, 0.7>, …}
  21. 21. Classic NLP: Text Processing
  22. 22. Text Processing • Tokenization • Stop Word Removal • Stemming and Lemmatization
  23. 23. Tokenization “Jack and Jill went up the hill” <“jack”, “and”, “jill”, “went”, “up”, “the”, “hill”>
  24. 24. Tokenization “No, she didn’t do it.” <“no,”, “she”, “didn”, “’”, “t”, “do”, “it”, “.”> <“no”, “she”, “didnt”, “do”, “it”> ?
  25. 25. Tokenization Big money behind big special effects tends to suggest a big story. Nope, not here. Instead this huge edifice is like one of those over huge luxury condos that're empty in every American town, pretending as if there's a local economy huge enough to support such. —Rotten Tomatoes ? <"big", "money", "behind", "big", "special", "effects", "tends", "to", "suggest", "big", "story", "nope", "not", "here", "instead", "this", "huge", "edifice", "is", "like", "one", "of", "those", "over", "huge", "luxury", "condos", "that", "re", "empty", "in", "every", "american", "town", "pretending", "as", "if", "there", "local", "economy", "huge", "enough", "to", "support", "such"> <"big", "money", "behind", "big", "special", "effects", "tends", "to", "suggest", "big", "story"> <"nope", "not", "here"> <"instead", "this", "huge", "edifice", "is", "like", "one", "of", "those", "over", "huge", "luxury", "condos", "that", "re", "empty", "in", "every", "american", "town", "pretending", "as", "if", "there", "local", "economy", "huge", "enough", "to", "support", "such">
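Since the outline lists Python + NLTK for text processing, here is a minimal tokenization sketch; it assumes nltk is installed and that the 'punkt' tokenizer data can be downloaded.

```python
# Minimal NLTK tokenization sketch (assumes `pip install nltk`).
import nltk
nltk.download("punkt", quiet=True)   # tokenizer models; newer NLTK may also want "punkt_tab"

from nltk.tokenize import sent_tokenize, word_tokenize

text = "No, she didn't do it."
print(sent_tokenize(text))   # one sentence
print(word_tokenize(text))   # e.g. ['No', ',', 'she', 'did', "n't", 'do', 'it', '.']
```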
  26. 26. Stop Word Removal The wristwatch was invented in 1904 by Louis Cartier.
  27. 27. Stemming branching branched branches branch
  28. 28. Stemming caching cached caches cach 📄
  29. 29. Lemmatization is was were be
  30. 30. Text Processing Summary “Jenna went back to University.” → clean, normalize (lowercase, split) → tokenize <“jenna”, “went”, “back”, “to”, “university”> → remove stop words <“jenna”, “went”, “university”> → lemmatize, stem <“jenna”, “go”, “univers”>
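The summary pipeline above, sketched with NLTK; the exact output depends on the stop-word list and on whether you stem or lemmatize, so treat this as illustrative rather than the workshop's exact code.

```python
# Clean/normalize -> tokenize -> remove stop words -> stem, with NLTK.
import re
import nltk
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def process(text):
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())   # clean + normalize
    tokens = word_tokenize(text)                        # tokenize
    tokens = [t for t in tokens if t not in STOP]       # remove stop words
    return [STEMMER.stem(t) for t in tokens]            # stem (lemmatization is analogous)

print(process("Jenna went back to University."))
# roughly ['jenna', 'went', 'back', 'univers']; the slide's result also applies lemmatization
```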
  31. 31. Classic NLP: Feature Extraction
  32. 32. Feature Extraction • Bag of Words Representation • Document-Term Matrix • Term Frequency-Inverse Document Frequency (TF-IDF)
  33. 33. Bag of Words 📄 foo bar baz zip jam
  34. 34. Bag of Words “Little House on the Prairie” {“littl”, “hous”, “prairi”} “Mary had a Little Lamb” {“mari”, “littl”, “lamb”} “The Silence of the Lambs” {“silenc”, “lamb”} “Twinkle Twinkle Little Star” {“twinkl”, “littl”, “star”} ?
  35. 35. Bag of Words littl hous prairi mari lamb silenc twinkl star “Little House on the Prairie” “Mary had a Little Lamb” “The Silence of the Lambs” “Twinkle Twinkle Little Star” corpus (D) vocabulary (V)
  36. 36. Bag of Words littl hous prairi mari lamb silenc twinkl star “Little House on the Prairie” “Mary had a Little Lamb” “The Silence of the Lambs” “Twinkle Twinkle Little Star”
  37. 37. Bag of Words: Document-Term Matrix (term frequencies; columns = littl, hous, prairi, mari, lamb, silenc, twinkl, star): “Little House on the Prairie” <1, 1, 1, 0, 0, 0, 0, 0>, “Mary had a Little Lamb” <1, 0, 0, 1, 1, 0, 0, 0>, “The Silence of the Lambs” <0, 0, 0, 0, 1, 1, 0, 0>, “Twinkle Twinkle Little Star” <1, 0, 0, 0, 0, 0, 2, 1>
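A document-term matrix like the one above can be built with scikit-learn's CountVectorizer; note that its default tokenizer and English stop-word list differ from the stemmed vocabulary shown on the slide.

```python
# Document-term matrix with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "Little House on the Prairie",
    "Mary had a Little Lamb",
    "The Silence of the Lambs",
    "Twinkle Twinkle Little Star",
]
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(corpus)       # sparse matrix: 4 documents x |V| terms
print(vectorizer.get_feature_names_out())    # the vocabulary (columns)
print(dtm.toarray())                         # rows = documents, entries = term counts
```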
  38. 38. Document Similarity “Little House on the Prairie” a = <1, 1, 1, 0, 0, 0, 0, 0>, “Mary had a Little Lamb” b = <1, 0, 0, 1, 1, 0, 0, 0> (columns: littl, hous, prairi, mari, lamb, silenc, twinkl, star); dot product: a·b = a₀b₀ + a₁b₁ + … + aₙbₙ = 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 1
  39. 39. Document Similarity: cosine similarity cos(θ) = a·b / (‖a‖·‖b‖) = 1 / (√3 × √3) = 1/3
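The same dot-product and cosine-similarity computation, worked in NumPy:

```python
import numpy as np

#            littl hous prairi mari lamb silenc twinkl star
a = np.array([1,   1,   1,     0,   0,   0,     0,     0])  # "Little House on the Prairie"
b = np.array([1,   0,   0,     1,   1,   0,     0,     0])  # "Mary had a Little Lamb"

dot = a @ b                                           # 1
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # 1 / (sqrt(3) * sqrt(3)) = 1/3
print(dot, cos)
```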
  40. 40. Term Specificity: document frequency of each term across the corpus (littl 3, hous 1, prairi 1, mari 1, lamb 2, silenc 1, twinkl 1, star 1); divide each term count in the Document-Term Matrix by that term’s document frequency
  41. 41. Term Specificity (counts divided by document frequency; columns = littl, hous, prairi, mari, lamb, silenc, twinkl, star): “Little House on the Prairie” <1/3, 1, 1, 0, 0, 0, 0, 0>, “Mary had a Little Lamb” <1/3, 0, 0, 1, 1/2, 0, 0, 0>, “The Silence of the Lambs” <0, 0, 0, 0, 1/2, 1, 0, 0>, “Twinkle Twinkle Little Star” <1/3, 0, 0, 0, 0, 0, 2, 1>
  42. 42. TF-IDF: tfidf(t, d, D) = tf(t, d) · idf(t, D), where term frequency tf(t, d) = count(t, d) / |d| and inverse document frequency idf(t, D) = log(|D| / |{d∈D : t∈d}|)
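In practice TF-IDF is usually computed by a library; this scikit-learn sketch uses TfidfVectorizer, whose smoothed idf differs slightly from the log(|D| / df) formula above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "Little House on the Prairie",
    "Mary had a Little Lamb",
    "The Silence of the Lambs",
    "Twinkle Twinkle Little Star",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(corpus)
print(tfidf.get_feature_names_out())
print(X.toarray().round(2))   # higher weight = frequent in this document, rare in the corpus
```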
  43. 43. Example Task: Spam Detection
  44. 44. Example Task: Spam Detection. Email → TF-IDF vector (e.g., <0.2, 0.1, 0.6, 0.0, 0.8, 0.4, 0.6, 0.2>) → Classifier
  45. 45. Classic NLP: Topic Modeling
  46. 46. Topic Modeling • Latent Variables • Latent Dirichlet Allocation • Lab: Topic Modeling using LDA
  47. 47. Bag of Words: Graphical Model. d → t with P(t|d); with 500 docs and a 1000-word vocabulary (climate, tax, rule, cure, vote, space, play, …), how many parameters?
  48. 48. Latent Variables: Topics. Insert a topic variable z between document and word: P(t|d) = ∑z P(t|z) P(z|d); with 500 docs, 1000 words, and 10 topics (science, politics, sports, …), far fewer parameters are needed
  49. 49. science politics sports Mixture Model: Document → Topic 📄 P(z|d) d z 0.1 0.8 0.1 ∑z P(z|d) = 1
  50. 50. science politics sports Mixture Model: Document → Topic 📄 P(z|d) d z 0.1 0.8 0.1 ∑z P(z|d) = 1 θ = multinomial distribution
  51. 51. Binomial H T 0.5 0.5 0.7 0.3 0.4 0.6
  52. 52. Binomial → Multinomial. Binomial (H, T): <0.5, 0.5>, <0.7, 0.3>, <0.4, 0.6>; Multinomial (science, politics, sports): <0.9, 0.05, 0.05>, <0.1, 0.8, 0.1>, <0.2, 0.1, 0.7>
  53. 53. Probability Simplex. Each multinomial over (science, politics, sports) is a point on the simplex: the set of non-negative vectors that sum to 1
  54. 54. Probability Simplex. The multinomials <0.9, 0.05, 0.05>, <0.1, 0.8, 0.1>, <0.2, 0.1, 0.7> plotted as points inside the (science, politics, sports) triangle
  55. 55. Dirichlet Distributions α = <a, b, c> concentration parameters
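What the concentration parameters do is easiest to see by sampling; a small NumPy sketch (the alpha values are arbitrary examples):

```python
# Sampling topic mixtures from a Dirichlet with concentration parameters alpha = <a, b, c>.
import numpy as np

rng = np.random.default_rng(0)
print(rng.dirichlet([0.1, 0.1, 0.1], size=3))   # small alpha: sparse mixtures, mass piles on one topic
print(rng.dirichlet([10, 10, 10], size=3))      # large alpha: smooth, near-uniform mixtures
```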
  56. 56. LDA: Sample a Document. α → θ = <0.1, 0.8, 0.1> over (science, politics, sports)
  57. 57. LDA: Sample a Topic. θ = <0.1, 0.8, 0.1> → z = politics
  58. 58. Mixture Model: Topic → Word. d → z → t with P(z|d) and P(t|z); θ = <0.1, 0.8, 0.1> over (science, politics, sports); words: climate, tax, rule, cure, vote, space, play
  59. 59. Mixture Model: Topic → Word. α → θ = P(z|d), β → 𝝋 = P(t|z), e.g. 𝝋 = <space 0.05, climate 0.31, tax 0.12, rule 0.27, …>
  60. 60. LDA: Sample a Word. z = politics, 𝝋 = <space 0.05, climate 0.31, tax 0.12, rule 0.27, …> → w = rule
  61. 61. LDA: Plate Model. α → θ → z → w ← 𝝋 ← β, with N words per document, M documents in the corpus, and K topics
  62. 62. LDA: Parameter Estimation Choose |Z| = k Initialize distribution parameters Sample <document, words> Update distribution parameters expectation maximization
  63. 63. LDA: Use Cases • Topic modeling, document categorization. • Mixture of topics in a new document: P(z | w, α, β) • Generate collections of words with desired mixture.
  64. 64. LDA: Further Reading David Blei, Andrew Ng, Michael Jordan, 2003. Latent Dirichlet Allocation, In Journal of Machine Learning Research, vol. 3, pp. 993-1022. Thomas Boggs, 2014. Visualizing Dirichlet Distributions with matplotlib.
  65. 65. Lab: Topic Modeling using LDA Categorize Newsgroups Data
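A rough sketch of the lab idea with scikit-learn; the actual lab notebook lives in the linked GitHub repo and may differ in library and preprocessing.

```python
# Fit LDA on the 20 Newsgroups corpus and print top words per topic
# (fetch_20newsgroups downloads the data on first use).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
vec = CountVectorizer(max_features=5000, stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):          # topic-word weights
    top = topic.argsort()[-8:][::-1]
    print(f"topic {k}:", " ".join(terms[i] for i in top))
```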
  66. 66. Deep NLP: Neural Networks
  67. 67. Playing Go
  68. 68. Playing Jeopardy
  69. 69. Self Driving Car
  70. 70. Many other applications
  71. 71. Neural Networks
  72. 72. Neural Networks
  73. 73. Neural Networks
  74. 74. Neural Networks
  75. 75. Goal: Split Data
  76. 76. Admissions Office (Exam and Grades, each scored out of 10; applicants labeled Yes/No). Student 1: Exam 9/10, Grades 8/10. Student 2: Exam 3/10, Grades 4/10. Student 3: Exam 7/10, Grades 6/10. Quiz: Does the student get accepted? Logistic Regression candidates: Score = Exam + Grades, Score > 10; Score = Exam + 2*Grades, Score > 18; Score = Exam - Grades, Score > 5
  77. 77. Admissions Office: Exam, Grades → Rank
  78. 78. Admissions Office: accepted and rejected students separated by a line in the Exam/Grades plane. Question: How do we find this line?
  79. 79. Goal: Split Data I'm good I'm good I'm good I'm good ??
  80. 80. Goal: Split Data Come closer Farther Closer Quiz Where would the misclassified point want the line to move? I'm good
  81. 81. Perceptron Algorithm: misclassified points pull the line toward them (“Come closer”); correctly classified points leave it alone (“I'm good”)
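The perceptron trick above as a tiny NumPy sketch; the learning rate and iteration count are arbitrary choices, and the data points are the students from the earlier slide.

```python
# Perceptron updates: nudge the boundary toward each misclassified point.
import numpy as np

X = np.array([[9, 8], [3, 4], [7, 6]], dtype=float)   # (exam, grades)
y = np.array([1, 0, 1])                               # 1 = admit
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(100):
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        w += lr * (yi - pred) * xi    # move the line only when the prediction is wrong
        b += lr * (yi - pred)
print(w, b)
```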
  82. 82. Admissions Office: Student 4, Exam 10, Grades 0 [plotted on the Exam/Grades scatter]
  83. 83. Admissions Office [Exam/Grades scatter plot]
  84. 84. Admissions Office [Exam/Grades scatter plot]
  85. 85. Admissions Office Judge 1 Judge 2 Score = 1*Exam + 8*Grades Score = 7*Exam + 2*Grades
  86. 86. Admissions Office President If both judges say 'admit', then admit
  87. 87. Normalizing the Score Accept Reject
  88. 88. Normalizing the Score 0.6 0.4 0.5 0.7 0.3 0.2 0.8
  89. 89. Admissions Office President Score = 2*(Judge 1 Score) + 3*(Judge 2 Score)
  90. 90. Judge 1 Judge 2 President
  91. 91. Neural Network 2 3 Judge 1 Judge 2 President Score = 2*(Judge 1 score) + 3*(Judge 2 score) Score = 1*Exam + 8*Grades Score = 7*Exam + 2*Grades Grades Exam 1 8 7 2
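The judges-and-president network written out as a forward pass, using the weights from the slide; the sigmoid squashing stands in for the "Normalizing the Score" step, and bias terms are omitted for simplicity.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

exam, grades = 9.0, 8.0
judge1 = sigmoid(1 * exam + 8 * grades)        # Score = 1*Exam + 8*Grades
judge2 = sigmoid(7 * exam + 2 * grades)        # Score = 7*Exam + 2*Grades
president = sigmoid(2 * judge1 + 3 * judge2)   # Score = 2*(Judge 1) + 3*(Judge 2)
print(president)                               # close to 1 -> admit
```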
  92. 92. Neural Network Judge 1 Judge 2 President Grades Exam
  93. 93. Neural Network x1 x2 Input layer Hidden layer Output layer
  94. 94. x1 x2 Input layer Hidden layer Output layer
  95. 95. x1 x2 x3 Input layer Hidden layer Output layer
  96. 96. x1 x2 Input layer Hidden layer Output layer Dog Cat Bird
  97. 97. Deep Neural Network x1 x2 Input layer Hidden layer Hidden layer Output layer
  98. 98. Neural Network
  99. 99. playground.tensorflow.org/
  100. 100. Self Driving Car
  101. 101. Self Driving Car
  102. 102. Playing Go
  103. 103. Exam: 8 Grades: 7 Magic Hat
  104. 104. Temperature Symptoms Etc. Medical Applications Sick/Healthy
  105. 105. Stock Market GOOG: 7.32 AAPL: 3.14 MSFT: 1.32 Buy GOOG
  106. 106. Computer Vision Dog
  107. 107. YouTube: Age, Location, Watched: Despacito, Watched: Pitbull video → Probability of watching Ricky Martin video: 80%
  108. 108. Spam Detection Hello, it’s grandma! Not spam
  109. 109. Spam Detection E@rn c@$h quickly! Spam
  110. 110. Lab: Sentiment Analysis Classify Movie Reviews Notebook: IMDB_Keras.ipynb
  111. 111. Sentiment Analysis Lab
  112. 112. Sentiment Analysis What a great movie! That was terrible.
  113. 113. One-hot encoding “What a great movie!” Vocabulary columns (aardvark … a … great … movie … rhinoceros, cookie, figurine, what … zygote); each of “what”, “a”, “great”, “movie” becomes a vector with a single 1 in its own column and 0s everywhere else
  114. 114. 1. Loading the data
  115. 115. 2. Examining the data. Each review is stored as a list of word indices into the vocabulary: “What a great movie!” → [1, 7, 15, 23]
  116. 116. 3. One-hot encoding the input. [1, 7, 15, 23] → a vocabulary-length vector with 1s at positions 1, 7, 15, and 23 and 0s elsewhere
  117. 117. 3. One-hot encoding the output [1,0] [0,1]
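A small sketch of steps 2–3 (word-index lists to one-hot inputs and two-class outputs); this is illustrative, not necessarily the notebook's exact helper code, and assumes TensorFlow's bundled Keras.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def one_hot(sequences, vocab_size):
    """Turn lists of word indices into fixed-length 0/1 vectors."""
    X = np.zeros((len(sequences), vocab_size))
    for i, seq in enumerate(sequences):
        X[i, seq] = 1.0              # set a 1 at every word index in the review
    return X

X = one_hot([[1, 7, 15, 23]], vocab_size=25)
y = to_categorical([1], num_classes=2)   # label 1 -> [0., 1.]
print(X[0], y)
```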
  118. 118. 4. Building the Model Architecture Build the Model
  119. 119. 4. Building the Model Architecture Compile Model
  120. 120. 4. Solution
  121. 121. 5. Training the Model: 1. Compile Model, 2. Fit Model
  122. 122. Building the Neural Network Running the Neural Network
  123. 123. 5. Training the Model
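The build/compile/fit steps condensed into a minimal dense Keras model for the one-hot vectors; the layer sizes and hyperparameters are illustrative assumptions, not the lab's exact architecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

vocab_size = 1000

model = Sequential([
    Dense(512, activation="relu", input_shape=(vocab_size,)),
    Dropout(0.5),
    Dense(2, activation="softmax"),          # [P(negative), P(positive)]
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# With one-hot encoded data prepared as above:
# model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))
```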
  124. 124. Deep NLP: Recurrent Neural Networks
  125. 125. One-hot encoding “I expected this movie to be much better” vs. “This movie is much better than I expected”: the two reviews contain nearly the same words, so their word-count vectors look almost identical even though the sentiments are opposite; word order is lost
  126. 126. Recurrent Neural Networks John saw Mary. Mary saw John.
  127. 127. Neural Network NN John NN saw NN Mary
  128. 128. Recurrent Neural Network NN NN sawJohn NN Mary John saw Mary
  129. 129. Recurrent Neural Network NN NN sawMary NN John Mary saw John
  130. 130. new memory Recurrent Neural Network new word previous memory RNN
  131. 131. Recurrent Neural Network previous memory new memory new word RNN
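A single vanilla-RNN step matching the diagram (new memory computed from the new word and the previous memory); the sizes and random weights here are made up for illustration.

```python
import numpy as np

hidden, embed = 4, 3
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden, embed))    # weights applied to the new word
W_h = rng.normal(size=(hidden, hidden))   # weights applied to the previous memory
b = np.zeros(hidden)

def rnn_step(x, h_prev):
    return np.tanh(W_x @ x + W_h @ h_prev + b)   # new memory

h = np.zeros(hidden)
for x in rng.normal(size=(5, embed)):     # five word vectors, processed in order
    h = rnn_step(x, h)
print(h)
```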
  132. 132. Recurrent Neural Network [diagram: the network unrolled over a repeated input sequence (“Hi”, “how”, “how”, …)]
  133. 133. Recurrent Neural Network RNN (phrase) NN (sentiment) Review
  134. 134. Deep NLP: Word Embeddings
  135. 135. Word Embeddings • Document vs. Word Representations • Word2Vec • GloVe • Embeddings in Deep Learning • Visualizing Word Vectors: tSNE
  136. 136. Document vs. Word Representations: a whole document 📄 → <0.2, 0.7, 0.1, 0.3, 0.4> (e.g., for +/- sentiment); a single word w → <0.8, 0.3, 0.2, 0.7, 0.5>
  137. 137. Word Embeddings ๏kid ๏child ๏horse ๏chair
  138. 138. Word2Vec
  139. 139. Word2Vec the quick brown fox jumps over the lazy dog Continuous Bag of Words (CBoW) Continuous Skip-gram
  140. 140. Skip-gram Model: input is the one-hot vector for the center word wt (“jumps”), the hidden layer is its word vector, and the outputs predict the context words wt-2, wt-1, wt+1, wt+2 (“brown”, “fox”, “over”, “the”)
  141. 141. Word2Vec: Recap • Robust, distributed representation. • Vector size independent of vocabulary. • Train once, store in lookup table. • Deep learning ready!
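A hedged sketch of training skip-gram vectors with gensim (assumes the gensim 4.x API, where sg=1 selects skip-gram and sg=0 selects CBOW); the toy corpus is only for illustration.

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["the", "lazy", "dog", "sleeps"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["fox"].shape)          # fixed-size vector, independent of vocabulary size
print(model.wv.most_similar("dog"))   # nearest neighbours in the embedding space
```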
  142. 142. Word2Vec: Further Reading Tomas Mikolov, et al., 2013. Distributed Representations of Words and Phrases and their Compositionality, In Advances in Neural Information Processing Systems (NIPS), pp. 3111-3119. Adrian Colyer, 2016. The amazing power of word vectors.
  143. 143. GloVe Global Vectors for Word Representation
  144. 144. GloVe starts from co-occurrence probabilities P(j|i): how likely is context word j to appear around target word i? (“a cup of coffee”)
  145. 145. P(j|i) is modeled with a context vector and a target vector: wi · wj
  146. 146. The co-occurrence probability matrix P(j|i) is factored into a product of two word-vector matrices
  147. 147. Co-occurrence Probabilities: P(solid|ice) / P(solid|steam) ≫ 1, while P(water|ice) / P(water|steam) ≈ 1
  148. 148. Embeddings in Deep Learning
  149. 149. woman - man + king = queen ๏man ๏king ๏queen ๏woman
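The analogy can be checked with pre-trained vectors; this sketch assumes gensim's downloader and the "glove-wiki-gigaword-100" model name (any pre-trained word-vector set works), and it downloads data on first use.

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")
# woman - man + king: 'queen' is typically the top result
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```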
  150. 150. Distributional Hypothesis “Would you like a cup of ______?” “I like my ______ black.” “I need my morning ______ before I can do anything!”
  151. 151. “Would you like a cup of ______?” “I like my ______ black.” “I need my morning ______ before I can do anything!”
  152. 152. tea coffee
  153. 153. tea coffee “________ grounds are great for composting!” “I prefer loose leaf ______.” Coffee tea
  154. 154. tea coffee
  155. 155. tea coffee
  156. 156. Word Embedding: one-hot encoded word (size ≈ 10^5) → (Lookup) word vector (size ≈ 10^2) → Dense / Recurrent Layer(s) → one-hot encoded output (size ≈ 10^5); the embedding lookup is learned along with the rest of the network
  157. 157. Word Embedding, compared with vision. Text: one-hot encoded word (size ≈ 10^5) → (Lookup) word vector (size ≈ 10^2) → Dense / Recurrent Layer(s) → one-hot encoded output (size ≈ 10^5). Vision: input image (size ≈ 10^5) → Convolutional Layer(s) (pre-trained) → 128-dimensional features → Dense Layer(s) (learned) → output label (size ≈ 10^2)
  158. 158. Visualizing Word Vectors: t-SNE t-Distributed Stochastic Neighbor Embedding
  159. 159. t-SNE t-Distributed Stochastic Neighbor Embedding mango avocado car pencil
  160. 160. t-SNE t-Distributed Stochastic Neighbor Embedding mango avocado car pencil
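A minimal t-SNE projection of a few word vectors with scikit-learn and matplotlib; it reuses pre-trained GloVe vectors via gensim's downloader (an assumption, as above; any word-vector set works).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")        # pre-trained vectors, downloaded on first use
words = ["mango", "avocado", "car", "pencil", "coffee", "tea"]
vectors = np.array([wv[w] for w in words])

xy = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)
plt.scatter(xy[:, 0], xy[:, 1])
for (x, y), w in zip(xy, words):
    plt.annotate(w, (x, y))                     # similar words should land near each other
plt.show()
```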
  161. 161. Lab: Sentiment Analysis using RNNs Classify Movie Reviews Notebook: IMDB_Keras_RNN.ipynb
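A rough sketch of the RNN lab idea (not the actual IMDB_Keras_RNN.ipynb): IMDB reviews as padded index sequences feeding an Embedding layer and an LSTM; the hyperparameters are illustrative.

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, maxlen = 10000, 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

model = Sequential([
    Embedding(vocab_size, 64),          # learned word embeddings
    LSTM(64),                           # recurrent layer reads the review in order
    Dense(1, activation="sigmoid"),     # P(positive)
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=64, validation_data=(x_test, y_test))
```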
  162. 162. Workshop Summary Classic NLP • Text Processing: Stop word removal, stemming, lemmatization • Feature Extraction: Bag-of-Words, TF-IDF • Topic Modeling: Latent Dirichlet Allocation • Lab: Topic modeling using LDA Deep NLP • Neural Networks • Recurrent Neural Networks • Word Embeddings: Word2Vec, GloVe, tSNE • Lab: Sentiment Analysis using RNNs
  163. 163. Additional Resources • Recurrent Neural Networks (RNNs) • Long Short-Term Memory Networks (LSTMs) • Visual Question-Answering
  164. 164. Arpan Chakraborty Adarsh Nair Luis Serrano udacity.com/school-of-ai
