Introduction to
word embeddings
Pavel Kalaidin
@facultyofwonder
Moscow Data Fest, September 12th, 2015
distributional hypothesis
лойс
годно, лойс ("good stuff, лойс")
лойс за песню ("лойс for the song")
из принципа не поставлю лойс ("won't give it a лойс on principle")
взаимные лойсы ("mutual лойсы")
лойс, если согласен ("лойс if you agree")
What is the meaning of лойс?
кек
кек, что ли? ("кек, or what?")
кек)))))))
ну ты кек ("you're such a кек")
What is the meaning of кек?
vectorial representations
of words
simple and flexible
platform for
understanding text and
probably not messing up
one-hot encoding?
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... (a single 1 at the word's index, zeros in the other |V| − 1 dimensions)
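As a minimal sketch (the tiny vocabulary below is only a placeholder), a one-hot vector is all zeros except a single 1 at the word's index, so it grows with the vocabulary and carries no notion of similarity:

import numpy as np

# Toy vocabulary; real vocabularies have 10^5-10^6 entries.
vocab = ["годно", "лойс", "за", "песню", "кек"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

print(one_hot("лойс"))  # [0. 1. 0. 0. 0.]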
co-occurrence matrix
recall: word-document co-occurrence
matrix for LSA
credits: [x]
from entire document to
window (length 5-10)
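A hedged sketch of the window-based counts (toy corpus and window size are assumptions, not the speaker's exact setup):

from collections import defaultdict

def cooccurrence_counts(sentences, window=5):
    """Count how often word pairs appear within `window` tokens of each other."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    counts[(w, tokens[j])] += 1.0
    return counts

sentences = [["годно", "лойс"], ["лойс", "за", "песню"]]
print(cooccurrence_counts(sentences, window=2))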
still seems suboptimal ->
big, sparse, etc.
lower dimensions, we
want dense vectors
(say, 25-1000)
How?
matrix factorization?
SVD of co-occurrence
matrix
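A hedged sketch of the SVD route (the random toy matrix and the log transform are placeholders; LSA-style pipelines often apply some such squashing before factorizing): keep the top singular vectors as dense word vectors.

import numpy as np

# X: |V| x |V| co-occurrence matrix (toy random counts stand in here).
V, dim = 1000, 100
X = np.random.poisson(0.1, size=(V, V)).astype(float)

# Truncated SVD: rows of U * S give dense `dim`-dimensional word vectors.
U, S, Vt = np.linalg.svd(np.log1p(X), full_matrices=False)
word_vectors = U[:, :dim] * S[:dim]
print(word_vectors.shape)  # (1000, 100)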
lots of memory?
idea: directly learn low-
dimensional vectors
here comes word2vec
Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al: [paper]
idea: instead of capturing co-
occurrence counts
predict surrounding words
Two models:
C-BOW
predicting the word given its context
skip-gram
predicting the context given a word
Explained in great detail here, so we'll skip it for now
Also see: word2vec Parameter Learning Explained, Rong [paper]
CBOW: several times faster than skip-gram,
slightly better accuracy for the frequent words
Skip-Gram: works well with small amounts of data,
represents rare words or phrases well
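To make the two objectives concrete, here is a small sketch (toy sentence, window of 1, names are my own) of the training examples each model sees: skip-gram predicts each context word from the centre word, CBOW predicts the centre word from its context.

def training_pairs(tokens, window=2):
    """Return (skip-gram pairs, CBOW pairs) for one tokenized sentence."""
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        skipgram.extend((center, c) for c in context)  # predict context from word
        cbow.append((context, center))                 # predict word from context
    return skipgram, cbow

sg, cb = training_pairs(["лойс", "за", "песню"], window=1)
print(sg)  # [('лойс', 'за'), ('за', 'лойс'), ('за', 'песню'), ('песню', 'за')]
print(cb)  # [(['за'], 'лойс'), (['лойс', 'песню'], 'за'), (['за'], 'песню')]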
Examples?
W_woman − W_man = W_queen − W_king
classic example
<censored example>
word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling
Word-Embedding Method, Goldberg and Levy, 2014 [arxiv]
all done with gensim:
github.com/piskvorky/gensim/
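A minimal sketch with gensim (gensim 4.x parameter names; the tiny corpus below is only a placeholder, the analogy needs a real corpus to come out right):

from gensim.models import Word2Vec

# Any iterable of tokenized sentences works; this toy corpus is a stand-in.
sentences = [["king", "queen", "man", "woman"], ["man", "king"], ["woman", "queen"]]

model = Word2Vec(
    sentences,
    vector_size=100,  # dimensionality of the embeddings
    window=5,         # context window size
    min_count=1,      # keep every word in this toy corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # negative sampling
)

# The classic analogy: king - man + woman ≈ queen.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))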
...failing to take advantage of
the vast amount of repetition
in the data
so back to co-occurrences
GloVe for Global Vectors
Pennington et al, 2014: nlp.stanford.edu/pubs/glove.pdf
Ratios seem to cancel noise
The gist: model ratios with
vectors
The model
Preserving
linearity
Preventing mixing
dimensions
Restoring
symmetry, part 1
recall:
Restoring symmetry, part 2
Least squares problem it is now
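For reference, the derivation lands on the weighted least-squares objective of the GloVe paper, with word vector $w_i$, context vector $\tilde{w}_j$, biases $b_i$, $\tilde{b}_j$, co-occurrence count $X_{ij}$, and a weighting function $f$ that damps rare pairs and caps frequent ones:

J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\qquad \text{with } x_{\max} = 100,\ \alpha = 3/4 \text{ in the paper}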
SGD->AdaGrad
ok, Python code
glove-python:
github.com/maciejkula/glove-python
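A hedged sketch following the glove-python README (parameter names may differ between versions; the corpus is a placeholder):

from glove import Corpus, Glove

# Any iterable of tokenized sentences works here.
sentences = [["годно", "лойс"], ["лойс", "за", "песню"]]

corpus = Corpus()
corpus.fit(sentences, window=10)          # build the co-occurrence matrix

glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
glove.add_dictionary(corpus.dictionary)   # attach the word -> index mapping

print(glove.most_similar("лойс", number=5))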
two sets of vectors
input and context + bias
average/sum/drop
complexity |V|^2 (naïve co-occurrence matrix)
complexity |C|^0.8 (GloVe, over non-zero co-occurrences)
Evaluation: it works
[hashtag clusters: #spb, #gatchina, #msk, #kyiv, #minsk, #helsinki]
Compared to word2vec
[the same hashtags: #spb, #gatchina, #msk, #kyiv, #minsk, #helsinki]
t-SNE:
github.com/oreillymedia/t-SNE-tutorial
seaborn:
stanford.edu/~mwaskom/software/seaborn/
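A hedged sketch of the visualization step with scikit-learn's TSNE and seaborn styling (the random vectors are placeholders; in practice take them from a trained model, e.g. model.wv[word] in gensim):

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# Placeholder labels and vectors; swap in real hashtag embeddings.
words = ["#spb", "#gatchina", "#msk", "#kyiv", "#minsk", "#helsinki"]
vectors = np.random.rand(len(words), 100)

coords = TSNE(n_components=2, perplexity=2, init="random").fit_transform(vectors)

plt.figure(figsize=(6, 6))
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.show()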
Abusing models
music playlists:
github.com/mattdennewitz/playlist-to-vec
deep walk:
DeepWalk: Online Learning of Social
Representations [link]
user interests
Paragraph vectors: cs.stanford.edu/~quocle/paragraph_vector.pdf
predicting hashtags
interesting read: #TAGSPACE: Semantic
Embeddings from Hashtags [link]
RusVectōrēs: distributional semantic
models for Russian: ling.go.mail.ru/dsm/en/
corpus matters
building block for
bigger models
╰(*´︶`*)╯
</slides>