[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowledge into Topic Models

Slides presented at the EMNLP 2015 reading group.


  1. Efficient Methods for Incorporating Knowledge into Topic Models
     [Yang, Downey and Boyd-Graber 2015]
     2015/10/24 EMNLP 2015 Reading @shuyo
  2. Large-scale Topic Model
     • In academic papers
       – Up to 10^3 topics
     • Industrial applications
       – 10^5 to 10^6 topics!
       – Search engines, online ads, and so on
       – To capture infrequent topics
     • This paper handles up to 500 topics... really?
  3. (Standard) LDA [Blei+ 2003, Griffiths+ 2004]
     • "Conventional" Gibbs sampling:
       $P(z = t \mid \boldsymbol{z}^-, w) \propto q_t := (n_{d,t} + \alpha)\,\frac{n_{w,t} + \beta}{n_t + V\beta}$
       – $T$ : number of topics
       – Draw $U \sim \mathcal{U}\bigl(0, \sum_{z=1}^{T} q_z\bigr)$ and find $t$ s.t. $\sum_{z=1}^{t-1} q_z < U < \sum_{z=1}^{t} q_z$
     • For large $T$, this is computationally intensive
       – $n_{w,t}$ is sparse
       – When $T$ is very large, $n_{d,t}$ is sparse too, e.g. $T = 10^6 > n_d$
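To make the O(T) per-token cost concrete, here is a minimal sketch of one collapsed Gibbs draw (my own illustration, not code from the paper; all names are made up):

```python
import numpy as np

def sample_topic_standard(n_dt, n_wt, n_t, alpha, beta, V):
    """One collapsed Gibbs draw for a single token; O(T) work per token.

    n_dt : topic counts in the current document, shape (T,)
    n_wt : per-topic counts of the current word type, shape (T,)
    n_t  : total token counts per topic, shape (T,)
    """
    # q_t = (n_{d,t} + alpha) * (n_{w,t} + beta) / (n_t + V*beta)
    q = (n_dt + alpha) * (n_wt + beta) / (n_t + V * beta)
    u = np.random.uniform(0.0, q.sum())           # U ~ Uniform(0, sum_z q_z)
    return int(np.searchsorted(np.cumsum(q), u))  # first t whose cumulative weight exceeds U
```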
  4. SparseLDA [Yao+ 2009]
     $P(z = t \mid \boldsymbol{z}^-, w) \propto \underbrace{\frac{\alpha\beta}{n_t + V\beta}}_{s_t} + \underbrace{\frac{n_{d,t}\,\beta}{n_t + V\beta}}_{r_t} + \underbrace{\frac{(n_{d,t} + \alpha)\, n_{w,t}}{n_t + V\beta}}_{q_t}$
     • $s = \sum_t s_t$, $r = \sum_t r_t$, $q = \sum_t q_t$
     • Draw $U \sim \mathcal{U}(0, s + r + q)$:
       – If $0 < U < s$, find $t$ s.t. $\sum_{z=1}^{t-1} s_z < U < \sum_{z=1}^{t} s_z$
       – If $s < U < s + r$, find $t$ with $n_{d,t} > 0$ s.t. $\sum_{z=1}^{t-1} r_z < U - s < \sum_{z=1}^{t} r_z$
       – If $s + r < U < s + r + q$, find $t$ with $n_{w,t} > 0$ s.t. $\sum_{z=1}^{t-1} q_z < U - s - r < \sum_{z=1}^{t} q_z$
     • Faster because $n_{w,t}$ and $n_{d,t}$ are sparse
     • $s_t$ is independent of $w$ and $d$; $r_t$ depends only on $d$
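A sketch of the three-bucket sampling above, assuming n_dt and n_wt are mostly zero (the real SparseLDA caches s, r, and the per-topic coefficients incrementally instead of recomputing them per token as done here):

```python
import numpy as np

def _pick(weights_nz, indices, u):
    """Search restricted to the nonzero entries of a bucket."""
    c = np.cumsum(weights_nz)
    i = min(int(np.searchsorted(c, u)), len(indices) - 1)  # clamp vs. float rounding
    return int(indices[i])

def sample_topic_sparse(n_dt, n_wt, n_t, alpha, beta, V):
    """SparseLDA-style bucket sampling (simplified sketch)."""
    denom = n_t + V * beta
    s_t = alpha * beta / denom                   # "smoothing" bucket: same for all w, d
    dt_nz = np.nonzero(n_dt)[0]                  # topics active in this document
    wt_nz = np.nonzero(n_wt)[0]                  # topics active for this word type
    r_nz = n_dt[dt_nz] * beta / denom[dt_nz]     # "document" bucket
    q_nz = (n_dt[wt_nz] + alpha) * n_wt[wt_nz] / denom[wt_nz]  # "word" bucket
    s, r, q = s_t.sum(), r_nz.sum(), q_nz.sum()

    u = np.random.uniform(0.0, s + r + q)
    if u < s:                                    # dense but rarely-hit bucket
        return int(np.searchsorted(np.cumsum(s_t), u))
    if u < s + r:                                # walk only topics with n_dt > 0
        return _pick(r_nz, dt_nz, u - s)
    return _pick(q_nz, wt_nz, u - s - r)         # walk only topics with n_wt > 0
```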
  5. Leveraging Prior Knowledge
     • The objective function of topic models does not correlate with human judgements
  6. Word correlation prior knowledge
     • Must-link
       – "quarterback" and "fumble" are both related to American football
     • Cannot-link
       – "fumble" and "bank" imply two different topics
  7. SC-LDA [Yang+ 2015]   (SC = Sparse Constrained)
     • $m \in M$ : prior knowledge
     • $f_m(z, w, d)$ : potential function of prior knowledge $m$ about word $w$ with topic $z$ in document $d$
     • $\psi(\boldsymbol{z}, M) = \prod_{z \in \boldsymbol{z}} \exp f_m(z, w, d)$
       – (presenter's note: the product should probably run over $m \in M$ and all $w$ with $z$ in all $d$)
     • $P(\boldsymbol{w}, \boldsymbol{z} \mid \alpha, \beta, M) = P(\boldsymbol{w} \mid \boldsymbol{z}, \beta)\, P(\boldsymbol{z} \mid \alpha)\, \psi(\boldsymbol{z}, M)$
       – (presenter's note: the "=" should probably be "$\propto$")
  8. Inference for SC-LDA
     (derivation figure; not recoverable from the extraction)
  9. Word correlation prior knowledge for SC-LDA
     • $f_m(z, w, d) = \sum_{u \in M_w^m} \log \max(\lambda, n_{u,z}) + \sum_{v \in M_w^c} \log \frac{1}{\max(\lambda, n_{v,z})}$
       – where $M_w^m$ : must-links of $w$, $M_w^c$ : cannot-links of $w$
     • $P(z = t \mid \boldsymbol{z}^-, w, M) \propto \left[ \frac{\alpha\beta}{n_t + V\beta} + \frac{n_{d,t}\,\beta}{n_t + V\beta} + \frac{(n_{d,t} + \alpha)\, n_{w,t}}{n_t + V\beta} \right] \prod_{u \in M_w^m} \max(\lambda, n_{u,t}) \prod_{v \in M_w^c} \frac{1}{\max(\lambda, n_{v,t})}$
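An illustrative sketch of how the must-link / cannot-link factors multiply into the SparseLDA-style weight (not the authors' code; n_vt, must_links, cannot_links, and lam are hypothetical names):

```python
import numpy as np

def sc_lda_weights(n_dt, n_wt, n_t, n_vt, alpha, beta, V, lam,
                   must_links, cannot_links):
    """Unnormalized P(z=t | ...) for SC-LDA with word-correlation knowledge.

    n_vt : word-topic count matrix, shape (V, T), so n_vt[u] is n_{u,.}
    must_links / cannot_links : word ids linked to the current word type
    lam  : small clipping constant lambda, keeps factors away from zero
    """
    base = (alpha * beta + n_dt * beta + (n_dt + alpha) * n_wt) / (n_t + V * beta)
    boost = np.ones_like(base)
    for u in must_links:                    # promote topics where must-linked
        boost *= np.maximum(lam, n_vt[u])   # words already have high counts
    for v in cannot_links:                  # demote topics where cannot-linked
        boost /= np.maximum(lam, n_vt[v])   # words have high counts
    return base * boost
```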
  10. Factor Graph
      • The authors say prior knowledge is incorporated "by adding a factor graph to encode prior knowledge," but the factor graph is never actually drawn.
      • The potential function $f_m(z, w, d)$ contains $n_{w,z}$, and $\varphi_{w,z} \propto n_{w,z} + \beta$.
      • So the model seems to look like Fig. b rather than Fig. a.
      (slide shows two graphical-model figures, Fig. a and Fig. b)
  11. Labeled LDA [Ramage+ 2009]
      • Supervised LDA for labeled documents
        – It is equivalent to SC-LDA with the following potential function:
          $f_m(z, w, d) = \begin{cases} 1 & \text{if } z \in m_d \\ -\infty & \text{otherwise} \end{cases}$
          where $m_d$ is the label set of document $d$
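Since $\exp(1)$ is a positive constant and $\exp(-\infty) = 0$, this potential acts as a hard mask that restricts sampling to the document's labels; a minimal sketch (names are illustrative):

```python
import numpy as np

def labeled_lda_mask(T, label_set):
    """exp(f_m): a constant for z in the label set, 0 otherwise,
    so the sampler can only assign topics from the document's labels."""
    mask = np.zeros(T)
    mask[list(label_set)] = np.e  # exp(1); any positive constant works after normalization
    return mask

# usage: weights = lda_weights * labeled_lda_mask(T, {2, 5, 9})
```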
  12. Experiments
      • Baselines
        – Dirichlet Forest-LDA [Andrzejewski+ 2009]
        – Logic-LDA [Andrzejewski+ 2011]
        – MRF-LDA [Xie+ 2015]
          • Encodes word correlations in LDA as an MRF
        – SparseLDA
      • Datasets

        DATASET    DOCS       TYPES    TOKENS (approx.)  EXPERIMENT
        NIPS       1,500      12,419   1,900,000         Word correlation
        NYT-NEWS   3,000,000  102,660  100,000,000       Word correlation
        20NG       18,828     21,514   1,946,000         Labeled documents
  13. Generate Word Correlation
      • Must-link
        – Obtain synsets from WordNet 3.0
        – Keep a link when the similarity between the word and its synset members, measured on word2vec word embeddings, is above the threshold 0.2
      • Cannot-link
        – Nothing? (presenter's note)
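A rough sketch of this must-link generation, assuming NLTK's WordNet interface and gensim word2vec vectors (this is not the authors' pipeline, and the vector file named in the comment is only an example):

```python
from nltk.corpus import wordnet as wn      # WordNet 3.0 via NLTK
from gensim.models import KeyedVectors

def must_links(vocab, kv, threshold=0.2):
    """Pair each vocabulary word with its WordNet synset members whose
    word2vec similarity exceeds the threshold."""
    links = set()
    for w in vocab:
        if w not in kv:
            continue
        for syn in wn.synsets(w):
            for lemma in syn.lemma_names():
                u = lemma.lower()
                if u != w and u in kv and kv.similarity(w, u) > threshold:
                    links.add(tuple(sorted((w, u))))
    return links

# kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
```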
  14. Convergence Speed
      The average running time per iteration over 100 iterations, averaged over 5 seeds, on the 20NG dataset.
  15. Coherence [Mimno+ 2011]
      • $C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{F(v_m^{(t)}, v_l^{(t)}) + \epsilon}{F(v_l^{(t)})}$
        – $F(v)$ : document frequency of word type $v$
        – $F(v, v')$ : co-document frequency of word types $v$ and $v'$ (presenter's note: meaning documents that include both?)
        – $\epsilon$ is very small, e.g. $10^{-12}$ [Röder+ 2015]
      (results figure; reported coherence values −39.1 and −36.6)
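A small reference implementation of this coherence score, assuming each document is given as a set of word types and every top word occurs in at least one document (names are illustrative):

```python
from itertools import combinations
from math import log

def coherence(top_words, docs, eps=1e-12):
    """Mimno et al. (2011) coherence for one topic's top-M words.

    top_words : list of word types v_1..v_M, ordered by topic probability
    docs      : iterable of sets of word types, one set per document
    """
    df = {v: 0 for v in top_words}            # F(v): document frequency
    codf = {}                                 # F(v, v'): co-document frequency
    for d in docs:
        present = [v for v in top_words if v in d]
        for v in present:
            df[v] += 1
        for a, b in combinations(present, 2):
            key = frozenset((a, b))
            codf[key] = codf.get(key, 0) + 1

    score = 0.0
    for m in range(1, len(top_words)):        # m = 2..M in the slide's notation
        for l in range(m):                    # l = 1..m-1
            pair = frozenset((top_words[m], top_words[l]))
            score += log((codf.get(pair, 0) + eps) / df[top_words[l]])
    return score
```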
  16. References
      • [Yang+ 2015] Efficient methods for incorporating knowledge into topic models.
      • [Blei+ 2003] Latent Dirichlet allocation.
      • [Griffiths+ 2004] Finding scientific topics.
      • [Yao+ 2009] Efficient methods for topic model inference on streaming document collections.
      • [Ramage+ 2009] Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora.
      • [Andrzejewski+ 2009] Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
      • [Andrzejewski+ 2011] A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic.
      • [Xie+ 2015] Incorporating word correlation knowledge into topic modeling.
      • [Mimno+ 2011] Optimizing semantic coherence in topic models.
      • [Röder+ 2015] Exploring the space of topic coherence measures.
