This document presents a summary of a research paper that proposes a new topic modeling technique called Admixture of Poisson Markov Random Fields (APM). APM extends existing topic models by allowing word dependencies within topics. It models each document as a mixture of Poisson MRFs over words, where the MRFs capture word co-occurrence patterns. This addresses a limitation of previous models that consider words as independent within topics. The document provides background on topic modeling and related techniques like latent semantic analysis and latent Dirichlet allocation. It then describes the APM model and an optimization algorithm used to fit the large number of APM parameters in parallel. Evaluation shows APM learns more interpretable topics than LDA and better represents word dependencies.
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
1. Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
David Inouye*, Pradeep Ravikumar, Inderjit Dhillon
April 30, 2015
* Presenter
David Inouye*, Pradeep Ravikumar, Inderjit Dhillon Admixture of Poisson MRFs
2. Analyzing Large Collections of Documents
[Figure: pipeline from (1) a collection of documents, to (2) a digital representation, to (3) model computation, to (4) a summary. The digital representation is a bag-of-words matrix with one row per document (Doc 1, Doc 2, ...) and one column per word ("networks", "learning", "programming", ...).]
Examples of document collections:
1. Research papers
2. News articles
3. Twitter posts
Bag of Words Matrix:
- Removes order and syntax information
- Unrealistic but powerful
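The bag-of-words construction above can be sketched in a few lines. The corpus and vocabulary here are made up for illustration, standing in for the paper-title collections used in the talk:

```python
from collections import Counter

# Toy "documents" (each a string of words); illustrative, not from the paper.
docs = [
    "learning deep networks for learning",
    "communication networks and routing",
    "programming languages for verified programming",
]

# The vocabulary gives the columns of the bag-of-words matrix.
vocab = sorted({w for d in docs for w in d.split()})

# Entry (i, j) counts how often word j occurs in document i.
# Word order and syntax are discarded -- only counts remain.
X = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)
print(X)
```
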
6. Research Paper Example - Top Words
[Figure: the bag-of-words matrix for a collection of research-paper titles, annotated with the top words by frequency.]
Titles of Research Papers:
1. Machine Learning (ICML, NIPS)
2. Communication Networks (INFOCOM)
3. Programming Languages (PLDI, CAV, POPL, OOPSLA)
7. Research Paper Example - Topic Modeling
[Figure: the same pipeline, now highlighting the topic-modeling step that maps the bag-of-words matrix to a model and summary.]
8. Applications for Topic Modeling
Applications
1. Summarize/Visualize [Hall et al. 2008]
2. Word sense disambiguation [Boyd-Graber et al. 2007]
3. Multi-lingual understanding [Mimno et al. 2009]
4. Information retrieval [Wei & Croft 2006]
Different domains
1. Genetics [Pritchard et al. 2000 (14,000 citations)]
2. Computer vision [Li et al. 2010]
3. Social networks [Airoldi et al. 2008]
4. Social science surveys [Roberts et al. 2014]
5. Social E-commerce [Hu et al. 2014]
9. Brief History - Latent Semantic Analysis (LSA)
[Figure: LSA pipeline. (1) The n x p bag-of-words matrix is (2) factored by singular value decomposition into U (n x k), Σ (k x k), and V^T (k x p), yielding (3) a low-dimensional document representation and (4) "latent topics".]
Positive and negative values difficult to interpret
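The LSA step can be sketched with a plain SVD. The toy count matrix here is illustrative, not data from the paper:

```python
import numpy as np

# Toy n x p bag-of-words matrix (4 documents x 5 words); counts are made up.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 3, 1, 0],
    [0, 1, 2, 2, 0],
], dtype=float)

# Singular value decomposition: X = U Sigma V^T.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top k singular values: the rank-k "latent topic" approximation.
k = 2
X_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# Each row of U[:, :k] * sigma[:k] is a k-dimensional document representation.
doc_coords = U[:, :k] * sigma[:k]
print(doc_coords.shape)   # (4, 2)

# Note: U and V can mix positive and negative entries, which is what
# makes the LSA dimensions hard to interpret as "topics".
```
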
13. Brief History - Probabilistic Topic Models
[Figure: the n x p bag-of-words matrix is factored into (3) an n x k matrix of topic weights per document and (4) a k x p matrix of "topics" (word weights), related through a probability model rather than an SVD.]
Probability vectors are much easier to interpret
LDA - Added Bayesian priors for regularization
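LDA's generative story (topic weights per document, a probability vector over words per topic) can be sketched as follows. Sizes and the symmetric Dirichlet priors here are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

k, p, doc_len = 3, 6, 20          # topics, vocabulary size, words per doc

# "Topics": each row is a probability vector over the p words
# (nonnegative, sums to 1 -- this is what makes them interpretable).
topics = rng.dirichlet(np.ones(p), size=k)

# Generative process for one document, with Dirichlet priors as regularization.
w = rng.dirichlet(np.ones(k))     # per-document topic weights
words = []
for _ in range(doc_len):
    j = rng.choice(k, p=w)        # pick a topic for this word slot
    words.append(rng.choice(p, p=topics[j]))   # pick a word from that topic

print(w.round(3), words[:5])
```
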
14. Comparison of 2D Projections
SVD dimensions are difficult to interpret
APM has a smooth distribution compared to LDA
[Figure: 2-D projections of the documents under SVD, LDA, and APM for the comm-net, mach-learn, and prog-lang paper collections.]
16. Brief History - Extensions/Variants
1. Add time information [Blei & Lafferty 2006]
2. Add author information [Rosen-Zvi et al. 2004]
3. Add document category information [McAuliffe & Blei 2008]
4. Automatically discover number of topics [Teh et al. 2006]
5. Model correlation between topics [Blei & Lafferty 2006]
6. ...
Previous models - topics only have weights for single words
Our model - topics have weights for pairs of words
17. Interpreting Topics
[Figures: example topics from LDA with 3 topics, 6 topics, and 30 topics.]
21. Overview of APM
Admixture of Poisson MRFs (APM)
[Diagram: APM combines an admixture (as in LDA, which admixes Multinomials) with Poisson MRF components, the count analogue of a Gaussian MRF; admixtures generalize mixtures.]
23. Mixtures
Multiple sub-populations
The sub-populations are usually unknown a priori
Each individual from the population comes from exactly one sub-population
Figure source: Kalai, Moitra, and Valiant. Disentangling Gaussians. Communications of the ACM. 2012.
24. Admixtures
Mixtures - Draws from a single component distribution. (Top)
Admixtures - Draws from a distribution whose parameters are a convex combination of component parameters. (Bottom)
[Figure: scatter plots in the (x1, x2) plane. Top: mixture "documents" cluster at the mixture components. Bottom: admixture "documents" fall between a dense "topic" and a sparse "topic", so documents themselves range from dense to sparse.]
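The mixture/admixture contrast can be sketched with two Gaussian components standing in for "topics". All numbers here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two component means ("topics"); values are made up.
means = np.array([[0.0, 0.0], [5.0, 5.0]])

# Mixture: each "document" comes from exactly one component.
z = rng.integers(0, 2)
mixture_draw = rng.normal(means[z], 1.0)

# Admixture: each "document" gets its own convex combination of the
# component parameters, then draws from that blended distribution.
w = rng.dirichlet(np.ones(2))         # per-document component weights
blended_mean = w @ means              # convex combination of parameters
admixture_draw = rng.normal(blended_mean, 1.0)

print(w.round(2), blended_mean.round(2))
```

This is why admixture draws in the figure fall between the components rather than on top of them: the blended mean lies on the segment between the component means.
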
26. Gaussian MRFs
Allows for dependencies between variables
What if the data dimension is large?
If the dimension is 1000, that is roughly 1000^2 / 2 = 500,000 pairwise parameters
Assume some conditional independence between variables.
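A minimal sketch of a Gaussian MRF, with conditional independence encoded as zeros in the precision (inverse covariance) matrix. All parameter values are illustrative:

```python
import numpy as np

# In a Gaussian MRF, a zero off-diagonal entry of the precision matrix
# means the two variables are conditionally independent given the rest.
precision = np.array([
    [2.0, 0.6, 0.0],
    [0.6, 2.0, 0.5],
    [0.0, 0.5, 2.0],
])   # variables 0 and 2 are conditionally independent

cov = np.linalg.inv(precision)
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros(3), cov, size=5000)

# Variables 0 and 2 can still be (weakly) marginally correlated,
# through variable 1, despite the conditional independence.
print(np.corrcoef(samples.T).round(2))
```

Assuming such sparsity in the precision matrix is what keeps a large MRF tractable: only the nonzero entries are free parameters.
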
27. Independent, Positive, and Negative Dependency PMRFs
1. Each conditional ("slice") of a PMRF is 1-D Poisson.
2. Distinct from Gaussian MRF
3. Positive dependencies can model word co-occurrence.
[Figure: heat maps over (Count of Word 1, Count of Word 2), each axis from 0 to 8, for an independent PMRF, a positive-dependency PMRF, and a negative-dependency PMRF.]
28. Poisson MRFs [Yang et al., 2012]
P(A | B, C), P(B | A, C), P(C | A, B) → P(A, B, C)??
If we assume the node conditional distributions are Poisson, does there exist a joint MRF distribution that has these conditionals?
Poisson MRF joint distribution:
$$\Pr_{\mathrm{PMRF}}(x \mid \theta, \Theta) \propto \exp\Big(\theta^T x + x^T \Theta x - \sum_{s=1}^{p} \ln(x_s!)\Big)$$
Node conditionals are 1-D Poissons:
$$\Pr(x_s \mid x_{-s}, \theta_s, \Theta_s) \propto \exp\Big\{\underbrace{(\theta_s + x^T \Theta_s)}_{\eta_s}\, x_s - \ln(x_s!)\Big\}$$
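The paper fits PMRF parameters via node-wise (pseudo-likelihood) optimization; purely to illustrate the node conditionals above, here is a hypothetical Gibbs-sampling sketch. Parameters are made up, and the nonpositive edge weights follow the normalizability condition of Yang et al. 2012:

```python
import numpy as np

rng = np.random.default_rng(3)

p = 3
theta = np.array([1.0, 1.0, 1.0])          # node parameters (illustrative)
# Edge parameters; nonpositive off-diagonals model negative word
# dependencies and keep the joint distribution normalizable.
Theta = np.array([
    [ 0.0, -0.2,  0.0],
    [-0.2,  0.0, -0.1],
    [ 0.0, -0.1,  0.0],
])

# Gibbs sampling: each node conditional is a 1-D Poisson whose
# log-rate is eta_s = theta_s + Theta_s . x, as in the slide.
x = np.zeros(p)
samples = []
for it in range(2000):
    for s in range(p):
        eta = theta[s] + Theta[s] @ x      # diagonal of Theta is zero
        x[s] = rng.poisson(np.exp(eta))
    if it >= 500:                          # discard burn-in
        samples.append(x.copy())

samples = np.array(samples)
print(samples.mean(axis=0).round(2))
```
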
30. Admixture of Poisson MRFs (APM) [Inouye et al. 2014]
APM replaces standard Multinomial with Poisson MRF
$$\Pr_{\mathrm{APM}}(x, w, \theta^{1\ldots k}, \Theta^{1\ldots k}) = \Pr_{\mathrm{PMRF}}\Big(x \,\Big|\, \bar\theta = \sum_{j=1}^{k} w_j \theta^j,\; \bar\Theta = \sum_{j=1}^{k} w_j \Theta^j\Big)\, \Pr_{\mathrm{Dir}}(w) \prod_{j=1}^{k} \Pr(\theta^j, \Theta^j)$$
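The convex parameter combination in this equation can be sketched directly. Dimensions and random parameters are illustrative, and this shows only the combination step, not inference:

```python
import numpy as np

rng = np.random.default_rng(4)

k, p = 3, 5      # topics, vocabulary size (illustrative)

# k topic-specific PMRF parameters: node vectors theta_j, edge matrices Theta_j.
thetas = rng.normal(size=(k, p))
Thetas = rng.normal(size=(k, p, p))
Thetas = (Thetas + Thetas.transpose(0, 2, 1)) / 2   # symmetric edge matrices

# Per-document topic weights w drawn from a Dirichlet, as in LDA.
w = rng.dirichlet(np.ones(k))

# The document's own PMRF parameters are the convex combinations
# theta_bar = sum_j w_j theta_j and Theta_bar = sum_j w_j Theta_j.
theta_bar = w @ thetas                         # shape (p,)
Theta_bar = np.tensordot(w, Thetas, axes=1)    # shape (p, p)

print(theta_bar.shape, Theta_bar.shape)
```
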
31. APM Algorithm
1. Optimization problem is not convex
2. Want to exploit parallel computing
3. Large optimization problem: APM has O(kp^2) parameters versus O(kp) for LDA
LDA(k = 5, p = 1000) ⇒ 5,000 parameters
APM(k = 5, p = 1000) ⇒ 5,000,000 parameters
APM(k = 5, NNZ(Θ) = 10 per word) ⇒ 50,000 free parameters
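The counts above can be recomputed with small helper functions. The function names are hypothetical, and these are order-of-magnitude counts matching the slide (ignoring symmetry and node-parameter constants):

```python
# p = vocabulary size, k = number of topics.
def lda_params(k, p):
    return k * p                  # one weight per (topic, word)

def apm_params(k, p):
    return k * p * p              # one weight per (topic, word pair): O(k p^2)

def apm_sparse_params(k, p, nnz_per_word):
    return k * p * nnz_per_word   # L1 regularization keeps few edges per word

print(lda_params(5, 1000))              # 5000
print(apm_params(5, 1000))              # 5000000
print(apm_sparse_params(5, 1000, 10))   # 50000
```
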
32. Parallel Alternating Newton-like Algorithm
1. Split the algorithm into alternating convex problems
$$\underset{\Phi_1, \Phi_2, \cdots, \Phi_p}{\arg\min}\ -\frac{1}{n}\sum_{s=1}^{p}\Big(\mathrm{tr}(\Psi_s^T \Phi_s) - \sum_{i=1}^{n} \exp(z_i^T \Phi_s w_i)\Big) + \sum_{s=1}^{p} \lambda \|\mathrm{vec}(\Phi_s)\|_1$$
$$\underset{w_1, w_2, \cdots, w_n \in \Delta^k}{\arg\min}\ -\frac{1}{n}\sum_{i=1}^{n}\Big(\psi_i^T w_i - \sum_{s=1}^{p} \exp(z_i^T \Phi_s w_i)\Big)$$
where $z_i = [1\ x_i^T]^T$, $\phi_s^j = [\theta_s^j\ (\Theta_s^j)^T]^T$, $\Phi_s = [\phi_s^1\ \phi_s^2 \cdots \phi_s^k]$, $\Psi_s = f(X, W)$, and $\psi_i = f(X, \Phi^{1\ldots k})$.
2. Subproblems in summation can be computed in parallel
3. Use fast Newton-like optimization method [Hsieh et al. 2014]
33. Timing Results on Wikipedia Dataset (k = 5, λ = 0.5)
APM Training Time on Wikipedia Dataset:
- n = 20,000, p = 5,000 (50M words): 1.0 hrs (1st iter.), 0.6 hrs (avg. of next 3 iter.)
- n = 100,000, p = 5,000 (133M words): 3.1 hrs (1st iter.), 2.2 hrs (avg. of next 3 iter.)
- n = 20,000, p = 10,000 (57M words): 3.4 hrs (1st iter.), 2.2 hrs (avg. of next 3 iter.)
Algorithm scales approximately as O(np^2)
34. Parallel Speedup
[Figure: parallel speedup on the BNC dataset versus number of MATLAB workers (up to 20); the actual speedup tracks below the perfect linear speedup line.]
BNC dataset has n = 4049 and p = 1646
Speedup could be O(min(n, p)) on distributed system
35. Evaluating APM: No Direct Evaluation of Edge Parameters
Previous metrics evaluate the similarity of word pairs
[Newman et al. 2010, Mimno et al. 2011, Aletras and Stevenson 2013]
An averaged statistic is computed over all $\binom{10}{2} = 45$ pairs of the top 10 words
Attempted to correlate with human judgment
Unlike previous topic models, APM explicitly models dependencies between words
How can we semantically evaluate the parameters for these dependencies?
36. Evocation [Boyd-Graber et al. 2006]
Evocation denotes the idea of which words "evoke" or "bring to mind" other words
Different types of evocation:
1. Rose - Flower (example)
2. Brave - Noble (kind)
3. Yell - Talk (manner)
4. Eggs - Bacon (co-occurrence)
5. Snore - Sleep (setting)
6. Wet - Desert (antonymy)
7. Work - Lazy (exclusivity)
8. Banana - Kiwi (likeness)
Distinctive from word similarity or synonymy
Collected human scores for approximately 15% of word pairs
37. Evocation Metric Illustration
[Illustration: start with all word pairs (w1 ↔ w2, w1 ↔ w3, ...), each with a human evocation score H and a model dependency weight M; rank the pairs by M; keep the top-m pairs and sum their human scores H.]
Rank by model weights M, then sum the top-m human scores H
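The metric illustrated above can be sketched in a few lines. The pair weights and human scores here are made up for illustration:

```python
# Rank word pairs by the model's dependency weight M, keep the top m,
# and sum their human evocation scores H.
def evocation_score(pairs, m):
    """pairs: list of (model_weight, human_score), one entry per word pair."""
    ranked = sorted(pairs, key=lambda t: t[0], reverse=True)
    return sum(h for _, h in ranked[:m])

pairs = [
    (0.9, 60),   # strong model weight, strong human evocation
    (0.7, 10),   # strong model weight, weak human evocation
    (0.2, 80),   # strongly evocative pair the model failed to rank highly
    (0.1, 5),
]
print(evocation_score(pairs, m=2))   # 70: sums H over the top-2 pairs by M
```

A model scores well only when the pairs it weights highly are also the pairs humans judge evocative.
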
38. Models for Comparison
APM: Admixture of Poisson MRFs
APM-LowReg: Very small regularization parameter
APM-HeldOut: Chooses λ from held-out documents
CTM: Correlated Topic Models
HDP: Hierarchical Dirichlet Process (Non-parametric)
LDA: Latent Dirichlet Allocation
RSM: Replicated Softmax (Undirected Topic Model)
RND: Random baseline
39. Evocation Metric Results
[Figure: Evocation (m = 50) scores, ranging 0 to 1600, for k = 1, 3, 5, 10, 25, 50 under two metrics, Evoc-1 (avg. evocation of topics) and Evoc-2 (evocation of avg. topic), comparing APM, APM-LowReg, APM-HeldOut, CTM, HDP, LDA, RSM, and RND.]