This document proposes a new approach called MGTM for detecting non-Gaussian geographical topics in tagged photo collections. MGTM improves upon existing topic models like LGTA by dynamically smoothing adjacent regions to allow the evolutionary creation and spread of topics during inference. The approach clusters documents into regions and allows exchange of topic information between adjacent clusters. An evaluation on several datasets found MGTM had better word perplexity and precision in detecting intruder topics than baseline models.
Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections
1. Institute for Web Science & Technologies University of Koblenz ▪ Landau, Germany
Christoph Carl Kling, Jérôme Kunegis,
Sergej Sizov, Steffen Staab
4. Detecting Non-Gaussian Geographical Topics 4Christoph Carl Kling
Topics in topic modelling:
Latent variables that explain the co-occurrence of words
in documents.
Geographical topics:
Latent variables that explain the co-occurrence of words
both in documents and in the geographical space.
Cultural areas, country borders, geographical features and other
geographical observations exhibit complex spatial distributions
Cluster adjacency
Dependencies of document-specific topic distributions
Exchange of topic information between clusters
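The idea of exchanging topic information between adjacent clusters can be illustrated with a small sketch. This is not the inference procedure of MGTM: the slides do not say how adjacency is determined or how strongly neighbours influence each other, so the k-nearest-centroid adjacency, the fixed mixing `weight`, and the function names `knn_adjacency` and `smooth_topics` are all assumptions made for illustration.

```python
from math import dist


def knn_adjacency(centroids, k=2):
    """Symmetric adjacency: link each cluster to its k nearest centroids.

    Assumption: k-NN is used only for illustration; the slides do not
    state how cluster adjacency is defined.
    """
    n = len(centroids)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist(centroids[i], centroids[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def smooth_topics(topic_dists, adj, weight=0.5):
    """Mix each cluster's topic distribution with the mean of its neighbours.

    Assumption: a fixed-weight average stands in for the model-based
    exchange of topic information between clusters.
    """
    smoothed = []
    for i, own in enumerate(topic_dists):
        neighbours = [topic_dists[j] for j in adj[i]]
        if not neighbours:
            smoothed.append(list(own))
            continue
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        smoothed.append([(1 - weight) * o + weight * m
                         for o, m in zip(own, mean)])
    return smoothed


# Made-up example: four cluster centroids on a line, two topics.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
topic_dists = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]

adj = knn_adjacency(centroids, k=1)
smoothed = smooth_topics(topic_dists, adj)
```

Repeating `smooth_topics` lets a topic that is strong in one cluster gradually reach clusters several steps away, which is the intuition behind topics spreading across regions whose shape is not Gaussian.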
MGTM
[Figure: plate diagram of MGTM; symbols visible in the diagram: γ, H, α0, η, δl, Al, G⁰, Gˢ, Gᵈ, θjn, w; plates L, M, N]
L: #regions
M: #documents in cluster
N: #words in document
G⁰: Global topic distribution
Gˢ: Cluster-topic distribution
Gᵈ: Document-topic distribution
Datasets
Activities: 1,931 photos
Landscape: 5,791 photos
Manhattan: 28,922 photos
Car: 34,707 photos
Food: 151,747 photos
LGTA, Z. Yin et al., 2011
Compared models:
- LGTA: Model with regions
- Basic model: 3-level Hierarchical Dirichlet Process
- MGTM: Basic model plus dynamically smoothed adjacent regions
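For orientation, a three-level hierarchical Dirichlet process in the sense of Teh et al. (2006) can be written roughly as follows, using a global, a cluster-level and a document-level topic distribution. This is a sketch, not the authors' exact specification: the third concentration parameter \alpha_1, the assignment of \gamma and \alpha_0 to particular levels, and the multinomial word likelihood are assumptions, and the additional coupling between adjacent regions that distinguishes MGTM is deliberately left out.

```latex
\begin{align*}
G^{0} \mid \gamma, H &\sim \mathrm{DP}(\gamma, H) \\
G^{s}_{l} \mid \alpha_0, G^{0} &\sim \mathrm{DP}(\alpha_0, G^{0}), & l &= 1, \dots, L \\
G^{d}_{j} \mid \alpha_1, G^{s}_{l} &\sim \mathrm{DP}(\alpha_1, G^{s}_{l}), & j &= 1, \dots, M \\
\theta_{jn} \mid G^{d}_{j} &\sim G^{d}_{j}, & n &= 1, \dots, N \\
w_{jn} \mid \theta_{jn} &\sim \mathrm{Mult}(\theta_{jn})
\end{align*}
```

Each level draws its distribution from a Dirichlet process whose base measure is the level above, so clusters share the global topics and documents share the topics of their cluster.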
Word Perplexity
[Figure: word perplexity plots, one panel per dataset: manhattan (100 regions), landscape (200 regions), activities (300 regions), car (500 regions), food (1000 regions)]
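The slides do not spell out how word perplexity was computed. A minimal sketch of the standard definition, the exponential of the negative mean log-likelihood per held-out word, is given below; the function name `word_perplexity` and the toy input are illustrative assumptions.

```python
import math


def word_perplexity(log_probs):
    """Perplexity = exp(-(1/N) * sum of per-word natural-log probabilities).

    `log_probs` holds log p(w) for each held-out word under the model.
    Lower values mean the model predicts the held-out words better.
    """
    if not log_probs:
        raise ValueError("need at least one held-out word")
    return math.exp(-sum(log_probs) / len(log_probs))


# Toy check: a model that spreads its mass uniformly over a
# 50-word vocabulary should have a perplexity of about 50.
uniform_log_probs = [math.log(1 / 50)] * 200
ppl = word_perplexity(uniform_log_probs)
```

Because perplexity is an average over words, it can be compared across models on the same held-out set, but not across datasets with different vocabularies.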
User Study
Food dataset (1000 regions)
31 participants
Task: intrusion detection
Measure: precision
Precision (avg / median)
Model         4 topics       6 topics       8 topics
LGTA          0.67 / 0.64    0.57 / 0.57    0.60 / 0.58
Basic model   0.45 / 0.57    0.63 / 0.61    0.64 / 0.58
MGTM          0.79 / 0.80    0.82 / 0.81    0.78 / 0.75
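How an avg / median precision pair can be derived from intrusion-detection answers is sketched below. The slides do not describe the aggregation, so computing one precision value per participant and then taking the mean and median over participants is an assumption, and all answers shown are made up rather than taken from the study.

```python
from statistics import mean, median


def participant_precision(answers, intruders):
    """Fraction of tasks in which the participant picked the true intruder."""
    if len(answers) != len(intruders):
        raise ValueError("one answer per task is required")
    hits = sum(a == t for a, t in zip(answers, intruders))
    return hits / len(intruders)


# Made-up ground truth: the intruding topic in each of four tasks.
intruders = ["t3", "t1", "t4", "t2"]

# Made-up answers from three hypothetical participants.
responses = {
    "p1": ["t3", "t1", "t4", "t2"],  # 4 of 4 correct
    "p2": ["t3", "t2", "t4", "t2"],  # 3 of 4 correct
    "p3": ["t1", "t1", "t2", "t2"],  # 2 of 4 correct
}

scores = [participant_precision(a, intruders) for a in responses.values()]
avg_precision = mean(scores)
median_precision = median(scores)
```

Reporting the median next to the average shows whether a few participants with very low or very high scores dominate the mean.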
west.uni-koblenz.de
Research → systems → MGTM
west.uni-koblenz.de liveandgov.eu
Summary
• Geographical topics often exhibit a complex spatial distribution
• The detection of such complex topics can be supported
• The dynamic smoothing of adjacent regions leads to an evolutionary creation and spread of topics during inference
References
Hierarchical Dirichlet processes.
by: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei
In: Journal of the American Statistical Association, Vol. 101 (2006), p. 1566-1581.
GeoFolk: latent spatial semantics in web 2.0 social media.
by: Sergej Sizov
In: WSDM, ACM (2010), p. 281-290.
Geographical topic discovery and comparison.
by: Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang
In: WWW, ACM (2011), p. 247-256.
A Nonparametric Bayesian Model of Multi-Level Category Learning.
by: Kevin Robert Canini and Thomas L. Griffiths
In: AAAI, AAAI Press (2011).