Presentation slides at KDE seminar 2013/04/24, which introduces the paper "Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations."
1. Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations
Rui Li, Shengjie Wang, Hongbo Deng,
Rui Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
13/04/24 KDE Seminar: Yuto Yamaguchi 1
Paper Introduction
Speaker: Yuto Yamaguchi
KDD ‘12
2. Introduction
• Users’ locations are important to many applications
• e.g., advertisement, recommendation
• But most users do not provide their location information
• On Twitter, only 16% of users register city-level locations in their profiles
• The objective of this paper is to profile users’ home locations in social networks.
3. General Ideas for Location Inference
• A user is more likely to follow another user who lives nearby
• e.g., a user in Chicago follows another user in Chicago
• [Backstrom et al., WWW ’10],
• [Clodoveu et al., T-GIS ’11], …
• A user is more likely to post about locations near him or her
• e.g., a user in Houston posts about rockets
• [Cheng et al., CIKM ’10],
• [Chandra et al., SocialCom ’11],
• [Kinsella et al., SMUC ’11], …
4. Challenges
• On Twitter, the following network and tweets provide valuable signals for profiling users’ home locations
• But there are two challenges:
• Scarce signals
• A user has 126 friends on average, but only 16% of them provide locations
• Only 6 location-related terms appear in every 100 tweets
• Noisy signals
• A user may follow another user who lives in a distant location
• A user may post about distant locations
5. Ideas in this paper
• The authors propose a unified discriminative influence model, UDI, which has the two features below
• Unified signals (for the scarce-signal challenge)
• Integrates the social network and user-centric data (i.e., tweets) in a probabilistic framework, which is viewed as a heterogeneous graph
• Discriminative influence (for the noisy-signal challenge)
• Users and locations have their own influence scopes
e.g., Lady Gaga (with a broad influence scope) is more likely to be followed by a user far away
→ users with broad scopes do not provide strong signals for location inference
6. Contributions
• Propose a unified discriminative influence model, UDI
• Heterogeneous graph
• Influence scope
• Propose two location profiling methods using the above model (introduced later)
• Local prediction method
• Global prediction method
• Conduct extensive experiments using a Twitter dataset
• Their method places 66% of users within a 100-mile error distance
8. Heterogeneous Graph
User nodes $u_i \in U$
Venue nodes $v_j \in V$
If $u_i$ posts about $v_j$, create an edge $\langle u_i, v_j \rangle$
If $u_i$ follows $u_j$, create an edge $\langle u_i, u_j \rangle$
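The two edge rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `build_graph`, the adjacency-set layout, and the toy user and venue names are assumptions, not something given in the slides or the paper.

```python
from collections import defaultdict


def build_graph(follows, tweets):
    """Build adjacency sets for a user/venue heterogeneous graph.

    follows: iterable of (follower, followee) user pairs -> edge <u_i, u_j>
    tweets:  iterable of (user, venue) pairs             -> edge <u_i, v_j>
    """
    followees = defaultdict(set)  # u_i -> users that u_i follows
    followers = defaultdict(set)  # u_j -> users that follow u_j
    venues = defaultdict(set)     # u_i -> venues that u_i posts about
    for u_i, u_j in follows:
        followees[u_i].add(u_j)
        followers[u_j].add(u_i)
    for u_i, v_j in tweets:
        venues[u_i].add(v_j)
    return followees, followers, venues


# Toy data (hypothetical users and venues).
followees, followers, venues = build_graph(
    follows=[("alice", "bob"), ("carol", "bob")],
    tweets=[("alice", "Chicago"), ("alice", "Houston")],
)
```

Keeping followers and followees in separate maps is convenient later, because the influence scope that governs a follow edge belongs to the user being followed.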
9. Location Profiling Problem
Given a Twitter graph $G$, estimate a location $\hat{L}_{u_i}$ for each user $u_i$ so as to make $\hat{L}_{u_i}$ close to $u_i$’s true location $L_{u_i}$
11. Motivation 1/2
Nearby users (venues) are more likely to be followed (tweeted about) by other users
12. Motivation 2/2
Each user (venue) has an influence scope of a different size
[Figure: influence scopes of an influential user and a regular user]
13. Basic Ideas for the Influence model
• A geographically influential user has a broad influence scope
• e.g., worldwide celebrities such as Lady Gaga
• The fact that a user follows a geographically influential user does NOT provide a valuable signal for location inference
• e.g.,
NOT VALUABLE: a user follows Lady Gaga
VALUABLE: a user follows a regular user in Chicago
14. Model Formulation
• The authors adopt a Gaussian distribution to model the
above characteristics
[Figure: Gaussian surface over latitude and longitude giving the probability to follow (tweet); node $n_i$’s influence scope is $N(L_{n_i}, \Sigma_{n_i})$]
15. Influence scope – users
[Figure: Gaussian surface over latitude and longitude giving the probability to follow; user $u_i$’s influence scope is $N(L_{u_i}, \Sigma_{u_i})$, centered on user $u_i$’s home location. The probability to follow $u_i$ is high near the center and low far from it]
16. Influence scope – venues
[Figure: Gaussian surface over latitude and longitude giving the probability to tweet; venue $v_i$’s influence scope is $N(L_{v_i}, \Sigma_{v_i})$, centered on venue $v_i$’s location. The probability to tweet is high near the center and low far from it]
17. Different scope size – users
[Figure: influence scopes of a regular user and a geographically influential (high-influence) user]
A geographically influential user is more likely to be followed by distant users
18. Different scope size – venues
[Figure: influence scopes of a regular venue and a geographically influential (high-influence) venue]
A geographically influential venue is more likely to be tweeted about by distant users
19. Model Parameters
• Mean and variance for each Gaussian $N(L_{n_i}, \Sigma_{n_i})$
• Mean $L_{n_i}$ is the location of node $n_i$
• Variance $\Sigma_{n_i}$ decides the size of each influence scope:
$$\Sigma_{n_i} = \begin{pmatrix} \sigma_{n_i} & 0 \\ 0 & \sigma_{n_i} \end{pmatrix}$$
• The number of parameters is $2(|U| + |V|)$
21. Basic Ideas for Location Profiling
Estimate the model parameters that maximize the likelihood of obtaining the given Twitter graph
Parameters: $L_{n_i}$ and $\Sigma_{n_i}$ for each node $n_i$
22. Local Prediction Method
• This method only considers the ego-network
• Maximize the likelihood of this network
[Figure: ego-network of an unlabeled user, connected by follow and tweet edges to labeled users]
labeled user: the user’s location is known
unlabeled user: the user’s location is unknown
23. Likelihood Function of Local Method
$$P(\text{ego-network of } u_i \mid \text{parameters}) = \prod_{u_j \in \text{Followers}(u_i)} P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) \times \prod_{u_j \in \text{Followees}(u_i)} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \times \prod_{v_j \in \text{Venues}(u_i)} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})$$
Each factor is a Gaussian
Maximize this function
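As a rough illustration of what maximizing this product means, assume that every scope $\sigma$ is held fixed and that locations are planar $(x, y)$ points. Under those assumptions (which are mine, not the slides'), the log-likelihood is a weighted sum of squared distances, and its maximizer is the precision-weighted mean of the neighbors' locations. The function name `local_estimate` is hypothetical, and the authors' actual optimization procedure may differ.

```python
def local_estimate(neighbors):
    """Precision-weighted mean of neighbor locations.

    neighbors: list of ((x, y), sigma) pairs, one per edge in the ego-network.
    sigma is the scope that governs that edge: the neighbor's own sigma for
    followee and venue edges, the target user's sigma for follower edges.
    Assumes fixed sigmas and planar coordinates (illustrative simplification).
    """
    total_w = sx = sy = 0.0
    for (x, y), sigma in neighbors:
        w = 1.0 / sigma ** 2  # a broad scope contributes a small weight
        total_w += w
        sx += w * x
        sy += w * y
    return (sx / total_w, sy / total_w)


# Two nearby regular users and one distant broad-scope user.
estimate = local_estimate([
    ((0.0, 0.0), 1.0),
    ((10.0, 0.0), 1.0),
    ((100.0, 100.0), 10.0),
])
```

The broad-scope neighbor gets weight 1/100 of a regular one, so it barely moves the estimate. This is the "discriminative influence" idea in miniature.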
24. Each Gaussian
$$P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) = \frac{1}{2\pi\sigma_{u_i}^2} \exp\left( \frac{(X_{u_i} - X_{u_j})^2 + (Y_{u_i} - Y_{u_j})^2}{-2\sigma_{u_i}^2} \right)$$
• High probability if $u_i$ and $u_j$ are close
• High probability even at a distance if $u_i$ has a broad influence scope
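The density on this slide is straightforward to evaluate. The sketch below is a direct transcription under the assumption that locations are planar $(x, y)$ pairs; the function name `follow_probability` is hypothetical.

```python
import math


def follow_probability(loc_j, loc_i, sigma_i):
    """Isotropic 2-D Gaussian density for "u_j follows u_i".

    loc_j, loc_i: (x, y) locations of the follower and the followed user.
    sigma_i: influence scope of the followed user u_i.
    """
    dx = loc_i[0] - loc_j[0]
    dy = loc_i[1] - loc_j[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma_i ** 2)) / (
        2 * math.pi * sigma_i ** 2
    )


# Same 10-unit distance, narrow versus broad scope.
p_narrow = follow_probability((10.0, 0.0), (0.0, 0.0), sigma_i=1.0)
p_broad = follow_probability((10.0, 0.0), (0.0, 0.0), sigma_i=10.0)
```

At this distance the broad scope gives the higher density. At zero distance the ordering flips, because a broad Gaussian spreads its mass over a larger area.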
26. Global Prediction Method
• This method maximizes the likelihood of the whole network
• Predict locations of unknown users simultaneously
27. Likelihood Function of Global Method
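$$P(\text{whole network} \mid \text{parameters}) = \prod_{\langle u_i, u_j \rangle \in \text{FollowEdges}} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \times \prod_{\langle u_i, v_j \rangle \in \text{TweetEdges}} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})$$
Each factor is a Gaussian
Maximize this function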
28. Iterative Algorithm for Global Method
• The global method has no closed-form solution
→ Iterative algorithm
1. Initialize locations $L_u$ for all unlabeled users
2. $k \leftarrow 1$
3. repeat
   1. update $\sigma_n^k$ for all nodes using $L_u$
   2. repeat
      1. update $L_u^k$ for all unlabeled users using $\sigma_n^k$ and the current locations ($L_u$, $L_u^k$)
   3. until converge
   4. $L_u \leftarrow L_u^k$
   5. $k \leftarrow k + 1$
4. until converge
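The alternating scheme above can be sketched as follows. This is a minimal illustration under several assumptions of mine: planar coordinates, fixed iteration counts in place of convergence tests, the closed-form Gaussian estimate $\sigma_n^2 = \sum d^2 / (2m)$ over the $m$ edges pointing at node $n$, a precision-weighted mean for the location update, and a small floor on $\sigma$ to avoid division by zero. The name `fit_global` and the toy nodes are hypothetical, and the paper's actual update rules may differ.

```python
import math


def fit_global(edges, known, init, n_outer=20, n_inner=5, floor=1e-3):
    """Alternate between scope updates and location updates.

    edges: list of (source, target) pairs; the target's sigma governs the edge.
    known: dict node -> (x, y) for labeled users and venues.
    init:  dict unlabeled user -> initial (x, y) guess.
    """
    loc = {**known, **init}
    sigma = {}
    for _ in range(n_outer):
        # Scope update: sigma_n^2 = sum of squared edge lengths / (2 * count).
        sq, cnt = {}, {}
        for s, t in edges:
            d2 = (loc[s][0] - loc[t][0]) ** 2 + (loc[s][1] - loc[t][1]) ** 2
            sq[t] = sq.get(t, 0.0) + d2
            cnt[t] = cnt.get(t, 0) + 1
        sigma = {t: max(math.sqrt(sq[t] / (2 * cnt[t])), floor) for t in sq}
        # Location update: precision-weighted mean of each user's neighbors.
        for _ in range(n_inner):
            for u in init:
                tw = sx = sy = 0.0
                for s, t in edges:
                    if s == t or u not in (s, t):
                        continue
                    other = t if u == s else s
                    w = 1.0 / sigma[t] ** 2
                    tw += w
                    sx += w * loc[other][0]
                    sy += w * loc[other][1]
                if tw > 0:
                    loc[u] = (sx / tw, sy / tw)
    return loc, sigma


# Toy network: "C" is followed from far away, "R" only by nearby users.
known = {
    "C": (0.0, 0.0), "f1": (100.0, 0.0), "f2": (-100.0, 0.0), "f3": (0.0, 100.0),
    "R": (10.0, 10.0), "g1": (11.0, 10.0), "g2": (10.0, 11.0),
}
edges = [("f1", "C"), ("f2", "C"), ("f3", "C"),
         ("g1", "R"), ("g2", "R"), ("u", "C"), ("u", "R")]
loc, sigma = fit_global(edges, known, init={"u": (0.0, 0.0)})
```

In this toy run the unlabeled user `u` ends up next to the regular user `R`, because `C` receives a much larger scope and therefore a much smaller weight.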
30. Dataset
• Twitter dataset
• Crawled profiles, followers, and followees of 3,980,061 users
• Geocoded their profile locations into coordinates based on the U.S. Gazetteer
• 630,187 users are correctly geocoded ← labeled users
• 158,220 of the labeled users have at least one labeled neighbor
• neighbor: follower or followee
• Crawled at most 600 tweets for each labeled user, and obtained 139,180 users’ tweets
• Other users are protected users
• Using this dataset, the authors conducted five-fold cross validation
• 80% of the 139,180 users form the training set and 20% form the test set
• Repeated for 5 runs
31. Methods
• Compared 6 methods
• BaseU: Backstrom et al.’s method [1] (no influence model)
• Using only the social graph
• BaseC: Cheng et al.’s method [2] (no influence model)
• Using only tweets
• UDIU: Local prediction method, but only uses user nodes
• UDIC: Local prediction method, but only uses venue nodes
• UDII: Local prediction method
• UDIG: Global prediction method
[1] Backstrom et al., “Find me if you can: improving geographical prediction with social and spatial proximity”, WWW ’10
[2] Cheng et al., “You are where you tweet: a content-based approach to geo-locating twitter users”, CIKM ’10
32. Results – Prediction results
ACC: ratio of users whose predicted location is within 100 miles of the true location
AED@k%: average error distance of the top k% of users
• The influence model is effective for predicting locations
• Comparing BaseU and UDIU (BaseC and UDIC)
• Integrating both signals is effective for predicting locations
• Comparing UDIU and UDII (UDIC and UDII)
• The global method improves on the local one by only 1.5%
• Comparing UDIG and UDII
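The two metrics on this slide can be computed from per-user error distances. The sketch below makes some assumptions of mine: error is great-circle (haversine) distance in miles, "within 100 miles" includes exactly 100, and "top k%" means the k% of users with the smallest errors. The function names and the toy error values are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def acc(errors, threshold=100.0):
    """Fraction of users whose error distance is within the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def aed_at(errors, k):
    """Average error distance over the k% of users with the smallest errors."""
    ranked = sorted(errors)
    n = max(1, math.ceil(len(ranked) * k / 100))
    return sum(ranked[:n]) / n


# Toy error distances in miles (made-up values for illustration only).
errors = [10.0, 50.0, 150.0, 400.0]
accuracy = acc(errors)
aed_50 = aed_at(errors, 50)
aed_100 = aed_at(errors, 100)
chicago_houston = haversine_miles((41.8781, -87.6298), (29.7604, -95.3698))
```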
33. Results – Global and Local
With 20% training users and 80% test users: +9% in ACC
When most users are unlabeled, the global method improves on the local one substantially
34. Results – Influence scope
• Users with a large number of followers do not always have a large σ
• e.g., MythBusters Official has a larger σ than Lady Gaga but a smaller number of followers
36. Conclusion
• Proposed
• Unified discriminative influence model (UDI)
• Two location prediction methods based on the influence model
• global and local
• Conducted experiments using a large Twitter dataset
• The proposed methods significantly outperform existing methods
• NO future work