Towards Social User Profiling:
Unified and Discriminative Influence Model for
Inferring Home Locations	
Rui Li, Shengjie Wang, Hongbo Deng,
Rui Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign	
13/04/24 KDE Seminar: Yuto Yamaguchi 1
Paper Introduction
Speaker: Yuto Yamaguchi	
KDD ‘12
Introduction	
•  Users’ locations are important to many applications
•  e.g.) Advertisement, Recommendation
•  But most users do not provide their location information
•  On Twitter, only 16% of users register city-level locations in their profiles
•  The objective of this paper is to profile users’ home locations in a social network.
General Ideas for Location Inference	
•  A user is more likely to follow another user who lives nearby
•  e.g.) A user in Chicago follows another user in Chicago
•  [Backstrom et al., WWW ‘10],
•  [Clodoveu et al., T-GIS ‘11], …
•  A user is more likely to post about locations near them
•  e.g.) A user in Houston posts about rockets
•  [Cheng et al., CIKM ‘10],
•  [Chandra et al., SocialCom ’11],
•  [Kinsella et al., SMUC ‘11], …
Challenges	
•  On Twitter, the following network and tweets provide valuable signals for profiling users’ home locations
•  But there are two challenges:
•  Scarce Signals
•  126 friends on average, but only 16% of them provide locations
•  6 location-related terms in every 100 tweets
•  Noisy Signals
•  A user may follow another user who lives in a distant location
•  A user may post about distant locations
Ideas in this paper	
•  The authors propose a unified discriminative influence model (UDI), which has the two features below
•  Unified Signals (for the scarce-signal challenge)
•  Integrates the social network and user-centric data (i.e., tweets) in a probabilistic framework, viewed as a heterogeneous graph
•  Discriminative Influence (for the noisy-signal challenge)
•  Users and locations have their own influence scopes
e.g.) Lady Gaga (with a broad influence scope) is more likely to be followed by a user far away
→ users with broad scopes do not provide strong signals for location inference
Contributions	
•  Propose a unified discriminative influence model UDI
•  Heterogeneous graph
•  Influence scope
•  Propose two location profiling methods using the above
model (introduced later)
•  Local prediction method
•  Global prediction method
•  Conduct extensive experiments using a Twitter dataset
•  Their method places 66% of users within an error distance of 100 miles
PROBLEM FORMULATION	
Heterogeneous Graph	
User nodes: ui ∈ U
Venue nodes: vj ∈ V
If ui posts about vj, create an edge <ui, vj>	
If ui follows uj, create an edge <ui, uj>
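The two edge rules above can be sketched in Python as adjacency sets. This is a minimal illustration, not the authors' code: the function name `build_graph`, the input format (lists of pairs), and the sample users and venues are all hypothetical.

```python
from collections import defaultdict

def build_graph(follows, tweets):
    """Build adjacency sets for the heterogeneous graph.

    follows: iterable of (ui, uj) pairs meaning "ui follows uj" (assumed format).
    tweets:  iterable of (ui, vj) pairs meaning "ui posts about venue vj".
    """
    followers = defaultdict(set)  # uj -> users who follow uj
    followees = defaultdict(set)  # ui -> users that ui follows
    venues = defaultdict(set)     # ui -> venues that ui posts about
    for ui, uj in follows:
        followees[ui].add(uj)
        followers[uj].add(ui)
    for ui, vj in tweets:
        venues[ui].add(vj)
    return followers, followees, venues

# Hypothetical toy data
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
tweets = [("alice", "Chicago"), ("bob", "Houston")]
followers, followees, venues = build_graph(follows, tweets)
```

Keeping followers, followees, and venues as separate maps mirrors the three kinds of neighbors that the likelihood later treats separately.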
Location Profiling Problem	
Given a Twitter graph G, estimate a location $\hat{L}_{u_i}$ for each user ui so that $\hat{L}_{u_i}$ is close to ui’s true location $L_{u_i}$
INFLUENCE MODEL	
Motivation 1/2	
Nearby users (venues) are more likely to be followed (tweeted about) by other users
Motivation 2/2	
Each user (venue) has
an influence scope of different size	
[Figure: influence scopes of an influential user and a regular user]
Basic Ideas for the Influence model	
•  A geographically influential user has a broad influence scope
•  e.g.) worldwide celebrities such as Lady Gaga
•  The fact that a user follows a geographically influential user does NOT provide a valuable signal for location inference
•  e.g.)
NOT VALUABLE: a user follows Lady Gaga
VALUABLE: a user follows a regular user in Chicago
Model Formulation	
•  The authors adopt a Gaussian distribution to model the
above characteristics
[Figure: Gaussian $\mathcal{N}(L_{n_i}, \Sigma_{n_i})$ over latitude and longitude; the height is the probability to follow (tweet), and the spread is node ni’s influence scope]
Influence scope – users	
[Figure: Gaussian $\mathcal{N}(L_{u_i}, \Sigma_{u_i})$ over latitude and longitude, centered at user ui’s home location; the spread is user ui’s influence scope. High probability to follow ui near the center, low probability far from it]
Influence scope – venues	
[Figure: Gaussian $\mathcal{N}(L_{v_i}, \Sigma_{v_i})$ over latitude and longitude, centered at venue vi’s location; the spread is venue vi’s influence scope. High probability to tweet near the center, low probability far from it]
Different scope size – users	
[Figure: a regular user versus a geographically influential user (high influence); the influential user is more likely to be followed by distant users]
Different scope size – venues	
[Figure: a regular venue versus a geographically influential venue (high influence); the influential venue is more likely to be tweeted about by distant users]
Model Parameters	
•  Mean and variance for each Gaussian $\mathcal{N}(L_{n_i}, \Sigma_{n_i})$
•  Mean $L_{n_i}$ is the location of node ni
•  Variance $\Sigma_{n_i}$ decides the size of each influence scope

$$\Sigma_{n_i} = \begin{pmatrix} \sigma_{n_i} & 0 \\ 0 & \sigma_{n_i} \end{pmatrix}$$

•  The number of parameters is $2(|U| + |V|)$
LOCATION PROFILING
METHODS	
Local prediction method
Global prediction method	
Basic Ideas for Location Profiling	
Estimate the model parameters that maximize the likelihood of obtaining the given Twitter graph
Parameters: $L_{n_i}$ and $\Sigma_{n_i}$ for each node ni
Local Prediction Method	
•  This method considers only the ego-network of the user
•  Maximize the likelihood of this network
[Figure: ego-network of an unlabeled user, connected by follow edges to labeled users and by tweet edges to venues]
Labeled user: location is known
Unlabeled user: location is unknown
Likelihood Function of Local Method	
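$$
\begin{aligned}
P(\text{ego-network of } u_i \mid \text{parameters}) ={}
& \prod_{u_j \in \mathrm{Followers}(u_i)} P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) \\
\times{} & \prod_{u_j \in \mathrm{Followees}(u_i)} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \\
\times{} & \prod_{v_j \in \mathrm{Venues}(u_i)} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})
\end{aligned}
$$

Each factor is a Gaussian
Maximize this function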
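As a rough illustration of what "maximize this function" means, the sketch below evaluates the log of the ego-network likelihood for a candidate location. It assumes planar (x, y) coordinates and the isotropic Gaussian density given on the "Each Gaussian" slide; the function names, argument layout, and sample values are hypothetical.

```python
import math

def log_gauss(loc_a, loc_b, sigma):
    """Log of the isotropic 2-D Gaussian density at squared distance |a - b|^2."""
    d2 = (loc_a[0] - loc_b[0]) ** 2 + (loc_a[1] - loc_b[1]) ** 2
    return -math.log(2 * math.pi * sigma ** 2) - d2 / (2 * sigma ** 2)

def ego_log_likelihood(loc_ui, sigma_ui, follower_locs, followees, venues):
    """Sum of log-probabilities over the three edge types of ui's ego-network.

    follower_locs: list of (x, y) for users following ui (scored under ui's scope).
    followees, venues: lists of ((x, y), sigma) scored under the neighbor's scope.
    """
    total = 0.0
    for loc_uj in follower_locs:
        total += log_gauss(loc_uj, loc_ui, sigma_ui)
    for loc_uj, sigma_uj in followees:
        total += log_gauss(loc_ui, loc_uj, sigma_uj)
    for loc_vj, sigma_vj in venues:
        total += log_gauss(loc_ui, loc_vj, sigma_vj)
    return total

# Hypothetical ego-network: a candidate near the neighbors scores higher.
follower_locs = [(0.0, 0.0), (1.0, 0.0)]
followees = [((0.0, 1.0), 2.0)]
venues = [((0.5, 0.5), 1.0)]
near_ll = ego_log_likelihood((0.5, 0.5), 1.0, follower_locs, followees, venues)
far_ll = ego_log_likelihood((10.0, 10.0), 1.0, follower_locs, followees, venues)
```

Working with the log turns the product into a sum, which avoids numerical underflow when a user has many neighbors.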
Each Gaussian	
$$
P(u_j \text{ follows } u_i \mid L_{u_j}, L_{u_i}, \Sigma_{u_i}) = \frac{1}{2\pi\sigma_{u_i}^2} \exp\left( -\frac{(X_{u_i} - X_{u_j})^2 + (Y_{u_i} - Y_{u_j})^2}{2\sigma_{u_i}^2} \right)
$$

•  High probability if ui and uj are close
•  High probability if ui has a broad influence scope
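To make the two bullets concrete, here is a small Python sketch of this density, assuming planar (x, y) coordinates. The function name and the sample distances and scopes are hypothetical.

```python
import math

def follow_probability(loc_ui, loc_uj, sigma_ui):
    """Density of uj following ui under ui's isotropic Gaussian influence scope."""
    dx = loc_ui[0] - loc_uj[0]
    dy = loc_ui[1] - loc_uj[1]
    norm = 2 * math.pi * sigma_ui ** 2
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma_ui ** 2)) / norm

# Same narrow scope, different distances: the closer follower is more likely.
near = follow_probability((0.0, 0.0), (1.0, 0.0), sigma_ui=1.0)
far = follow_probability((0.0, 0.0), (5.0, 0.0), sigma_ui=1.0)

# Same distant follower, narrow versus broad scope.
far_broad = follow_probability((0.0, 0.0), (5.0, 0.0), sigma_ui=5.0)

# Follower at the center, narrow versus broad scope.
center_narrow = follow_probability((0.0, 0.0), (0.0, 0.0), sigma_ui=1.0)
center_broad = follow_probability((0.0, 0.0), (0.0, 0.0), sigma_ui=5.0)
```

Under this density a broad scope raises the probability at large distances but lowers it at short ones, which is why a follow edge to a broad-scope user says little about where the follower lives.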
Solution of Local Method	
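$$
X_{u_i} = \frac{\displaystyle \sum_{u_j \in \mathrm{followers}(u_i)} \frac{X_{u_j}}{\sigma_{u_i}} + \sum_{u_j \in \mathrm{followees}(u_i)} \frac{X_{u_j}}{\sigma_{u_j}} + \sum_{v_j \in \mathrm{venues}(u_i)} \frac{X_{v_j}}{\sigma_{v_j}}}{\displaystyle \sum_{u_j \in \mathrm{followers}(u_i)} \frac{1}{\sigma_{u_i}} + \sum_{u_j \in \mathrm{followees}(u_i)} \frac{1}{\sigma_{u_j}} + \sum_{v_j \in \mathrm{venues}(u_i)} \frac{1}{\sigma_{v_j}}}
$$

$$
\sigma_{u_i}^2 = \sum_{u_j \in \mathrm{followers}(u_i)} \frac{(X_{u_i} - X_{u_j})^2 + (Y_{u_i} - Y_{u_j})^2}{2\,|\mathrm{followers}(u_i)|}
$$

Obtained in closed form (no need to memorize); substitute the second expression into the first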
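A minimal Python sketch of the closed-form local update, assuming planar (x, y) coordinates and that the scope σ governing each edge is already known. The function names and sample data are hypothetical. Each neighbor is weighted by 1/σ², which is what maximizing the Gaussian from the "Each Gaussian" slide gives; the formula on this slide, as extracted, weights by 1/σ, so treat the exponent as an assumption.

```python
import math

def estimate_location(neighbors):
    """Precision-weighted mean of neighbor locations.

    neighbors: list of ((x, y), sigma), where sigma is the scope that scores the
    edge (ui's own scope for followers, the neighbor's scope otherwise).
    """
    weight_sum = sum(1.0 / s ** 2 for _, s in neighbors)
    x = sum(p[0] / s ** 2 for p, s in neighbors) / weight_sum
    y = sum(p[1] / s ** 2 for p, s in neighbors) / weight_sum
    return x, y

def estimate_sigma(loc_ui, follower_locs):
    """Scope of ui from its followers: sigma^2 = sum(d^2) / (2 * |followers|)."""
    total = sum((loc_ui[0] - x) ** 2 + (loc_ui[1] - y) ** 2 for x, y in follower_locs)
    return math.sqrt(total / (2 * len(follower_locs)))

# Hypothetical data: a regular nearby user and a distant broad-scope user.
neighbors = [((0.0, 0.0), 1.0), ((10.0, 10.0), 10.0)]
x_hat, y_hat = estimate_location(neighbors)

sigma_hat = estimate_sigma((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)])
```

In the toy data the broad-scope neighbor barely moves the estimate, which is the discriminative effect the influence scope is meant to provide.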
Global Prediction Method	
•  This method maximizes the likelihood of the whole network
•  Predicts the locations of all unlabeled users simultaneously
Likelihood Function of Global Method	
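$$
\begin{aligned}
P(\text{whole network} \mid \text{parameters}) ={}
& \prod_{(u_i, u_j) \in \mathrm{FollowEdges}} P(u_i \text{ follows } u_j \mid L_{u_i}, L_{u_j}, \Sigma_{u_j}) \\
\times{} & \prod_{(u_i, v_j) \in \mathrm{TweetEdges}} P(u_i \text{ tweets } v_j \mid L_{u_i}, L_{v_j}, \Sigma_{v_j})
\end{aligned}
$$

Each factor is a Gaussian
Maximize this function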
Iterative Algorithm for Global Method	
•  The global method has no closed-form solution
→ Iterative algorithm
1. Initialize locations $L_u$ for all unlabeled users
2. $k \leftarrow 1$
3. repeat
   1. update $\sigma_n^k$ for all nodes using $L_u$
   2. repeat
      1. update $L_u^k$ for all unlabeled users using $\sigma_n^k$, $L_u$, and $L_u^k$
   3. until converged
   4. $L_u \leftarrow L_u^k$
   5. $k \leftarrow k + 1$
4. until converged
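The alternating procedure on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide does not spell out the update rules, so the sketch reuses the closed-form scope and location updates of the local method (with 1/σ² weights), uses planar coordinates, and starts unlabeled users at the centroid of the known nodes. All names, defaults, and the toy data are hypothetical.

```python
import math

def dist2(a, b):
    """Squared planar distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def global_predict(known, unlabeled, edges, outer_iters=20, inner_iters=50,
                   tol=1e-6, min_sigma=1e-3):
    """known: {node: (x, y)}; edges: (src, dst) pairs scored under dst's scope.
    Every edge endpoint must be in known or unlabeled."""
    unlabeled = list(unlabeled)
    loc = dict(known)
    # Step 1: initialize unlabeled users at the centroid of the known nodes.
    cx = sum(p[0] for p in known.values()) / len(known)
    cy = sum(p[1] for p in known.values()) / len(known)
    for u in unlabeled:
        loc[u] = (cx, cy)
    targets = {dst for _, dst in edges}
    sigma = {}
    for _ in range(outer_iters):
        before = {u: loc[u] for u in unlabeled}
        # Update every influence scope from the current locations.
        for n in targets:
            d2 = [dist2(loc[src], loc[n]) for src, dst in edges if dst == n]
            sigma[n] = max(math.sqrt(sum(d2) / (2 * len(d2))), min_sigma)
        # Update unlabeled locations until they stop moving.
        for _ in range(inner_iters):
            shift = 0.0
            for u in unlabeled:
                wx = wy = w = 0.0
                for src, dst in edges:
                    if u not in (src, dst):
                        continue
                    other = dst if src == u else src
                    weight = 1.0 / sigma[dst] ** 2
                    wx += weight * loc[other][0]
                    wy += weight * loc[other][1]
                    w += weight
                if w > 0:
                    new = (wx / w, wy / w)
                    shift = max(shift, math.sqrt(dist2(new, loc[u])))
                    loc[u] = new
            if shift < tol:
                break
        moved = max((math.sqrt(dist2(before[u], loc[u])) for u in unlabeled),
                    default=0.0)
        if moved < tol:
            break
    return {u: loc[u] for u in unlabeled}, sigma

# Toy data: "x" follows two nearby regular users and one broad-scope "star".
known = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0),
         "p": (-40.0, 10.0), "q": (100.0, 90.0), "star": (50.0, 50.0)}
edges = [("b", "a"), ("c", "a"), ("a", "b"), ("c", "b"),
         ("p", "star"), ("q", "star"),
         ("x", "a"), ("x", "b"), ("x", "star")]
pred, sigma = global_predict(known, ["x"], edges)
```

In the toy data the estimated scope of "star" comes out much larger than that of "a", so the edge to "star" has little pull on the predicted location of "x".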
EXPERIMENTS	
Dataset	
•  Twitter dataset
•  Crawled profiles, followers, and followees of 3,980,061 users
•  Geocoded their profile locations into coordinates based on the U.S. Gazetteer
•  630,187 users are correctly geocoded ← labeled users
•  158,220 of the labeled users have at least one labeled neighbor
•  neighbor: follower or followee
•  Crawled at most 600 tweets for each labeled user, and obtained tweets for 139,180 users
•  The other users are protected users
•  Using this dataset, the authors conducted five-fold cross-validation
•  80% of the 139,180 users form the training set and 20% the test set
•  Repeated for 5 runs
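The 80/20 split repeated over 5 runs can be sketched in Python as below. The slides do not say how users were assigned to folds, so the shuffle, the seed, and the function name are assumptions for illustration only.

```python
import random

def five_fold_splits(user_ids, folds=5, seed=0):
    """Yield (train, test) lists; each user lands in the test set exactly once."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)  # fixed seed only for reproducibility
    for k in range(folds):
        test = ids[k::folds]          # every folds-th user, starting at offset k
        test_set = set(test)
        train = [u for u in ids if u not in test_set]
        yield train, test

# Hypothetical user ids 0..99 standing in for the labeled users
splits = list(five_fold_splits(range(100)))
```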
Methods	
•  Compared 6 methods
•  BaseU: Backstrom et al.’s method [1] (no influence model)
•  Using only the social graph
•  BaseC: Cheng et al.’s method [2] (no influence model)
•  Using only tweets
•  UDIU: Local prediction method, but only uses user nodes
•  UDIC: Local prediction method, but only uses venue nodes
•  UDII: Local prediction method
•  UDIG: Global prediction method
[1] Backstrom et al., “Find me if you can: improving geographical prediction with social and spatial proximity”, WWW ’10
[2] Cheng et al., “You are where you tweet: a content-based approach to geo-locating twitter users”, CIKM ’10
Results – Prediction results	
ACC: Ratio of correctly predicted users within 100 miles
AED@k%: Average error distance of top k% users	
•  The influence model is effective for predicting locations
•  Comparing BaseU and UDIU (BaseC and UDIC)
•  Integrating both signals is effective for predicting locations
•  Comparing UDIU and UDII (UDIC and UDII)
•  The global method improves on the local one by only 1.5%
•  Comparing UDIG and UDII
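The two metrics defined on this slide can be computed as in the Python sketch below. Two points are assumptions: error distance is measured with the haversine great-circle formula on (latitude, longitude) pairs, and "top k%" is read as the k% of users with the smallest errors. The function names and sample errors are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def acc(errors, threshold=100.0):
    """Fraction of users whose error distance is within the threshold (miles)."""
    return sum(e <= threshold for e in errors) / len(errors)

def aed_at(errors, k_percent):
    """Average error distance over the k% of users with the smallest errors."""
    ranked = sorted(errors)
    n = max(1, math.ceil(len(ranked) * k_percent / 100))
    return sum(ranked[:n]) / n

# Hypothetical error distances in miles for five test users
errors = [10.0, 50.0, 90.0, 150.0, 400.0]
accuracy = acc(errors)       # 3 of 5 users are within 100 miles
aed_60 = aed_at(errors, 60)  # mean of the 3 smallest errors
chicago_houston = haversine_miles((41.8781, -87.6298), (29.7604, -95.3698))
```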
Results – Global and Local	
With 20% training users and 80% test users: +9% in ACC
When most users are unlabeled, the global method improves on the local one substantially
Results – Influence scope	
•  Users with a large number of followers do not always have a large σ
•  e.g.) MythBusters Official has a larger σ than Lady Gaga but fewer followers
CONCLUSION	
Conclusion	
•  Proposed
•  Unified discriminative influence model (UDI)
•  Two location prediction methods based on the influence model
•  global and local
•  Conducted experiments using a large Twitter dataset
•  Proposed methods significantly outperform existing methods
•  NO future work	
