This paper examines the location traces of 489 users of a location sharing social network for relationships between the users' mobility patterns and structural properties of their underlying social network. We introduce a novel set of location-based features for analyzing the social context of a geographic region, including location entropy, which measures the diversity of unique visitors of a location. Using these features, we provide a model for predicting friendship between two users by analyzing their location trails. Our model achieves significant gains over simpler models based only on direct properties of the co-location histories, such as the number of co-locations. We also show a positive relationship between the entropy of the locations the user visits and the number of social ties that user has in the network. We discuss how the offline mobility of users can have implications for both researchers and designers of online social networks.
Authors are Justin Cranshaw, Eran Toch, Jason Hong, Aniket Kittur, and Norman Sadeh
Bridging the Gap Between Physical Location and Online Social Networks, at Ubicomp 2010
1. 1
Bridging the Gap Between Physical
Location and Online Social Networks
Justin Cranshaw
Eran Toch
Jason Hong
Aniket Kittur
Norman Sadeh
Carnegie Mellon University
School of Computer Science
2. 2
On Facebook, we maintain a set of social connections that we typically call Facebook friends.
13. 13
Outline:
Goal: Define a set of observable properties of physical places that convey information about the people who visit the location and the social interactions that occur there.
Evaluation: We will evaluate these properties on a prediction task. We will attempt to discern Facebook friendships from non-friendships based on the co-location network of the users.
Results: We’ll show that using these location-based features significantly improves the performance of a classifier.
14. 14
Related Work:
Several results affiliated with Sandy Pentland’s group:
[Eagle & Pentland, 2009]
[Eagle, Pentland, and Lazer, 2009]
Several results from Microsoft Research:
[Zheng et al., UbiComp, 2008]
[Zheng et al., GIS, 2008]
[Kostakos & Venkatanathan, 2010]
Our main point of difference in this work is our focus on contextual properties of the location histories.
15. 15
Co-location
Suppose A and B are co-located. How might we deduce whether they are actually friends?
1. We can infer based on how they socialize and interact.
2. We can infer based on how many other times they’ve been co-located in the past.
3. We can infer based on the context (where they are and what they’re doing).
17. 17
Co-location
Suppose A and B were observed together on 100 occasions, each time on the same bus.
If we infer based only on how many times they’ve been co-located in the past, we might guess that they are friends, when it’s very likely they are not.
18. 18
Co-location
Suppose A and B were observed together on only 4 occasions: 3 times at A’s house and 1 time at B’s house.
If we infer based only on how many times they’ve been co-located in the past, we might guess that they are not friends, when in fact it’s much more likely that they are.
19. 19
Co-location
These examples motivate two hypotheses: that the number of co-locations of two people is a poor indicator of the relationship between them, and that context about the location can help in prediction.
21. 21
How can we derive context on
a large scale, only from
location data?
One option: Location Diversity
22. 22
Location Diversity
For a given location we define:
Frequency: the total number of observations at the location
User Count: the total number of distinct users observed at the location
Entropy: the entropy of the distribution of observations across the distinct users at the location
Location diversity helps us identify the locations where chance co-locations are most likely. Locations with high diversity have more chance encounters.
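The three measures can be sketched in a few lines. This is a minimal illustration, assuming Shannon entropy (natural log) over each user's share of the observations at a location; the function name `location_diversity` and the toy observation lists are hypothetical and are not taken from the talk.

```python
import math
from collections import Counter

def location_diversity(user_ids):
    """Return (frequency, user_count, entropy) for one location.

    user_ids holds one entry per observation recorded at the location.
    """
    counts = Counter(user_ids)
    frequency = sum(counts.values())   # total observations
    user_count = len(counts)           # distinct users
    # Shannon entropy (in nats) of each user's share of the observations.
    entropy = sum((n / frequency) * math.log(frequency / n)
                  for n in counts.values())
    return frequency, user_count, entropy

# Toy histories, invented for illustration only.
one_user = ["A"] * 12
dominated = ["A"] * 10 + ["B", "C"]
evenly_mixed = ["A", "B", "C"] * 4

for name, obs in [("one user", one_user),
                  ("dominated", dominated),
                  ("evenly mixed", evenly_mixed)]:
    print(name, location_diversity(obs))
```

All three toy locations have the same frequency, but entropy is zero when a single user accounts for every observation and reaches log(3) when three users are evenly mixed, which is why entropy separates private places from public ones better than frequency or user count alone.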
23. 23
Location Diversity
Frequency: LOW
User count: LOW
Entropy: LOW
Observation = (user id, latitude, longitude, time)
[Figure: a map grid cell with corners (40.45,-80.0), (40.45,-79.9), (40.46,-80.0), and (40.46,-79.9), containing three observations of user A, timestamped 9/14 9:00AM, 9/18 10:00AM, and 9/18 10:05AM. Legend: A = observation of user A, B = observation of user B, C = observation of user C.]
We look at all observations of users over time at a given location.
24. 24
Location Diversity
Frequency: HIGH
User count: LOW
Entropy: LOW
[Figure: the same grid cell, now containing many observations, all of user A.]
25. 25
Location Diversity
Frequency: HIGH
User count: HIGH
Entropy: LOW
[Figure: the same grid cell, containing many observations of user A, one of user B, and one of user C.]
Here, co-locations are more likely to mean friendship.
26. 26
Location Diversity
Frequency: HIGH
User count: HIGH
Entropy: HIGH
[Figure: the same grid cell, containing observations spread evenly across users A, B, and C.]
Here, co-locations are more likely to be due to chance.
27. 27
Connection to Biological Diversity:
Ecologists have been using entropy to
study location for over 50 years.
Uses: habitat determination, health of
an ecosystem, land use determinations
for conservation
28. 28
How does location diversity
relate to predicting
(Facebook) friendships
from co-location?
29. 29
[Figure: all historical observations at two locations over time. An edge indicates a co-location, meaning the users were there at the same time.]
Location 1 History (HIGH entropy), Case 1: It’s difficult to conclude that A and B are friends.
Location 2 History (LOW entropy), Case 2: It’s more likely that A and B are actually friends.
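An edge in these diagrams means two users were at the same location at the same time. One way such edges could be derived from raw observations of the form (user id, latitude, longitude, time) is to bin observations into grid cells and time windows and connect users who share a bin. The sketch below is only an assumption about how this might be done: the cell size `CELL_DEG`, the window `WINDOW_SEC`, and the function name `colocation_edges` are hypothetical, and the talk does not state the thresholds actually used.

```python
import math
from collections import defaultdict
from itertools import combinations

CELL_DEG = 0.01    # assumed grid-cell size in degrees (hypothetical)
WINDOW_SEC = 600   # assumed time-window width in seconds (hypothetical)

def colocation_edges(observations):
    """Count shared (cell, window) bins for every pair of users.

    observations: iterable of (user_id, latitude, longitude, unix_time).
    Returns {(user_a, user_b): shared_bin_count} with user_a < user_b.
    """
    users_in_bin = defaultdict(set)
    for user, lat, lon, t in observations:
        key = (math.floor(lat / CELL_DEG),
               math.floor(lon / CELL_DEG),
               int(t // WINDOW_SEC))
        users_in_bin[key].add(user)

    edges = defaultdict(int)
    for users in users_in_bin.values():
        # Every pair of users seen in the same bin gets one co-location.
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Toy observations, invented for illustration only.
toy = [
    ("A", 40.455, -79.955, 1000),
    ("B", 40.456, -79.954, 1100),   # same cell and window as A
    ("C", 40.455, -79.955, 5000),
    ("A", 40.455, -79.955, 5200),   # same cell and window as C
]
print(colocation_edges(toy))
```

Fixed bins are simple but miss pairs that straddle a bin boundary; a distance and time-difference threshold between individual observations would avoid that at higher cost.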
31. 31
The history of A and B’s co-location
[Figure: Location 1 History, Location 2 History, and Location 3 History, showing observations of users A, B, and D. An edge indicates a co-location.]
Here it is much more likely that A and B are friends.
33. 33
Location Entropy
[Figure: map of location entropy in Pittsburgh, PA. Labeled areas: Shopping and Dining (two areas), Universities, Bars and Pubs, and Residential (two areas). Four areas are marked HIGH entropy and two are marked LOW entropy.]
35. 35
The history of unique people that visit a
location over time tells us a great deal of
information about that location.
This in turn provides insight into the individuals
that visit the location, and the social
interactions that occur there.
We used this general principle to define other potentially useful features of co-location data.
36. 36
Feature Categories
Intensity and Duration: The size and the spatial and temporal range of the set of co-locations.
Location Diversity: Location diversity measures of the locations where the users were co-located.
Specificity: Whether the locations where the users were co-located are “shared” with the community or “specific” to them.
Structural Properties: Relevant structural properties of the co-location graph that are indicative of friendship.
37. 37
Feature Categories
Intensity and Duration features use shallow properties of the co-location history: how many times, how many places, what time of day, etc.
38. 38
Feature Categories
Location Diversity, Specificity, and Structural Properties features predominantly use properties derived from the history of location observations, such as the location entropy.
39. 39
The Data
489 users with at least 1 month of tracking data from Locaccino
Area: restricted to users in the Pittsburgh metro area
Recruitment: some from formal user studies, some were invited friends of participants, others joined on their own
Different users’ periods of system use may not overlap in time
About 90% of the users were laptop users
In all, over 4 million location observations
40. 40
Comparing the networks
Measure | Social Network | Co-location Network | Intersection (co-located friends)
Num Edges | 1007 | 3636 | 360
Our goal is to differentiate meaningful edges in the co-location network from co-locations of chance.
Co-location among users is pervasive, yet co-location among friends is comparatively rare.
We would like to predict whether two users are friends from their co-location history alone.
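The edge counts make the "pervasive yet rare" point concrete. The short calculation below uses only the three edge counts from this slide; the variable names are illustrative.

```python
n_friend_edges = 1007       # edges in the social network
n_colocation_edges = 3636   # edges in the co-location network
n_both = 360                # co-located friends (intersection)

# Share of co-located pairs that are also friends.
friends_among_colocated = n_both / n_colocation_edges
# Share of friendships that show up as a co-location at all.
colocated_among_friends = n_both / n_friend_edges

print(f"co-located pairs that are friends: {friends_among_colocated:.1%}")
print(f"friendships with a co-location:    {colocated_among_friends:.1%}")
```

Roughly one in ten co-located pairs are friends, while a little over a third of friendships appear in the co-location network at all.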
41. 41
Evaluation
Classifiers: trained 3 AdaBoost classifiers (with decision
stumps).
• One only used Intensity and Duration features
• One used Diversity, Structural, and Specificity features
• One used all features
Baseline: we classify solely based on the number of times the
users were co-located.
Goal: Compare Intensity and Duration features to
Diversity, Structural, and Specificity features.
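For readers unfamiliar with the classifier, here is a minimal sketch of textbook discrete AdaBoost with decision stumps, written in plain Python. It is not the authors' implementation. The two feature columns (a co-location count and a mean location entropy) and the six toy rows are invented for illustration, and training accuracy on a toy set says nothing about the results reported in the talk.

```python
import math

def stump_predict(row, j, thr, sign):
    """A decision stump: predict `sign` if feature j exceeds thr, else -sign."""
    return sign if row[j] > thr else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=5):
    """Labels are +1 (friends) and -1 (not friends)."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # weight of this stump
        model.append((alpha, j, thr, sign))
        # Increase the weight of misclassified rows, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(a * stump_predict(row, j, thr, s) for a, j, thr, s in model)
    return 1 if score > 0 else -1

# Toy rows: [number of co-locations, mean location entropy]. Invented values.
X = [[100, 1.9], [4, 0.2], [3, 1.5], [60, 0.4], [50, 1.7], [5, 0.1]]
y = [-1, 1, -1, 1, -1, 1]

model_all = adaboost(X, y)
model_count = adaboost([[row[0]] for row in X], y)   # count-only baseline

def training_accuracy(model, rows):
    return sum(predict(model, r) == yi for r, yi in zip(rows, y)) / len(y)

print("all features:", training_accuracy(model_all, X))
print("count only:  ", training_accuracy(model_count, [[r[0]] for r in X]))
```

Training one model per feature subset, as in the last lines, mirrors the comparison described on the slide: the same learner is given different feature groups so that any difference in performance can be attributed to the features.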
42. 42
Using features such as location entropy significantly improves performance over shallow features such as the number of co-locations.
44. 44
Overall classifier performance was good for testing our hypotheses, but was not great for classification purposes.
Accuracy is high, but precision/recall trade-offs are poor due to unbalanced class proportions (many more non-friends than friends).
This highlights the variability in online social network ties with respect to behavior.
If the end goal is classification, perhaps more specialized approaches might be best.
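The effect of class imbalance on accuracy can be illustrated with a trivial classifier. The sketch below assumes, for illustration only, that the task is restricted to the 3636 co-located pairs, of which 360 are friends; the talk does not state the exact evaluation set, and the `metrics` helper is hypothetical.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

pairs, friends = 3636, 360   # assumed evaluation set: co-located pairs only

# A classifier that answers "not friends" for every pair.
acc, prec, rec = metrics(tp=0, fp=0, fn=friends, tn=pairs - friends)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Under this assumption the do-nothing classifier is right about nine times in ten while finding no friendships at all, so accuracy by itself is a weak yardstick and precision and recall carry the useful signal.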
45. 45
Additional Findings
We also looked at the relationship between an individual’s location history and the number of Facebook friends that user has.
We found a convincing positive relationship between the entropy of the places a user goes to and the number of friends the user has.
46. 46
Correlation of mobility features with number of friends
The location diversity variables and the mobility regularity variables show very strong correlations.
Users who have irregular routines and users who visit diverse locations have more connections in the Locaccino social network.
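A relationship of this kind is typically quantified with a correlation coefficient. The sketch below computes Pearson's r, which is one common choice; the slides do not say which measure was used. The per-user values are invented toy numbers, not data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented toy values: mean entropy of visited locations, and friend count.
mean_entropy = [0.2, 0.5, 0.9, 1.3, 1.6]
num_friends = [1, 2, 4, 5, 9]

r = pearson(mean_entropy, num_friends)
print(f"r = {r:.2f}")
```

A value of r close to 1 would indicate that users who visit higher-entropy places tend to have more friends; it does not by itself say which causes which.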
47. 47
Limitations
Many users, spread over different time periods.
Most of the users were laptop users, which offers a coarse approximation of mobility.
Population is homogeneous.
48. 48
Future Work
Non-binary ties:
Numeric ties: tie strength from co-location
Categorical ties: relationship types
More data from smartphones
More specialized learning models
49. 49
I’d be happy to take your questions!
Thank you for your time and attention.
Justin Cranshaw
jcransh@cs.cmu.edu
Illustration by David Pearson, in William Safire, On Language, New York Times Magazine, June 26,
2009.
52. 52
User Mobility
Look at the history of locations of each user.
We define a set of features of the location history of each user that is predictive of the number of friends they have in the Locaccino network.
53. 53
User Mobility Features
Intensity and Duration: These features describe the size and the spatial and temporal range of the set of observations of the user.
Location Diversity: These features describe the diversity of observations collected at the locations the user visits.
Regularity: These features describe the temporal regularity of the location observations of the user. Do their observations follow a regular routine or are they random?
54. 54
Structural Comparisons
Measure | Social Network | Co-location Network | Intersection (co-located friends)
Num Vertices | 489 | 489 | 489
Num Non-Isolate Vertices | 366 | 245 | 127
Num Edges | 1007 | 3636 | 360
Num Connected Components | 44 | 91 | 99
Largest Component Size | 299 | 293 | 84
Density | 0.013 | 0.063 | 0.005
Connectedness | 0.59 | 0.56 | 0.06
Transitivity | 0.41 | 0.48 | 0.42
55. 55
Why do we want to do this?
The relationship between online social networks and physical location is understudied.
Partitioning the social graph is a hard and important problem.
This work could have implications for creating better (context-based) social network privacy controls.