The network structure of visited locations according to geotagged social media photos
1. The network structure of visited locations according to
geotagged social media photos
Christian Junker, Zaenal Akbar, Martí Cuquet
September 20, 2017
3. Introduction (1)
The new reality of data-rich environments.
Challenges –
no longer purely technical; they now extend into the
economic, social, ethical, legal, and political fields
Issues –
concerns about data quality, reliability and trust, privacy,
protection, and accountability
Opportunities –
increased efficiency and innovation speed, and the
emergence of new business models
4. Introduction (2)
Research is quickly embracing the potential of using and
analyzing this expanding number of data sources.
Big data analytics, machine learning, data mining, natural
language processing tools, etc.
An ideal framework for studying autonomous entities that
cooperate to achieve a common or compatible goal:
i. What are the relations between the main actors in a network?
ii. How do they collaborate?
iii. What are the building blocks for a successful ecosystem?
5. Introduction (3)
This intensive data-driven network research should benefit the
tourism sector as well
Businesses, tourist attractions, public transportation hubs,
and other points of interest are not isolated but part of a
collaborative system – one that has hardly been explored.
How can such a collaborative network be brought to the surface?
Use data-rich environments to assist the
reconstruction of these collaborative networks
– to show how their members operate and to reveal the
potential for value creation via collaborative approaches
6. Introduction (4)
Social media data – a potential data source to reveal
relationships between businesses and points of interest
Why – tourists share their experiences on social media
during and after their trips
For example, geo-tagged data has been used to:
show destination preferences and hotspots in a city
(Paldino et al., 2015),
describe city and global mobility patterns (Hawelka et al.,
2014),
predict taxi trip duration (Zarmehri and Soares, 2016)
7. This Work
We reconstructed a European network of locations visited
by tourists, using geo-tagged photos shared on Flickr
(https://www.flickr.com/)
The network design relies on data contributed
collaboratively by users
1. Vertices/Nodes – locations where photos were taken
2. Edges – connect two locations if at least 2 users took
photos in both locations
8. This Work – Objectives
1. To characterize the collaborative network built from
tourists' geo-tagged photos and its basic properties
i. show the feasibility and potential of using social media data
in the collaborative networks field
ii. reconstruct the relationships between places relevant to
tourists – which points of interest are the most central
and relevant
2. To lay the groundwork for future research on tourism
segmentation based on locations visited – multilayered
collaborative networks
i. different collaborative networks form layers
ii. each layer corresponds to a different segment of users
(e.g. locals vs. tourists, or by country of origin)
9. Dataset
YFCC100M – the Yahoo Flickr Creative Commons 100 Million
Dataset
(https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67)
Released in 2014, a public dataset of 100 million media
objects uploaded to Flickr, covering the period 2000 to 2014
Metadata:
i. a photo identifier
ii. the user that created it
iii. tags used by users to annotate it
iv. camera used
v. time (taken and uploaded)
vi. location
vii. license
In total, 48 million objects are annotated with a
geo-location
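The metadata fields listed for the dataset can be pictured as a simple record type. The sketch below is only an illustration: the class name `MediaObject`, its field names, and the helper `geotagged` are hypothetical and do not reflect the actual column layout of the YFCC100M files.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MediaObject:
    """One media object with the metadata fields listed on the slide.

    All field names are hypothetical; they are not the dataset's real headers.
    """
    photo_id: str
    user_id: str
    tags: Tuple[str, ...]
    camera: Optional[str]
    taken: Optional[datetime]
    uploaded: Optional[datetime]
    latitude: Optional[float]   # None when the object has no geo-location
    longitude: Optional[float]
    license: Optional[str]

    @property
    def is_geotagged(self) -> bool:
        return self.latitude is not None and self.longitude is not None


def geotagged(objects: Iterable[MediaObject]) -> List[MediaObject]:
    """Keep only the objects that carry a geo-location."""
    return [obj for obj in objects if obj.is_geotagged]
```

Only geo-tagged objects can contribute a location to the network, so a filter of this kind would be the first step of any reconstruction.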
10. Network Reconstruction (1)
A vertex corresponds to the geo-location (latitude,
longitude) of a media object in the dataset
Precision:
– 10^-3 degrees in both latitude and longitude
– roughly 111 meters in latitude and 79 meters in longitude
– granularity: street or neighborhood
Two vertices u and v are connected if at least two different
users have a media object in both corresponding locations
The weight W_uv of an edge (u, v) is the number of users
that visited both locations u and v
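The vertex, edge, and weight definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format `(user_id, lat, lon)`, the function names, and the choice of rounding (rather than truncating) coordinates to three decimals are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def to_cell(lat, lon, decimals=3):
    """Map a coordinate to a vertex at 10^-3 degree precision (assumed: rounding)."""
    return (round(lat, decimals), round(lon, decimals))


def build_network(records, min_users=2):
    """Return {(u, v): weight}, where weight counts users seen at both u and v.

    records: iterable of (user_id, lat, lon) tuples (hypothetical input format).
    An edge is kept only if at least `min_users` different users visited both ends.
    """
    visited = defaultdict(set)              # user -> set of visited cells
    for user, lat, lon in records:
        visited[user].add(to_cell(lat, lon))

    weights = defaultdict(int)              # (u, v) with u < v -> number of users
    for cells in visited.values():
        for u, v in combinations(sorted(cells), 2):
            weights[(u, v)] += 1

    return {edge: w for edge, w in weights.items() if w >= min_users}


records = [
    ("alice", 47.2692, 11.4041), ("alice", 48.2082, 16.3721),
    ("bob",   47.2694, 11.4039), ("bob",   48.2084, 16.3719),
    ("carol", 47.2692, 11.4041),            # visits a single cell: no edge
]
network = build_network(records)
```

In this toy input, alice and bob both fall into the same two cells, so the single resulting edge has weight 2. The pairwise loop grows quadratically with the number of cells per user, so a dataset of the size described here would likely need a more careful implementation.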
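The cell sizes quoted for a precision of 10^-3 degrees can be checked with a short calculation. The sketch assumes a spherical Earth with mean radius 6371 km; the reference latitude is also an assumption, since the slides do not state it. A value near 45° N reproduces the 79 meter figure.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation (assumption)


def cell_size_m(lat_deg, step_deg=1e-3):
    """Return (north-south, east-west) extent in meters of a step_deg cell."""
    step_rad = math.radians(step_deg)
    north_south = EARTH_RADIUS_M * step_rad
    # Meridians converge towards the poles, so east-west extent shrinks with cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


north_south, east_west = cell_size_m(45.0)
```

The north-south extent does not depend on latitude in this approximation, while the east-west extent narrows further north.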
11. Network Reconstruction (2)
The network has 178k vertices and 32M edges
One giant connected component of 175k nodes (97.8%);
the remaining small components have sizes from 2 to 29
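Component sizes like those reported above can be obtained with a breadth-first search over the edge list. The following is a minimal standard-library sketch; the function name and the edge-list input format are assumptions, and vertices without any edge are not counted.

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Return the sizes of all connected components, largest first."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex covers one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
sizes = component_sizes(edges)
giant_fraction = sizes[0] / sum(sizes)
```

The share of vertices in the largest component (`giant_fraction`) is the quantity reported as a percentage on the slide.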
12. Network Analysis (1)
One of the most important characteristics of real-world
networks is their degree distribution p_k (Barabási, 1999)
Real-world networks typically have more high-degree nodes
than a random network would, and follow a degree
distribution that decays as a power law (Barabási, 1999)
[Figure: degree and weight distributions]
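The degree distribution p_k is the fraction of vertices that have degree k. A minimal sketch of how it could be computed from an edge list follows; the function name and input format are assumptions, and no power-law fit is attempted here.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: p_k}, the fraction of vertices with degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    n_vertices = len(degree)
    counts = Counter(degree.values())       # degree k -> number of vertices
    return {k: c / n_vertices for k, c in sorted(counts.items())}


# Star graph: one hub connected to four leaves.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
p_k = degree_distribution(star)
```

For a scale-free network, plotting p_k against k on doubly logarithmic axes would show an approximately straight line.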
13. Network Analysis (2)
Analyze whether the way locations are linked to each other
correlates with the degree of those locations.
Economic, technological, biological networks tend to show
disassortative mixing (Newman, 2002)
– nodes of high degree tend to connect to nodes of low
degree
Correlation coefficient
r = −2.36 × 10^-6
A value this close to zero indicates no assortative
(or disassortative) mixing
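The correlation coefficient r can be understood as the Pearson correlation between the degrees found at the two ends of each edge, following the definition in Newman (2002). The sketch below assumes an unweighted, undirected edge list; whether the reported value used unweighted degrees is not stated in the slides, and the function name is hypothetical.

```python
from collections import Counter


def degree_assortativity(edges):
    """Degree-degree correlation r over the edges of an undirected graph.

    r > 0: assortative, r < 0: disassortative, r close to 0: no degree correlation.
    Undefined (division by zero) when all vertices have the same degree.
    """
    edges = list(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    m = len(edges)
    # Averages over edges of the product, mean, and mean square of end degrees.
    prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    return (prod - mean ** 2) / (mean_sq - mean ** 2)


star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
path = [("a", "b"), ("b", "c"), ("c", "d")]
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

A star graph is the extreme disassortative case, because every edge joins the high-degree hub to a degree-one leaf.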
14. Network Analysis – Finding
1. The network displays a complex structure with scale-free
degree and weight distributions
– in line with other social, economic, and technological
networks (Albert and Barabási, 2002)
2. Analysis of degree-degree correlations shows no
assortative mixing
– in contrast to the degree correlations reported for other
real-world networks (Newman, 2002)
15. Conclusion
1. Using social media data in the collaborative networks field
is feasible and has potential:
it links local businesses, landmarks, and other points of
interest based on the social media users visiting them.
2. It enables further data-driven studies that make use of
the rich metadata of similar sources, allowing
future research on multilayered collaborative networks to:
assist in the segmentation of users via community detection
identify the role that ties between different user segments
play in the collaborative possibilities of tourism
16. Future work
1. Enhance the granularity
– from street/neighborhood level to point-of-interest level
(e.g. local businesses, landmarks, transportation hubs)
– associate nodes with the features of every business (e.g.
using GeoNames Feature Codes of geonames.org)
2. Multilayered collaborative networks:
i. segmentation of tourists
ii. detection of communities of businesses and points of interest
iii. identification of motifs and business functions within the
networks