PhD defense contributions:
Providing a study of human generated recommendation on Twitter and its effect.
García-Gavilanes et al. Follow My Friends This Friday! An Analysis of Human-generated Friendship Recommendations. SocInfo’13 [Best paper award]
Describing the evolution of user behavior over time regarding the content they generate.
García-Gavilanes et al. Who are my Audiences? A Study of the Evolution of Target Audiences in Microblogs. SocInfo’14
Describing differences and similarities of users across countries regarding the way people tweet and connect with others.
García-Gavilanes et al. Microblogging without Borders: Differences and Similarities. Websci’11.
w/ Poblete et al. Do All Birds Tweet the Same? Characterizing Twitter Around the World. In CIKM’11
Proposing how to combine anthropological studies of culture with large scale data.
Correlating how and when people tweet with dimensions of national culture and pace of life
García-Gavilanes et al. Cultural Dimensions in Twitter: Time, Individualism and Power. ICWSM’13 [Honorable mention]
Improving the prediction of the communication strength between users from different countries by taking into account several cultural and socio-economic indicators taken from diverse sources.
García-Gavilanes et al. Twitter ain’t Without Frontiers: Economic, Social, and Cultural Boundaries in International Communication. CSCW’14.
USER BEHAVIOR IN MICROBLOGS WITH A CULTURAL EMPHASIS
1. User Behavior in Microblogs
with a Cultural Emphasis
Ruth García-Gavilanes
Advisor :
Ricardo Baeza-Yates
Web Research Group
Universitat Pompeu Fabra
& Yahoo Labs
PhD Thesis Defense
February 26, 2015
2. User Behavior
• The study of the trails left behind by users when they use the web:
interactions, choices, searches, purchases, etc.
• User interactions are increasingly mediated and shaped by algorithms
and computational methods.
• Massive amount of data
• Great cultural value
• Rise of Computational Social Sciences
Introduction
3. CSS AGENDA
PRESENT
Develop new instruments to tap into the
potential of found data and crowds:
building a telescope for the Social Sciences
FUTURE
Online impacts offline! Build new algorithms
and tools to shift the current configurations
of societies towards better futures.
Claudia Wagner
Introduction
5. 5
User behavior
Microblogs Culture
• All users: human
recommendations,
behavior evolution
• Cross-country comparisons
Cultural Emphasis
Introduction
6. Case Study: Twitter
• Friendship links in Twitter do not need to be reciprocal
[Figure: "I follow you", with an arrow labeled "information"]
• #Hashtags
• @Mentions & Retweets
Introduction
8. Background
Kwak et al. What is Twitter, a Social Network or a News Media? WWW’10
Cha et al. Measuring User Influence in Twitter: The Million Follower
Fallacy. ICWSM’10
Bakshy et al. Everyone’s an Influencer: Quantifying influence on Twitter.
WSDM’11
De Choudhury et al. How Does the Data Sampling Strategy Impact the
Discovery of Information Diffusion in Social Media? ICWSM’10.
9. Goals
In microblogs:
• Study the effect of recommendations made by users
• Compare the evolution of user behavior through time
• Find differences and similarities across countries
• Study how cultural models can be used with data
• Use cultural models and socio-economic indicators to
study user behavior
10. Contributions
• Providing a study of human generated recommendation on Twitter and its effect.
o García-Gavilanes et al. Follow My Friends This Friday! An Analysis of Human-generated Friendship
Recommendations. SocInfo’13 [Best paper award]
• Describing the evolution of user behavior over time regarding the content they
generate.
o García-Gavilanes et al. Who are my Audiences? A Study of the Evolution of Target Audiences in
Microblogs. SocInfo’14
• Describing differences and similarities of users across countries regarding the way
people tweet and connect with others.
o García-Gavilanes et al. Microblogging without Borders: Differences and Similarities. Websci’11.
o w/ Poblete et al. Do All Birds Tweet the Same? Characterizing Twitter Around the World. In CIKM’11
• Proposing how to combine anthropological studies of culture with large scale data.
• Correlating how and when people tweet with dimensions of national culture and pace
of life
o García-Gavilanes et al. Cultural Dimensions in Twitter: Time, Individualism and Power. ICWSM’13
[Honorable mention]
• Improving the prediction of the communication strength between users from different
countries by taking into account several cultural and socio-economic indicators taken
from diverse sources.
o García-Gavilanes et al. Twitter ain’t Without Frontiers: Economic, Social, and Cultural Boundaries in
International Communication. CSCW’14.
Introduction
11. Thesis Structure
Data Mining
All users
Q1) What is the effect of human-generated recommendations on users?
Q2) How does user behavior evolve over time?
Cross-country
Q3) Do all users from different countries tweet the same?
Cultural
Q4) What cultural models to use?
Cross-country
Q5) Does culture influence the way we tweet online?
Q6) Can culture influence online interactions with users from other nations?
Introduction
13. Q1) Human Recommendations
Recommendations 13
[Garcia-Gavilanes et al. Follow My Friends This Friday!, SocInfo’13]
Friendship
Recommendations
• Self organized
• Trendy
• Measurable
Track recommendations during 24 weeks
14. Q1) Acceptance
Data: 4M users
Instance fields: Receiver, Recommender, Recommendation, Week
Recommendation instances (total): 59,055,205
Accepted recommendation instances: 354,687
Instance acceptance: 0.60%
Social link recommendations made by current friends have a measurable effect on
link formation.
[Garcia-Gavilanes et al. Follow My Friends This Friday!, SocInfo’13]
Recommendations
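The acceptance figure on this slide can be checked directly from the two counts. The sketch below is only that arithmetic; the function name `acceptance_rate` is illustrative and not taken from the thesis.

```python
# Counts as reported on the slide.
total_instances = 59_055_205
accepted_instances = 354_687


def acceptance_rate(accepted: int, total: int) -> float:
    """Fraction of recommendation instances that were accepted."""
    if total <= 0:
        raise ValueError("total must be positive")
    return accepted / total


rate = acceptance_rate(accepted_instances, total_instances)
print(f"{rate:.2%}")  # prints 0.60%
```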
15. Q1) Acceptance
• Follow Friday recommendations outperform the two alternative
conditions.
• Accepted recommendations have more longevity than other
links.
Recommendations
[Garcia-Gavilanes et al. Follow My Friends This Friday!, SocInfo’13]
16. Q1) Results
Classifier: Rotation Forest, 140 features
Feature groups:
• User-based (per user): Attention, Activity
• Relation-based (per pair): Tie Strength, Similarity
• Recommendation-based (per recommendation): Repetitions, Format
Features MAP
All 0.496
User-based 0.074
Relation-based 0.398
Recommendation-based 0.062
User + Relation 0.518
User + Format 0.079
Relation + Format 0.379
Link formation is influenced mostly by the user-based
and relation-based characteristics.
Recommendations
[Garcia-Gavilanes et al. Follow My Friends This Friday!, SocInfo’13]
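The results table reports MAP (mean average precision). As a reminder of what that metric measures, here is a minimal sketch of the standard definition over ranked lists with binary relevance. How the thesis grouped recommendations into ranked lists is not stated on the slide, so the per-list grouping and the toy labels are assumptions.

```python
from typing import List, Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision of one ranked list; 1 marks an accepted item."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[int]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: two ranked lists of recommendations.
toy_rankings = [[1, 0, 1], [0, 1]]
toy_map = mean_average_precision(toy_rankings)
```

For the toy lists, the first has average precision (1/1 + 2/3) / 2 and the second 1/2, so the mean is 2/3.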
17. 17
Thesis Structure
Evolution
18. Q2) Data
Active in 2011 & 2013
2011 2013
Users 1,315,313 1,125,968
English tweets 406,719,999 256,330,241
• Active: min 1 and max 22 tweets per working day
• Inactive: < 1 tweet per day
• Hyperactive: > 22 tweets per day
[Figure: user counts for 2011 and 2013, with labels 8M, 4.3M, 770K, 1.1M, 530K, 570K and 1.3M]
[García-Gavilanes et al. Evolution of Target Audiences. SocInfo’14]
Evolution
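The activity thresholds on the data slide amount to a simple filter. The sketch below applies them to a per-user daily tweet rate. The function name and the choice to treat the bounds as inclusive for the active group are assumptions; the slide does not say how the rate was averaged.

```python
def classify_activity(tweets_per_day: float,
                      low: float = 1.0,
                      high: float = 22.0) -> str:
    """Label a user by average daily tweet rate using the slide's thresholds."""
    if tweets_per_day < low:
        return "inactive"      # fewer than 1 tweet per day
    if tweets_per_day > high:
        return "hyperactive"   # more than 22 tweets per day
    return "active"            # between 1 and 22 tweets per day


# Hypothetical rates for three users.
labels = [classify_activity(r) for r in (0.2, 5.0, 40.0)]
```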
19. Q2) Tweeting Behavior
[Figure: taxonomy of tweets. Tweets are divided into original tweets (OT) and re-tweets (RT), and further by with links / without links and by mentions / no mentions; the percentage of tweets in each category is shown for 2011 and 2013.]
Evolution
[García-Gavilanes et al. Evolution of Target Audiences. SocInfo’14]
21. Q2) Users 2011 vs 2013
21
Majority of users remain in the same cluster except the echoers’ group.
Increase in Generalists and Link Feeders.
Mature users tend to use Twitter more as news media.
[García-Gavilanes et al. Evolution of Target Audiences. SocInfo’14]
22. 22
Thesis Structure
Cross-country
23. Q3) Cross-country comparison
• Data: analysis of one year of tweets for the 10 most active
countries
• Content: languages, sentiment, structure (retweets, hashtags, ...)
• Structure: network (modularity, average path length, reciprocity,
connectivity)
Cross-country
24. Q3) Activity and Engagement
24
[Garcia-Gavilanes et al. Microblogging without Borders: Differences and Similarities, WebSci’11]
Cross-country
12M active users
6M with valid location
4M users from the 10 most active countries
5B tweets during 2010
25. Q3) Activity and Engagement
Countries with more users are not
necessarily the most engaged.
Cross-country
[Garcia-Gavilanes et al. Microblogging without Borders: Differences and Similarities, WebSci’11]
26. Q3) Languages & Sentiment
[Figure: users per language on a log-scale axis (50M to 2000M), covering English, Portuguese, Japanese, Spanish, Bahasa Indonesia, Bahasa Malay, Korean, Dutch, German, Italian and Arabic]
• English is the most common language.
• English accounts for more than 10% in non-English-speaking
countries: Netherlands, Indonesia, Mexico and South Korea.
• Non-Western countries seem to be more
positive (based on Dodds et al., 2011).
Cross-country
[Garcia-Gavilanes et al. Microblogging without Borders: Differences and Similarities, WebSci’11]
27. Tweet function
27
Country URL (%) Hashtag (%) Mention (%) Retweet (%)
Indonesia 14.95 7.63 58.24 9.71
Japan 16.30 6.81 39.14 5.65
Brazil 19.23 13.41 45.57 12.80
Netherlands 24.40 18.24 42.33 9.12
UK 27.11 13.03 45.61 11.65
US 32.64 14.32 40.03 11.78
Australia 31.37 14.89 43.27 11.73
Mexico 17.49 12.38 49.79 12.61
South Korea 19.67 5.83 58.02 9.02
Canada 31.09 14.68 42.50 12.50
Some Asian countries (except Japan) seem to chat more and to use fewer
URLs and hashtags.
Asian countries seem to retweet less.
Cross-country
[Garcia-Gavilanes et al. Microblogging without Borders: Differences and Similarities, WebSci’11]
28. Q3) Network
28
Country Reciprocity Avg. Clust. Coef. Modularity
Indonesia 0.27 0.06 0.54
Japan 0.32 0.06 0.46
Brazil 0.13 0.07 0.46
Netherlands 0.22 0.10 0.41
UK 0.17 0.10 0.39
US 0.19 0.07 0.42
Australia 0.24 0.10 0.45
Mexico 0.17 0.08 0.36
South Korea 0.28 0.09 0.31
Canada 0.26 0.10 0.56
[Figure: diameter and average path length per country (Brazil, UK, Mexico, USA, Netherlands, Australia, Canada, Indonesia, South Korea, Japan); y-axis from 0 to 45]
• Reciprocity seems to be significant,
especially for Asian countries.
• A high clustering coefficient with low
reciprocity may indicate hierarchical links.
• Indonesia has the highest diameter, which
agrees with the modularity coefficient.
[w/ Poblete et al. Do all Birds Tweet the Same?, CIKM, 2011]
Cross-country
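Reciprocity is one of the network measures compared across countries on this slide. A minimal sketch of one common definition follows: the fraction of directed follow edges whose reverse edge also exists. Whether the thesis used exactly this variant is an assumption, and the toy edge list is hypothetical.

```python
from typing import Iterable, Tuple


def reciprocity(edges: Iterable[Tuple[str, str]]) -> float:
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    # Deduplicate and ignore self-loops.
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical follow graph: a and b follow each other, the rest is one-way.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
toy_reciprocity = reciprocity(toy_edges)
```

In the toy graph two of the four edges are reciprocated, giving 0.5.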
32. • Need cultural models to understand
differences across countries in
microblogs
Cross-country
33. 33
Thesis Structure
Cultural Models
38. MEASURE CULTURE
• Geert Hofstede: Cultural dimensions
o Different cultural dimensions : Individualism,
Power Distance and others.
• Robert Levine: Pace of Life (Geography of time)
o Different perception of time
• Edward T. Hall: Monochronic vs Polychronic
o Different ways of executing tasks
• Samuel Huntington: Clash of Civilizations
o Politics of identity replacing politics of interest.
Cultural Models
40. Can such differences also
be captured from online interactions?
Cultural Models
41. 41
Thesis Structure
Culture
42. Q5) Culture in Tweeting Behavior
• Pace of Life
o Predictability (tweets, mentions)
o Measure entropy of posting tweets in working hours
• Individualism vs. Collectivism
o Users interacting with others (mentions)
• Power Distance : Popularity
o Follow, recommend and accept recommendation
preferentially from more popular users
(in-degree imbalance).
42
[Garcia-Gavilanes et al. Cultural Dimensions in Twitter, ICWSM, 2013]
Culture
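The pace-of-life measure on this slide is based on the entropy of posting during working hours: lower entropy means more predictable posting. A minimal sketch using Shannon entropy over time bins follows. The binning by hour and the toy counts are assumptions, since the slide does not give the exact binning.

```python
import math
from typing import Sequence


def posting_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a distribution of posts over time bins."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts over four working-hour bins.
spread_out = posting_entropy([5, 5, 5, 5])     # posts spread evenly
concentrated = posting_entropy([10, 0, 0, 0])  # posts always in the same bin
```

The evenly spread user reaches the maximum of 2 bits for four bins, while the fully concentrated (most predictable) user has entropy 0.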
43. Q5) Correlations
1. Pace of life
1.1 The higher the pace of life, the smaller the fraction of users who
tweet during working hours
Users: **-0.58
1.2 The higher the pace of life, the more
predictability
Mentions: **0.68
Tweets: **0.62
2. Individualism
2.1 Users chat less with others in more
individualistic countries
Conversation: ***−0.55
3. Power Distance
3.1 Users prefer to follow and
3.2 recommend users more popular than
themselves in countries with a higher power
distance
Users and followees: **0.62
Users and recommended users: **0.56
p ≤ 0.005 (***), 0.005 < p ≤ 0.05 (**), and 0.05 < p ≤ 0.1 (*)
Culture
[Garcia-Gavilanes et al. Cultural Dimensions in Twitter, ICWSM, 2013]
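The correlations on this slide pair a country-level cultural score with a country-level Twitter measure and mark significance with stars. The sketch below shows a plain Pearson coefficient and the slide's star thresholds. Using Pearson (rather than a rank correlation) is an assumption, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def significance_stars(p: float) -> str:
    """Star notation from the slide's legend."""
    if p <= 0.005:
        return "***"
    if p <= 0.05:
        return "**"
    if p <= 0.1:
        return "*"
    return ""


# Hypothetical country-level values: cultural index vs. a Twitter measure.
index_scores = [20.0, 35.0, 50.0, 80.0]
twitter_measure = [0.9, 0.8, 0.6, 0.3]
r = pearson_r(index_scores, twitter_measure)
```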
44. Q5) Individualism
[Figure: scatter plot of Fraction of Engagement (y-axis, 80 to 100) against Individualism Index (x-axis, 0 to 75), one point per country: Indonesia, Venezuela, Mexico, Japan, Brazil, Colombia, Chile, South Korea, Argentina, Philippines, Malaysia, Spain, Netherlands, Turkey, UK, South Africa, Singapore, Ireland, Canada, France, Belgium, Sweden, Australia, United States, Norway, New Zealand, Italy, Russia, India, Germany]
[Hong et al. “Language matters in Twitter: A large scale study”. ICWSM’11]
45. Q5) Power
[Figure: scatter plot of In-degree Imbalance (y-axis, −1000 to 5000) against Power Distance Index (x-axis, 30 to 90), one point per country: Indonesia, Venezuela, Norway, Malaysia, Singapore, Chile, Mexico, Philippines, Colombia, United States, South Korea, India, Brazil, Canada, Argentina, Australia, Russia, Italy, New Zealand, Spain, Germany, Japan, France, South Africa, UK, Ireland, Turkey, Netherlands, Belgium, Sweden]
27% of all blog trends are about artists and
celebrities [Silang et al, 2011]
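The y-axis of this plot is the in-degree imbalance. The slides do not define it, so the sketch below is one plausible reading and should be treated as an assumption: for every follow edge, take the followee's in-degree minus the follower's in-degree, then average over edges. A positive value means users tend to follow accounts more popular than themselves.

```python
from collections import Counter
from typing import Iterable, Tuple


def in_degree_imbalance(follows: Iterable[Tuple[str, str]]) -> float:
    """Mean of (followee in-degree - follower in-degree) over follow edges.

    Each edge is (follower, followee). This definition is an assumption,
    not taken from the thesis.
    """
    edges = list(follows)
    if not edges:
        return 0.0
    # Counter returns 0 for users that nobody follows.
    in_degree = Counter(followee for _, followee in edges)
    diffs = [in_degree[followee] - in_degree[follower]
             for follower, followee in edges]
    return sum(diffs) / len(diffs)


# Hypothetical graph: a and b follow c, and c follows a back.
toy_follows = [("a", "c"), ("b", "c"), ("c", "a")]
toy_imbalance = in_degree_imbalance(toy_follows)
```

In the toy graph the per-edge differences are 1, 2 and -1, so the mean is 2/3.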
46. Q5) Why is this important?
Columns: (a) Pace of Life: Predictability (Mentions); (b) Pace of Life: Users (%); (c) Individualism: Mentions; (d) Power Distance: Imbalance
Indicator (a) (b) (c) (d)
GDP per capita ***0.55 **-0.57 **-0.41 **-0.48
Education ***0.58 **-0.51 -0.24 ***-0.60
Inequality ***-0.53 **0.49 *0.39 ***0.58
In almost all cases, the findings are also correlated
with GDP per capita, education and inequality.
Culture
[Garcia-Gavilanes et al. Cultural Dimensions in Twitter, ICWSM, 2013]
p ≤ 0.005 (***), 0.005 < p ≤ 0.05 (**), and 0.05 < p ≤ 0.1 (*)
47. 47
Thesis Structure
Communication
49. Q6) Country-country Interactions
• 3B geolocated tweets
• 13M geolocated users
• 111 countries
• 10 weeks
• 5K country – country pairs of interactions
Example: @John tweets “see you next time @pedro” (interaction between @John and @pedro)
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
Communication
50. Q6) Social, economic and cultural features
• 5K country – country pairs of interactions over 10 weeks
• Features: Distance
Communication
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
51. 51
Q6) Top 1000 strongest edges
Using the gravity model, the network is largely
clustered according to geography
Communication
Asia
Latin America
Middle East
The West
Edges: gravity model
Force-directed algorithm
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
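The edges in this network are weighted with a gravity model. In its textbook form, the expected flow between two places grows with the product of their "masses" and shrinks with the distance between them. The sketch below follows that form. The choice of masses (for example, number of users per country), the distance exponent of 2 and the use of great-circle distance are all assumptions; the slides do not give the exact formulation.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gravity_score(mass_a: float, mass_b: float, distance_km: float,
                  exponent: float = 2.0) -> float:
    """Textbook gravity model: mass_a * mass_b / distance ** exponent."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return mass_a * mass_b / distance_km ** exponent


# Hypothetical masses and a hypothetical distance of 5 km.
toy_score = gravity_score(10, 20, 5)  # 10 * 20 / 5**2 = 8.0
```

Comparing observed mention counts against such a baseline is one way to see which country pairs communicate more or less than geography alone would suggest.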
52. Edges: Unique Mentions
Force-directed algorithm
52
Q6) Top 1000 strongest edges
Communication
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
53. Unique Mentions
53
Q6) Top 1000 strongest edges
Communication
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
54. Q6) Top 50 strongest edges
[Figure: network of the top 50 strongest edges among Argentina, Australia, Brazil, Canada, Chile, Colombia, Dominican Republic, France, Germany, India, Indonesia, Ireland, Italy, Japan, Malaysia, Mexico, Netherlands, New Zealand, Nigeria, Philippines, Puerto Rico, Singapore, South Africa, South Korea, Spain, Sweden, United Kingdom, United States and Venezuela]
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW, 2014]
55. Q6) Social, economic and cultural features
• 5K country – country pairs of interactions over 10 weeks
• 481 country – country pairs
with social, economic and
cultural features
• Features: Distance + Economics + Social + Cultural
Communication
[ Garcia-Gavilanes et al. Twitter ain’t without Frontiers, CSCW 2014]
58. Q6) Features
Predictor β(%) P-value
Trade 6.34 ***
Cultural Dimension 3.91 ***
Gravity Model x Exports 3.78 **
Gravity Model 2.79 ***
Language 2.70 .
Feature groups: Culture, Distance, Economic, Social
Communication
59. 59
Thesis Structure
Communication
60. Conclusions & Future Work
Human recommendations
• Recommendations by users have a
measurable effect on link formation
Evolution of behavior
• Adoption of microblogs as a news
medium rather than as a social network
Next:
• Replicate studies in other platforms
• Cross-cultural recommendation
• Self-organized trends and monetary
consequences
• Cross-cultural evolution
Conclusions
61. Conclusions & Future Work
Cross-country comparison
• Collective behavior differs in certain
characteristics: chatting engagement,
reciprocity, modularity, communities.
Tweeting behavior
• National culture determines the temporal
patterns with which Twitter users post,
and the extent to which they mention,
follow, recommend and befriend others.
Communication
• In addition to distance, socio-economic
and cultural features also impact
international communication.
Next:
• Applications to improve communication across
cultures, such as machine translation (already existing:
WeChat)
• China and the rest of the world: two online worlds that
will meet
Conclusions
63. The End 63
Acknowledgements: Ricardo Baeza-Yates, Daniele Quercia, Yelena Mejova
Neil O’Hare, Luca Maria Aiello, Alejandro Jaimes, Barbara Poblete,
Marcelo Mendoza, Andreas Kaltenbrunner, Diego Sáez-Trumper, Pablo Aragón,
David Laniado, Ilaria Bordino, Sara Haijan, Amin Mantrach.
Acknowledgements
64. Publications
• Ruth García-Gavilanes, Barbara Poblete, Marcelo Mendoza, Alejandro Jaimes.
Microblogging without Borders: Differences and Similarities. In The 3rd International
Web Science Conference (WebSci), ACM, 2011.
• Barbara Poblete, Ruth García-Gavilanes, Marcelo Mendoza, Alejandro Jaimes. Do All Birds
Tweet the Same? Characterizing Twitter Around the World. In The 20th International
Conference on Information and Knowledge Management (CIKM), ACM, 2011
• Ruth García-Gavilanes, Neil O’Hare, Luca Maria Aiello, Alejandro Jaimes. Follow My
Friends This Friday! An Analysis of Human-generated Friendship Recommendations. In
The 5th International Conference on Social Informatics (SocInfo), Springer 2013. [Best
paper award]
• Ruth García-Gavilanes, Andreas Kaltenbrunner, Diego Sáez-Trumper, Ricardo Baeza-Yates,
Pablo Aragón and David Laniado. Who are my Audiences? A Study of the Evolution
of Target Audiences in Microblogs. In The 6th International Conference on Social
Informatics (SocInfo), Springer 2014.
• Ruth García-Gavilanes, Daniele Quercia, Alejandro Jaimes. Cultural Dimensions in Twitter:
Time, Individualism and Power. In The 7th International AAAI Conference on Weblogs and
Social Media (ICWSM), 2013. [Honorable mention]
• Ruth García-Gavilanes, Yelena Mejova, Daniele Quercia. Twitter ain’t Without Frontiers:
Economic, Social, and Cultural Boundaries in International Communication. In The 17th
ACM Conference on Computer Supported Cooperative Work and Social Computing
(CSCW), 2014.
The End 64
65. Selected References
• Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter,
a Social Network or a News Media? In Proceedings of the 19th international
conference on World Wide Web, ACM 2010
• Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P.
Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In
International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
• Katharina Reinecke, Minh Khoa Nguyen, Abraham Bernstein, Michael Naf, and
Krzysztof Z. Gajos. Doodle Around the World: Online Scheduling Behavior
Reflects Cultural Differences in Time Perception and Group Decision-Making. In
Proceedings of the 16th ACM Conference on Computer Supported Cooperative
Work and Social Computing (CSCW’13)
• Peter S. Dodds, Kameron D. Harris, Isabel M. Kloumann, Catherine A. Bliss, and
Christopher M. Danforth. Temporal patterns of happiness and information in a
global social network: Hedonometrics and Twitter. PLOS ONE, 2011.
• Geert Hofstede, Gert Jan Hofstede, and Michael Minkov. Cultures and
Organizations: Software of the Mind. McGraw-Hill, 2010.
The End 65