In this paper, we extensively study the impact of social signals (users' actions) obtained from several social networks on the search ranking task. Social signals associated with web resources (documents) can be considered as additional information that can play a vital role in estimating the a priori importance of these resources. In particular, we are interested in the freshness of signals and their diversity. We hypothesize that the moment (the date) when user actions occur and the diversity of these actions may affect search performance. We propose to model these heterogeneous social features as a document prior. We evaluate the effectiveness of our approach through extensive experiments on two INEX datasets, SBS and IMDb, enriched with several social signals collected from social networks. Our experimental results consistently demonstrate the benefit of integrating fresh and diverse signals into the retrieval process.
Fresh and Diverse Social Signals: Any Impacts on Search?
Ismaïl BADACHE*, Mohand BOUGHANEM**
*LSIS Lab, Aix-Marseille University, France
**IRIT Lab, Toulouse University, France
{badache, boughanem}@irit.fr
3. 1.1 Emergence of social Web
Source: blogdumoderateur.com
[Figure: % sharing in social networks (SN)]
• More than 3.5 billion Internet users; 74% of them are registered on at least one SN.
• Social content per minute (Facebook): 70,000 posts, 2.3 million "Likes", ~410 GB of data.
Outline: 1. Introduction, 2. Related Work, 3. IR Model Based on Signals, 4. Temporality of Signals, 5. Diversity of Signals, 6. Conclusion
6. Social Signals (User Generated Content)
[Figure: web resources (video, image, web page) receive interactions on social networks (bookmark, comment, share/recommend, rating/vote, like/+1). These interactions form the social signals, characterized by their nature, origin, signification, temporality and diversity. Example shown: a rating on a 1 to 5 scale.]
7. 1.2 Research Issues
1. How can social signals and their temporality (e.g. the age of each rating action) be taken into account to estimate the importance of a resource?
2. What is the impact of signal diversity on the IR process?
3. Which theoretical model can combine the a priori relevance of a resource with its topical relevance?
8. 2.1 Related Work
| Sources of evidence (Social Features) | Properties | Models | Authors |
|---|---|---|---|
| Time-Independent Social Signals Approaches | | | |
| Number of clicks, votes, records and recommendations, likes, shares, +1, tweets, etc. | Popularity, Reputation | LM and Linear Combination | (Karweg et al., 2011), (Badache et al., 2015) |
| Number of likes, dislikes and comments on YouTube; the playcount (number of times a user listens to a track on lastfm) | Importance | Machine Learning and Linear Combination | (Chelaru et al., 2012), (Khodaei et al., 2012), (Buijs and Spruit, 2014) |
| Number of retweets | Popularity | Machine learning | (Yang et al., 2012), (Hong et al., 2011) |
| Time-Dependent Social Signals Approaches | | | |
| Analysis of social signals to classify user interests and interactions into 5 classes: recent, ongoing, seasonal, past and random | Temporal interests | Statistical study | (Khodaei and Alonso, 2012) |
| Exploiting click timing (ClickBuzz) to measure the interest of a document over time | Buzz over time | Machine learning | (Inagaki et al., 2010) |
9. 2.2 Positioning
1. Evaluating the impact of signal freshness on search performance, using the creation date of each signal.
2. Normalizing the distribution of signals on a resource using the age of the resource.
3. Considering the diversity of signals as an additional factor in estimating resource relevance.
11. 3.1 « Social » Representation of a document
Resource (Document) signals: Like, +1, Share, Rating, ...
• Mono-valued signal, e.g. Like (number of likes).
• Multi-valued signal, e.g. Rating (number of ratings and the value of each rating).
Textual Representation
• Keywords: $D_w = \{w_1, w_2, \ldots, w_z\}$
Social Representation
• Social Signals: $D_a = \{a_1, a_2, \ldots, a_m\}$
12. 3.2 A priori Importance of Document
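$$P(D \mid Q) \stackrel{rank}{=} P(D) \cdot P(Q \mid D)$$
• $P(D)$: a priori probability of document $D$.
• $P(Q \mid D)$: textual model (query/content).
The prior $P(D)$ is estimated from the signals either individually, $P(a_i)$, or grouped, $\prod_{a_i \in A} P(a_i)$.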
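A minimal sketch of how this rank-equivalent combination could be computed in log space, assuming a query-likelihood score log P(Q|D) is already available from some text retrieval model. The function names and the example numbers are illustrative assumptions, not taken from the slides.

```python
import math

def log_prior(signal_probs):
    """Grouped prior in log space: sum of log P(a_i) over the signals of D."""
    return sum(math.log(p) for p in signal_probs)

def rank_score(log_query_likelihood, signal_probs):
    """log P(D) + log P(Q|D), rank-equivalent to P(D) * P(Q|D)."""
    return log_prior(signal_probs) + log_query_likelihood

# Hypothetical documents: (log P(Q|D), [P(a_i) for each signal]).
docs = {
    "d1": (-12.0, [0.02, 0.01]),
    "d2": (-11.5, [0.001, 0.002]),
}

# Sort documents by decreasing combined score.
ranking = sorted(docs, key=lambda d: rank_score(*docs[d]), reverse=True)
print(ranking)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and it does not change the ranking.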
13. 3.3 Estimating Priors
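Mono-valued signals: $P(a_i)$ is estimated by maximum likelihood and smoothed with Dirichlet smoothing:
$$P(a_i) = \frac{Count(a_i, D)}{Count(a_{\cdot}, D)}$$
Multi-valued signals (Rating): $P(a_i)$ is estimated from the Bayesian Average (BA):
$$P(a_i = Rating) = \frac{1 + \log(1 + BA(D))}{1 + \log\left(1 + \sum_{D' \in R} BA(D')\right)}$$
$$BA(D) = \frac{moy(r) \cdot |r| + \sum_{D' \in R} moy(r') \cdot |r'|}{|r| + \sum_{D' \in R} |r'|}$$
where $moy(r)$ is the mean (French "moyenne") of the ratings $r$ and $|r|$ is their number.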
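The two estimators can be sketched as follows. This is only an illustration under stated assumptions: the slide does not spell out the smoothing formula, so the usual Dirichlet form (document count plus mu times the collection probability) is assumed, the logarithm is taken as natural, and the BA values are passed in precomputed. All function and parameter names are hypothetical.

```python
import math

def dirichlet_prior(count_ai_d, count_all_d, count_ai_c, count_all_c, mu=100.0):
    """Smoothed P(a_i) for a mono-valued signal.

    count_ai_d / count_all_d is the maximum-likelihood estimate on document D.
    The collection-level counts (count_ai_c, count_all_c) and mu are assumptions
    of the standard Dirichlet smoothing form.
    """
    p_collection = count_ai_c / count_all_c
    return (count_ai_d + mu * p_collection) / (count_all_d + mu)

def rating_prior(ba_d, ba_all_docs):
    """P(a_i = Rating) = (1 + log(1 + BA(D))) / (1 + log(1 + sum of BA(D')))."""
    return (1 + math.log(1 + ba_d)) / (1 + math.log(1 + sum(ba_all_docs)))

# Hypothetical numbers: 10 likes out of 40 actions on D,
# 1000 likes out of 10000 actions in the whole collection.
p_like = dirichlet_prior(10, 40, 1000, 10000)

# Hypothetical Bayesian averages for D and for the documents in R.
p_rating = rating_prior(4.0, [4.0, 3.0, 2.0])
```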
15. 4.1 Hypothesis (1) : Freshness of Signal
[Figure: timeline t1 to t5 of the actions received by resource R1, contrasting simple counting (each action adds 1) with counting weighted by action date (each action adds a weight that depends on its date).]
Boosting resources associated with fresh signals.
16. 4.2 Freshness of Signal
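• Signal biased by its creation date:
$$Count_{t_a}(t_{j,a_i}, D) = \sum_{j=1}^{k} f(t_{j,a_i}, D)$$
$$f(t_{j,a_i}, D) = \exp\left(-\frac{\lVert t_{actual} - t_{j,a_i} \rVert^2}{2\sigma^2}\right)$$
• Rating biased by its creation date:
$$BA_t(D) = \frac{moy(r_t) \cdot |r_t| + \sum_{D' \in R} moy(r_t') \cdot |r_t'|}{|r_t| + \sum_{D' \in R} |r_t'|}$$
$$r_t = r \cdot \exp\left(-\frac{\lVert t_{actual} - t_r \rVert^2}{2\sigma^2}\right)$$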
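The Gaussian time kernel above can be sketched in a few lines. The sketch assumes timestamps are plain numbers in a common unit (days here) and that sigma is a free parameter in the same unit; the slides give no value for it, so the one used below is purely illustrative, as are the function names.

```python
import math

def freshness_weight(t_now, t_action, sigma):
    """Gaussian kernel: 1.0 for an action happening now, decaying with its age."""
    return math.exp(-((t_now - t_action) ** 2) / (2 * sigma ** 2))

def time_weighted_count(t_now, action_times, sigma):
    """Time-biased count: sum of the kernel weights of the k actions on D."""
    return sum(freshness_weight(t_now, t, sigma) for t in action_times)

def time_weighted_rating(rating, t_now, t_rating, sigma):
    """r_t = r * kernel weight of the rating's creation date."""
    return rating * freshness_weight(t_now, t_rating, sigma)

# Hypothetical example: three likes given today, 10 days ago and 90 days ago.
t_now = 100.0
likes = [100.0, 90.0, 10.0]
simple_count = len(likes)
weighted_count = time_weighted_count(t_now, likes, sigma=30.0)
```

With these assumed values, the recent likes keep most of their weight while the 90-day-old like contributes almost nothing, so the weighted count stays below the simple count of 3.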
17. 4.3 Hypothesis (2) : Resource Age
[Figure: example signal counts for two resources.]
Resource R1 (age = 1 month): 188, 52, 12+1
Resource R2 (age = 1 day): 2, 1, 0+1
"Old" resources are more likely to have accumulated more signals than "recent" ones.
Normalization of signals by the age of the resource.
18. 4.4 Signals Normalization with Resource Age
• Normalize the distribution of signals by the publication date of the resource (age of the resource):
$$Count_{t_D}(a_i, D) = Count(a_i, D) \cdot A(D)$$
$$A(D) = \exp\left(-\frac{\lVert t_{actual} - t_D \rVert^2}{2\sigma^2}\right)$$
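As a rough illustration of this normalization, the sketch below applies the age factor A(D) to the raw counts of an old and a recent resource. Ages are expressed in days, one month is taken as 30 days, and sigma = 30 is an arbitrary assumption; the counts 188 and 2 are the ones shown for R1 and R2 on the previous slide. Function names are hypothetical.

```python
import math

def age_factor(t_now, t_published, sigma):
    """A(D): Gaussian kernel on the age of the resource."""
    return math.exp(-((t_now - t_published) ** 2) / (2 * sigma ** 2))

def age_normalized_count(count, t_now, t_published, sigma):
    """Count(a_i, D) * A(D)."""
    return count * age_factor(t_now, t_published, sigma)

t_now = 30.0
sigma = 30.0  # assumed value, not given on the slides

# R1: published 30 days ago with 188 actions; R2: published 1 day ago with 2 actions.
r1 = age_normalized_count(188, t_now, t_published=0.0, sigma=sigma)
r2 = age_normalized_count(2, t_now, t_published=29.0, sigma=sigma)
```

Under these assumptions the count of the older resource is discounted more strongly than that of the recent one, which narrows the gap that age alone would create.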
19. 4.5 Evaluation
• Evaluation framework:
- Two INEX datasets: IMDb and SBS.
- Collection of social signals for each document (IMDb and SBS).

| Dataset | Documents | Topics |
|---|---|---|
| IMDb 2011 | 1.5 million | 30 |
| SBS 2015: Suggestion Track | 2.8 million | 208 |

• Objectives:
1) Evaluate the impact of signals on the performance of IR systems,
2) Evaluate the impact of signal freshness and resource age.
20. 4.5 Evaluation : IMDb Dataset
Textual Content: INEX IMDb 2011
21. 4.5 Evaluation : SBS Dataset
Textual Content: INEX SBS 2015
[Table: document fields and their indexing status; every status shown is "Indexed".]
22. 4.6 Evaluation : Social Signals
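Social Content: 9 signals are collected.

| Social network | Signals | Dataset |
|---|---|---|
| Delicious | Bookmark | IMDb |
| Twitter | Tweet | IMDb |
| Google+ | +1 | IMDb |
| LinkedIn | Share | IMDb |
| Amazon | Tag, Rating | SBS |
| Facebook | Like, Share, Comment | SBS, IMDb |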
23. 4.7 Evaluation : Social Signals
[Figure: example of the social signals collected for a document.]
24. 4.8 Evaluation : Results on IMDb
Baseline (LM.Hiemstra): nDCG = 0.4325, P@20 = 0.3403.

Without considering Time (Like, Share and Comment are Facebook signals):

| Metric | Like | Share | Comment | Tweet | +1 | Bookmark | Share(LIn) |
|---|---|---|---|---|---|---|---|
| P@20 | 0.362 | 0.3649 | 0.3551 | 0.3512 | 0.3468 | 0.3414 | 0.3432 |
| nDCG | 0.513 | 0.5262 | 0.5121 | 0.4769 | 0.5017 | 0.4621 | 0.4566 |

With considering Resource Age:

| Metric | Like | Share | Comment | Tweet | +1 | Bookmark | Share(LIn) |
|---|---|---|---|---|---|---|---|
| P@20 | 0.362 | 0.3721 | 0.3683 | 0.3579 | 0.3511 | 0.3427 | 0.3449 |
| nDCG | 0.5308 | 0.5544 | 0.5285 | 0.4903 | 0.5246 | 0.4671 | 0.4606 |
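The relative gains over the LM.Hiemstra baseline can be recomputed directly from the reported scores. The short sketch below does this for the IMDb nDCG values obtained without considering time; the helper name is an assumption, and rounding may differ slightly from the percentages printed on the chart.

```python
def relative_gain(score, baseline):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (score - baseline) / baseline

baseline_ndcg = 0.4325  # LM.Hiemstra on IMDb

# nDCG per signal, without considering time (values from the slide).
ndcg_without_time = {
    "Like": 0.5130,
    "Share": 0.5262,
    "Comment": 0.5121,
    "Tweet": 0.4769,
    "+1": 0.5017,
    "Bookmark": 0.4621,
    "Share(LIn)": 0.4566,
}

gains = {name: relative_gain(score, baseline_ndcg)
         for name, score in ndcg_without_time.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```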
25. 4.8 Evaluation : Results on SBS
Baseline (LM.Hiemstra): P@20 = 0.05, nDCG = 0.162.

Without considering Time (Like, Share and Comment are Facebook signals):

| Metric | Like | Share | Comment | Rating | Tag |
|---|---|---|---|---|---|
| P@20 | 0.0689 | 0.0711 | 0.0678 | 0.0559 | 0.0531 |
| nDCG | 0.1864 | 0.19 | 0.1807 | 0.1748 | 0.1742 |

Considering Resource Age:

| Metric | Like | Share | Comment | Rating | Tag |
|---|---|---|---|---|---|
| P@20 | 0.0708 | 0.0796 | 0.0711 | 0.0695 | 0.058 |
| nDCG | 0.19 | 0.2001 | 0.1882 | 0.1855 | 0.1771 |

Date of Signal (Rating only): P@20 = 0.0732, nDCG = 0.1904.
26. 4.9 Evaluation : Features Selection Algorithms (SBS)
[Figure: signals selected by the feature selection algorithms, with a three-level legend: highly selected, moderately selected, less selected.]
28. 5.1 Hypothesis
• The diversity of signals on a resource is a clue that may indicate interest beyond a single social network or community; i.e., a resource dominated by a single signal should be ranked below a resource whose signals are evenly distributed.
• Diversity is considered in terms of:
- Nature
- Origin
29. 5.2 Estimating Diversity of Signals
• Estimation of diversity using the Shannon diversity index:
$$Diversity(D) = -\sum_{i=1}^{m} P(a_i) \cdot \log(P(a_i))$$
• The interest (importance) of a resource:
$$P(D) = \prod_{a_i \in A} P(a_i) \cdot Diversity(D)$$
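A small sketch of the diversity estimate, under a few assumptions: P(a_i) inside the Shannon index is taken as the share of signal a_i among all signals of the document, the logarithm is natural, and the diversity factor multiplies the product of priors once. The names and counts are illustrative.

```python
import math

def shannon_diversity(signal_counts):
    """Shannon index over the distribution of signal types on one document."""
    total = sum(signal_counts.values())
    probs = [c / total for c in signal_counts.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)

def document_prior(signal_priors, diversity):
    """P(D) = (product of P(a_i)) * Diversity(D)."""
    return math.prod(signal_priors) * diversity

# Hypothetical documents with the same total number of actions.
balanced = {"like": 50, "share": 50}
dominated = {"like": 99, "share": 1}

div_balanced = shannon_diversity(balanced)
div_dominated = shannon_diversity(dominated)
prior_balanced = document_prior([0.02, 0.01], div_balanced)
```

An even distribution yields the highest index, so the balanced document is favored. Note that a document with a single signal type gets a diversity of 0 under this formula; the slides do not say how that case is handled.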
30. 5.3 Evaluation : Results on IMDb & SBS
| Dataset | Setting | TotalFacebook P@20 | TotalFacebook nDCG | All Criteria P@20 | All Criteria nDCG |
|---|---|---|---|---|---|
| IMDb | Without Diversity | 0.4102 | 0.5681 | 0.4262 | 0.5974 |
| IMDb | With Diversity | 0.4187 | 0.5713 | 0.4318 | 0.6174 |
| IMDb | Diversity & Resource Age | 0.4289 | 0.5966 | 0.4334 | 0.6311 |
| SBS | Without Diversity | 0.081 | 0.1937 | 0.0787 | 0.1981 |
| SBS | With Diversity | 0.084 | 0.1945 | 0.0915 | 0.2031 |
| SBS | Diversity & Resource Age | 0.0868 | 0.197 | 0.0988 | 0.2095 |
32. 6. Conclusion and Perspectives
Contributions
• IR model based on social signals.
• Taking into account the temporal aspect.
- « Freshness » of signal,
- Normalization of signal by resource age.
• Diversity of signals in the resource.
- Diversity in terms of nature and origin.
• Enrichment of test datasets (IMDb, SBS).
Lessons
• Social signals are fruitful for IR.
- Facebook signals and Ratings are the most relevant features.
- Bookmark is the weakest signal in terms of relevance.
• Temporality and diversity have a positive impact on results.
Perspectives
• Taking into account the weight of the signals,
• Studying the importance of social networks
and the actors of these signals and their
impact on the relevance,
• Studying the personalization of search
results according to the user's social
interactions and their temporality (Profile),
• Studying the polarity of the text content of
signals,
• Leveraging signals in other settings, such as analyzing the behavior of social network users during a disaster, monitoring patients (e.g. suicide attempts), and recommendation.