Diversifying Contextual Suggestions from Location-based Social Networks
M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis
A talk at the IIiX 2014 conference in Regensburg
1. Diversifying Contextual Suggestions from Location-based Social Networks
M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis
University of Glasgow
IIiX 2014, Regensburg, Germany
@dyaaa
2. The Task of Contextual Suggestions
Entertain me!
Elfreths Alley Museum
Eastern State Penitentiary
Round Guys Brewing Co
Darlings Cafe
Reading Terminal Market
Chinatown
Location ( Springfield )
This is an important IR task when considering new Smart City
environments (see the recent i-ASC 2014 workshop at ECIR)
Zero-query
3. Challenges in Contextual Suggestion
Ambiguity of the zero-query
• Accurately representing the user’s preferences.
• Existing approaches (e.g. [1]) model the direct low-level interests of the user.
• Collaborative Filtering approaches (e.g. [2]) can be employed to infer higher-level
interests, but they need a large number of users in a social network setting.
Ranked list of suggestions
Abraham Lincoln Presidential Library & Museum
Illinois State Museum
Dana-Thomas House
Lincoln Home Visitor's Center
President Abraham Lincoln Hotel
Redundancy of suggestions
• If there are lots of museums in an area,
then we are likely to recommend many of
them to a user who is interested in
museums – but would a user like to visit
multiple in a single trip?
[1] P. Yang and H. Fang. Opinion-based User Profile Modeling for Contextual Suggestions. In Proceedings of ICTIR, 2013.
[2] A. Noulas, S. Scellato, N. Lathia, and C. Mascolo. A Random Walk around the City: New Venue Recommendation in
Location-based Social Networks. In Proceedings of PASSAT, 2012.
4. Contributions
Adapt a diversification approach to deal with ambiguity and
redundancy
• We adapt a state-of-the-art approach, the xQuAD framework [3].
• The aim is to balance matching the user’s low-level interests against
covering the inferred high-level venue categories (restaurants, shops, ...).
• Categories obtained from Location-based Social Networks (LBSNs), namely
FourSquare and Yelp.
Alleviate the limitations of a social network setting
• We have extended our approach by developing a classifier for predicting the
category of a venue from its public profile (a web page)
Thorough evaluation using the TREC 2013 Contextual Suggestion
track (it serves as a user study!)
[3] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting Query Reformulations for Web Search Result Diversification. In Proceedings
of WWW, 2010.
5. Outline
• Language Modelling for Contextual Suggestions
• Category Diversification
• Venue Category Prediction
• Evaluation
• Conclusions
7. Contextual Suggestion
The aim is to rank venues for a
location and a given user
− Venues can be obtained from an LBSN
or the web.
Ranking venues in a location based
on a language model
− Build a language model of the venue
(description of the venue from its
home page)
− Build a profile of the user from
venues they rated explicitly before.
Location (Springfield)
r(user, venue) = ?
8. Building the User Profile
Venues rated by the user: Elfreths Alley Museum, Eastern State Penitentiary, Round Guys Brewing Co, Darlings Cafe, Reading Terminal Market, Chinatown
Positive User Profile: Museum, Alley, Brewing, History, Elfreths, Beers, ...
Negative User Profile: Bakery, Farmer, Market, Chinatown, ...
9. Ranking Venues
r(user, venue) = α · KL( · || · ) − (1 − α) · KL( · || · )
• Each KL term is the divergence between the language model of the venue
(the document) and one of the user's profiles (the query)
• Linear combination for both profiles to estimate the final score.
Example: a user in Location (Springfield)
Venue Profile (Dana Thomas House): architecture, museum, house, art glass, historic, preservation
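A minimal Python sketch of this ranking step, under stated assumptions. The additive smoothing, the helper names (`language_model`, `kl_divergence`, `score`), the toy term lists and the second venue are hypothetical. The orientation of the two KL terms (negative profile in the first term, positive profile in the second, so that a higher score is better) is also an assumption, since the slide does not show which profile enters which term.

```python
import math
from collections import Counter


def language_model(terms, vocab, mu=1.0):
    """Unigram language model with additive smoothing (assumed scheme)."""
    counts = Counter(terms)
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def score(venue_lm, positive_lm, negative_lm, alpha=0.5):
    # Assumed orientation: reward divergence from the negative profile and
    # penalise divergence from the positive profile (higher is better).
    return (alpha * kl_divergence(venue_lm, negative_lm)
            - (1 - alpha) * kl_divergence(venue_lm, positive_lm))


# Toy profiles and venues (illustrative terms only).
positive_terms = ["museum", "alley", "brewing", "history", "beers"]
negative_terms = ["bakery", "farmer", "market", "chinatown"]
venues = {
    "Dana Thomas House": ["architecture", "museum", "house", "historic"],
    "Hypothetical Farmers Market": ["farmer", "market", "bakery"],
}

# Shared vocabulary so that every model is defined over the same terms.
vocab = set(positive_terms) | set(negative_terms)
for terms in venues.values():
    vocab |= set(terms)

positive_lm = language_model(positive_terms, vocab)
negative_lm = language_model(negative_terms, vocab)
scores = {
    name: score(language_model(terms, vocab), positive_lm, negative_lm)
    for name, terms in venues.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With α=0.5 both profiles contribute equally; a larger α would put more weight on staying away from the negative profile in this sketch.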
11. Incorporating Diversity
Recall that due to bias towards top categories, we may
recommend many similar venues
− e.g. Lots of museums in Washington
Our diversification approach aims to
− Maximise coverage of venue categories in top ranked results
− Incorporate the user’s preference for specific venue categories
(personalised diversification)
Ranked list of suggestions
Abraham Lincoln Presidential Library & Museum
Illinois State Museum
Dana-Thomas House
Lincoln Home Visitor's Center
President Abraham Lincoln Hotel
Diversified Suggestions
Abraham Lincoln Presidential Library & Museum
National Museum of Surveying
Del's Popcorn Shop
The Globe Tavern
Illinois State Museum
12. xQuAD for diversifying contextual suggestions
Adapt an explicit web search results diversification approach
− Consider the high-level venue categories underlying a user profile to be
equivalent to query aspects
− Adapt the xQuAD [3] framework due to its effectiveness in Web Search
Components of the xQuAD score for a venue, r(user, venue):
• Venue relevance: can be estimated using our LM approach
• Venue categories: a finite set of categories, taken from the categorisation schemes in LBSNs (Yelp, FourSquare)
• Category importance: personalised vs. non-personalised
• Category coverage: calculated based on the category of the venue
• Venue novelty
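The greedy selection behind xQuAD [3], with venue categories playing the role of query aspects, might be sketched as below. This is an illustrative sketch rather than the authors' implementation: the function name `xquad_rerank`, the relevance scores, the category names and the binary coverage values are all hypothetical, and λ=0.5 mirrors the equal weighting of relevance and diversity.

```python
def xquad_rerank(relevance, coverage, importance, k, lam=0.5):
    """Greedy xQuAD-style selection of k venues.

    relevance:  {venue: relevance score for the user}
    coverage:   {venue: {category: degree to which the venue covers it}}
    importance: {category: importance of the category for the user}
    """
    selected = []
    remaining = set(relevance)
    # Novelty term: how much of each category is still uncovered
    # by the venues selected so far.
    uncovered = {c: 1.0 for c in importance}
    while remaining and len(selected) < k:
        def gain(venue):
            diversity = sum(
                importance[c] * coverage[venue].get(c, 0.0) * uncovered[c]
                for c in importance
            )
            return (1 - lam) * relevance[venue] + lam * diversity

        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
        for c in importance:
            uncovered[c] *= 1.0 - coverage[best].get(c, 0.0)
    return selected


# Hypothetical scores and categories for venues named in the talk.
relevance = {
    "Abraham Lincoln Presidential Library & Museum": 0.90,
    "Illinois State Museum": 0.85,
    "Dana-Thomas House": 0.80,
    "Del's Popcorn Shop": 0.50,
    "The Globe Tavern": 0.45,
}
coverage = {
    "Abraham Lincoln Presidential Library & Museum": {"museum": 1.0},
    "Illinois State Museum": {"museum": 1.0},
    "Dana-Thomas House": {"museum": 1.0},
    "Del's Popcorn Shop": {"shop": 1.0},
    "The Globe Tavern": {"food": 1.0},
}
importance = {"museum": 1 / 3, "shop": 1 / 3, "food": 1 / 3}  # uniform

top3 = xquad_rerank(relevance, coverage, importance, k=3)
print(top3)
```

In this toy run the third museum is displaced by a shop, because the museum category is already fully covered once the first museum has been selected.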
13. Category Importance
To estimate the category importance in the xQuAD framework:
1. Non-personalised diversification: same importance for all categories and
all users.
Uniform: with 10 categories, the importance is 1/10 for every category and every user.
2. Personalised diversification: infer the categories of interest to the user
from her positive and negative profiles.
How? Marginalisation of probabilities over all the venues in the original
ranking produced by the LM approach
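The two options can be sketched in Python as follows. The function names, the toy scores and the normalisation of ranking scores into weights are assumptions made for illustration; in particular, how the positive and negative profiles enter the per-venue weights is not detailed on the slide, so a single non-negative score per venue is used here.

```python
def uniform_importance(categories):
    """Non-personalised: every category gets the same importance."""
    return {c: 1.0 / len(categories) for c in categories}


def personalised_importance(venue_scores, venue_categories, categories):
    """Personalised: marginalise category probabilities over the ranked venues.

    venue_scores:     {venue: non-negative score in the original ranking}
    venue_categories: {venue: {category: probability of that category}}
    """
    total = sum(venue_scores.values())
    importance = {c: 0.0 for c in categories}
    for venue, venue_score in venue_scores.items():
        p_venue = venue_score / total  # assumed normalisation of the scores
        for category, p_category in venue_categories[venue].items():
            importance[category] += p_category * p_venue
    return importance


# Hypothetical ranking of three venues for one user.
categories = ["museum", "shop", "food"]
venue_scores = {"v1": 0.6, "v2": 0.3, "v3": 0.1}
venue_categories = {
    "v1": {"museum": 1.0},
    "v2": {"museum": 1.0},
    "v3": {"shop": 1.0},
}

uniform = uniform_importance(categories)
personalised = personalised_importance(venue_scores, venue_categories, categories)
print(uniform)
print(personalised)
```

Here the user's ranking is dominated by museums, so the personalised importance concentrates on that category while the uniform variant treats all three categories alike.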
• Venue relevance: can be estimated using our LM approach
• Venue category: can be obtained from the LBSN. What if the venue is not in the LBSN?
15. Venue Category Prediction
Predicting the category of a venue
− Venues may not be available in LBSNs (e.g. when we consider the web for
recommendation)
− Generalise our approach beyond a single LBSN
− We developed an approach for estimating the category of a venue given a web page that
represents the venue
How?
− Using a textual classifier trained with top search results from a large web
collection (ClueWeb12) for a large sample of venues in two LBSNs (Yelp and
FourSquare)
16. Venue Category Prediction
1. Sample: a venue and its category from the LBSN, e.g. Venue: Tierra Cafe, Category: restaurant
2. Retrieve: web documents d1, d2, ..., dk for the venue from the Web Collection, e.g.
d1: Tierra Cafe - Downtown - Los Angeles, CA | Yelp (www.yelp.com/biz/tierra-cafe-los-angeles)
d2: Tierra Cafe & Grill, Harrisburg - Restaurant Reviews - TripAdvisor (www.tripadvisor.com/...erra_Cafe_Grill-Harrisburg_Pennsylvania.html)
dk: Tierra Cafe & Grill - Harrisburg | Urbanspoon (www.urbanspoon.com/r/160/1657133/restaurant)
3. Train: the learning instances (d1, restaurant), (d2, restaurant), ..., (dk, restaurant) are used to train a Classifier (supervised machine learning)
Features: document terms
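The sample, retrieve, train pipeline can be illustrated with a small multinomial Naïve Bayes text classifier written in plain Python. Everything here is a hypothetical sketch: the class name `NaiveBayesText`, the whitespace tokenisation, the Laplace smoothing and the toy documents stand in for the retrieved web pages, and the talk's best configuration used Random Forests rather than Naïve Bayes.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over document terms, with Laplace smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for document, label in zip(documents, labels):
            terms = document.lower().split()
            self.term_counts[label].update(terms)
            self.vocab.update(terms)
        return self

    def predict_proba(self, document):
        # Terms never seen in training are ignored.
        terms = [t for t in document.lower().split() if t in self.vocab]
        n_documents = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.term_counts[label].values()) + len(self.vocab)
            log_p = math.log(n_label / n_documents)
            for term in terms:
                log_p += math.log((self.term_counts[label][term] + 1) / total)
            log_scores[label] = log_p
        # Normalise in a numerically stable way.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: e / norm for c, e in exp_scores.items()}


# Learning instances: (retrieved document text, category of the sampled venue).
training = [
    ("tierra cafe restaurant reviews menu", "restaurant"),
    ("cafe grill restaurant lunch menu", "restaurant"),
    ("art museum gallery exhibitions", "arts"),
    ("museum of art collection exhibitions", "arts"),
]
classifier = NaiveBayesText().fit(
    [text for text, _ in training], [label for _, label in training]
)

probabilities = classifier.predict_proba("art museum exhibitions")
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

The output is a probability per category, which is the form the diversification step needs when a venue has no LBSN category of its own.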
17. Venue Category Prediction
Home page (http://artsbma.org/) → Classifier → Category: ?
Category                 Prob.
Arts and Entertainment   0.5
Shopping                 0.4
Food                     0.05
…
Evaluation
• Samples from 2 LBSNs (5000 from FourSquare & 5000 from Yelp)
• Retrieval models : BM25 & LTR approaches (AFS and LambdaMART)
• Supervised learning: Naïve Bayes, J48, Random Forests and SVM.
• Results are consistent on both LBSNs.
− Best accuracy is achieved with LambdaMART (for retrieval) and Random Forests (for
supervised learning): F1 of approximately 0.60.
19. Evaluation using the TREC 2013 Contextual Suggestion track
• 223 unique pairs of users and contexts (locations): 115 users in 36 unique
locations (city centres)
• Each user has explicitly rated 50 sample venues
Venue Sources & Categories
• Crawled venues from FourSquare and Yelp for the considered city centres
using 4 km² grids centred at those locations
Experimental Setup
Venue Sources                  Categories             # Venues
Specific LBSN: FourSquare      FourSquare Cats. (6)   60,212
Specific LBSN: Yelp            Yelp Cats. (10)        7,096
Web Collection: ClueWeb12 CS   FourSquare Cats. (6)   30,144
Web Collection: ClueWeb12 CS   Yelp Cats. (10)        30,144
For the Web Collection, categories are obtained by applying our venue category prediction approach.
• α=0.5 (Equal weights for the positive and negative profiles)
• λ=0.5 for xQuAD (Equal weights for the relevance and diversity components)
20. Research Questions
RQ1: Can our diversification approach improve the
quality of contextual suggestion over the LM baseline?
RQ2: What is the contribution of the diversity to the
effectiveness of recommendation for different types of
users?
21. Results - FourSquare
[Bar chart: P@3, P@5 and MRR (y-axis from 0.000 to 0.700) for the LM baseline, Non-personalised xQuAD and Personalised xQuAD. Percentage changes labelled on the chart: +4.5%, -2.4%, +6.9%, -1.6%, +2.5%, -0.6%]
• Personalised diversification improves
effectiveness over the LM baseline.
• Larger improvements at higher cut-offs.
• Non-personalised diversification harms
effectiveness marginally
• Similar patterns observed in the Yelp
dataset (details in the paper)
LM Baseline Non-pers. xQuAD Pers. xQuAD
judged@5 67.98% 63.94% 68.43%
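For reference, the reported measures P@k and MRR follow their standard definitions, sketched below in Python. The function names and the toy ranking are hypothetical, and what counts as a relevant venue is decided by the track's judgments, which are simply passed in here as a set.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k suggestions that are relevant."""
    return sum(1 for venue in ranked[:k] if venue in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant suggestion, or 0 if there is none."""
    for rank, venue in enumerate(ranked, start=1):
        if venue in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over (user, context) pairs."""
    values = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(values) / len(values)


# Toy example: one ranking of five venues, two of them relevant.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d"}
p_at_3 = precision_at_k(ranked, relevant, 3)
p_at_5 = precision_at_k(ranked, relevant, 5)
rr = reciprocal_rank(ranked, relevant)
print(p_at_3, p_at_5, rr)
```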
22. Results – ClueWeb12 CS
[Two bar charts, FourSquare Categories and Yelp Categories: P@3, P@5 and MRR (y-axis from 0.000 to 0.250) for the LM baseline, Non-personalised xQuAD and Personalised xQuAD]
Percentage changes labelled on the FourSquare Categories chart: +10.17%, -5.86%, +8.89%, +1.23%, +4.47%, -4.71%
Percentage changes labelled on the Yelp Categories chart: +7.72%, -10.22%, +10.00%, 0.00%, +2.24%, -3.30%
j@5                     LM Baseline   Non-pers. xQuAD   Pers. xQuAD
FourSquare Categories   26.78%        27.22%            28.10%
Yelp Categories         26.78%        27.04%            26.60%
• As before, consistent improvement for the personalised
diversification over the LM baseline for the various measures
• Using either categorisation (FourSquare or Yelp) produces consistent
results
23. Analysis
Users are different in terms of the variety of their interests
• To measure this variation, we measure the entropy of category probability
distribution for a given user
• Low entropy users have few venue categories of interest
• High entropy users are about equally interested in many venue categories
Top 50 users ranked by category entropy
• The difference is mostly negative (86% of users)
• Diversification approach succeeds in providing a diverse list of venues
matching the user’s interests
Bottom 50 users ranked by category entropy
• The difference is minimal for most users.
• However in 30% of the cases, the original ranking was better
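The entropy of a user's category probability distribution can be computed as in the short Python sketch below. The function name, the base-2 logarithm and the two toy users are assumptions made for illustration.

```python
import math


def category_entropy(category_probabilities):
    """Shannon entropy (in bits) of a user's category distribution."""
    return -sum(
        p * math.log2(p) for p in category_probabilities.values() if p > 0
    )


# Hypothetical users: one focused on museums, one with equal interests.
focused_user = {"museum": 0.9, "shop": 0.05, "food": 0.05}
varied_user = {"museum": 0.25, "shop": 0.25, "food": 0.25, "nightlife": 0.25}

low = category_entropy(focused_user)
high = category_entropy(varied_user)
print(low, high)
```

Ranking users by this value separates those with a few dominant categories (low entropy) from those whose interest is spread evenly (high entropy).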
25. Conclusions
Diversification can improve effectiveness of contextual
suggestions when it is personalised.
• Up to 10% over an LM baseline in P@5
• Consistent results on different datasets
Users with a higher variety of interests benefit most from
diversification of contextual suggestions
• 86% of high-variety users benefited from diversification