Understanding Similarity Metrics in
Neighbour-based Recommender Systems
Alejandro Bellogín, Arjen de Vries
Information Access
CWI

ICTIR, October 2013
Motivation
Why do some recommendation methods perform better than others?
2

Motivation
Why do some recommendation methods perform better than others?
Focus: nearest-neighbour recommenders
• Which aspects of the similarity functions are most important?
• How can we exploit that information?
3

Context
Recommender systems
• Users interact (rate, purchase, click) with items
4

Context
Recommender systems
• Users interact (rate, purchase, click) with items
• Which items will the user like?
7

Context
Nearest-neighbour recommendation methods
• The item prediction is based on “similar” users (see the prediction sketch below)
8

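As a concrete illustration (not the authors' code), here is a minimal sketch of how a user-based nearest-neighbour recommender scores an item for a user from the ratings of that user's k most similar neighbours; the rating dictionary `ratings` and the similarity function `sim` are assumed inputs.

```python
# Minimal sketch of user-based kNN prediction (illustrative only; not the
# implementation used in the paper).

def predict(user, item, ratings, sim, k=50):
    """ratings: dict user -> {item: rating}; sim(u, v) -> similarity value."""
    # The k users most similar to the target user.
    neighbours = sorted((v for v in ratings if v != user),
                        key=lambda v: sim(user, v), reverse=True)[:k]
    # Similarity-weighted average of the neighbours' ratings for this item.
    num = den = 0.0
    for v in neighbours:
        if item in ratings[v]:
            num += sim(user, v) * ratings[v][item]
            den += abs(sim(user, v))
    return num / den if den > 0 else 0.0
```
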
Different similarity metrics – different neighbours
10

Different similarity metrics – different recommendations
11

Different similarity metrics – different recommendations
[Figure: similarity values sim(·, ·) between pairs of users, which differ across metrics]
12

Research question
How does the choice of a similarity metric determine the quality of the recommendations?
13

Problem: sparsity
There are too many items and not enough ratings available
A user’s neighbourhood is therefore likely to include not-so-similar users
14

Different similarity metrics – which one is better?
Consider Cosine vs Pearson similarity
Most existing studies report that Pearson correlation leads to superior recommendation accuracy
15

Different similarity metrics – which one is better?
Consider Cosine vs Pearson similarity
Common variations to deal with sparsity (both metrics and these variations are sketched in code below)
• Thresholding: threshold to filter out similarities (no observed difference)
• Item selection: use full profiles or only the overlapping items
• Imputation: default value for unrated items
16

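A sketch of the two similarity metrics and of the item-selection and imputation variations, assuming ratings are stored as per-user dicts and using an arbitrary default value for imputed ratings; thresholding would simply discard similarities below a cut-off. This is an illustration, not the paper's exact configuration.

```python
import math

def align(ru, rv, overlap_only=True, default=0.0):
    """Item selection: align two rating dicts over the overlap or the union,
    imputing `default` for unrated items when using full profiles."""
    items = set(ru) & set(rv) if overlap_only else set(ru) | set(rv)
    return ([ru.get(i, default) for i in items],
            [rv.get(i, default) for i in items])

def cosine(x, y):
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0

def pearson(x, y):
    # Pearson correlation is the cosine of the mean-centred vectors.
    if not x:
        return 0.0
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Example: cosine over full profiles with imputed zeros ("Cosine Full0").
# sim_uv = cosine(*align(ratings[u], ratings[v], overlap_only=False, default=0.0))
```
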
Different similarity metrics – which one is better?
Which similarity metric is better?
• Cosine is not superior for every variation
Which variation is better?
• They do not show consistent results
Why do some variations improve or decrease performance?
→ Analysis of similarity features
17

Analysis of similarity metrics
Based on
• Distance/similarity distribution
• Nearest-neighbour graph
18

Analysis of similarity metrics
Distance distribution
In high dimensions, the nearest neighbour is unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
19

Analysis of similarity metrics
Distance distribution
• Quality q(n, f): fraction of users for which the similarity function has ranked at least n percent of the whole community within a factor f of the nearest neighbour’s similarity value
21

Analysis of similarity metrics
Distance distribution
• Quality q(n, f): fraction of users for which the similarity function has ranked at least n percent of the whole community within a factor f of the nearest neighbour’s similarity value (sketched in code below)
• Other features:
22

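One possible reading of q(n, f) in code, adapting Beyer et al.'s distance-based notion to similarity values; the interpretation of "within a factor f" as a similarity of at least sim_nn / f is an assumption made for this sketch and may differ from the paper's exact formulation.

```python
def quality(users, sim, n, f):
    """q(n, f): fraction of users for which at least n% of the community is
    ranked within a factor f of the nearest neighbour's similarity value."""
    qualifying = 0
    for u in users:
        sims = sorted((sim(u, v) for v in users if v != u), reverse=True)
        sim_nn = sims[0]                       # nearest neighbour's similarity
        close = sum(1 for s in sims if s >= sim_nn / f)   # assumed reading of "factor f"
        if close >= (n / 100.0) * len(users):
            qualifying += 1
    return qualifying / len(users)
```
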
Analysis of similarity metrics
Nearest-neighbour graph (NNk)
• Binary relation indicating whether or not a user belongs to another user’s neighbourhood (see the graph construction sketch below)
23

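The NNk graph can be built directly from the similarity function; below is a sketch using networkx (the paper computes its graph metrics with the JUNG library in Java, so this is only an illustrative substitute), from which graph features such as clustering or assortativity could then be derived.

```python
import networkx as nx

def nn_graph(users, sim, k):
    """Directed graph with an edge u -> v iff v is one of u's k nearest neighbours."""
    g = nx.DiGraph()
    g.add_nodes_from(users)
    for u in users:
        neighbours = sorted((v for v in users if v != u),
                            key=lambda v: sim(u, v), reverse=True)[:k]
        g.add_edges_from((u, v) for v in neighbours)
    return g

# Examples of graph features one might correlate with performance:
# nx.average_clustering(g.to_undirected()), nx.degree_assortativity_coefficient(g)
```
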
Experimental setup
Dataset
• MovieLens 1M: 6K users, 4K items, 1M ratings
• Random 5-fold training/test split
JUNG library for graph-related metrics
Evaluation
• Generate a ranking for each relevant item, containing 100 non-relevant items (see the MRR sketch below)
• Metric: mean reciprocal rank (MRR)
24

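A sketch of the evaluation protocol as described on the slide: each relevant (user, item) test pair is ranked against 100 sampled non-relevant items and the reciprocal rank of the relevant item is averaged. The sampling and tie-breaking details here are illustrative assumptions, not the paper's exact procedure.

```python
import random

def mean_reciprocal_rank(test_pairs, all_items, score, n_negatives=100, seed=42):
    """test_pairs: iterable of (user, relevant_item); score(user, item) -> float."""
    rng = random.Random(seed)
    reciprocal_ranks = []
    for user, item in test_pairs:
        # Sample 100 non-relevant items and rank them together with the relevant one.
        negatives = rng.sample([i for i in all_items if i != item], n_negatives)
        ranked = sorted([item] + negatives, key=lambda i: score(user, i), reverse=True)
        reciprocal_ranks.append(1.0 / (ranked.index(item) + 1))
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```
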
Performance analysis
Correlations between performance and features of each similarity (and its variations)
25

Performance analysis – quality
Correlations between performance and characteristics of each similarity (and its variations)
For a user
• If most of the user population is far away, low quality correlates with effectiveness (discriminative similarity)
• If most of the user population is close, high quality correlates with ineffectiveness (not discriminative enough)
Quality q(n, f): fraction of users for which the similarity function has ranked at least n percent of the whole community within a factor f of the nearest neighbour’s similarity value
26

Performance analysis – examples
27

Conclusions (so far)
We have found similarity features that correlate with final recommendation performance
• They are global properties, in contrast with query performance predictors
• The results are compatible with those in databases: the stability of a metric is related to its ability to discriminate between good and bad neighbours
28

Application
Transform “bad” similarity metrics into “better performing” ones
• Adjusting their values according to the correlations found
Transform their distributions (sketched below)
• Using a distribution-based normalisation [Fernández, Vallet, Castells, ECIR 06]
• Take as ideal distribution (F) the best performing similarity (Cosine Full0)
29

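A sketch of the distribution-based normalisation idea, implemented here as a simple empirical quantile mapping: each similarity value is mapped through its rank in the "bad" metric's distribution onto the corresponding quantile of the ideal distribution F (the values produced by the best-performing similarity). This is one plausible realisation under those assumptions; the exact formulation of Fernández et al. may differ.

```python
import numpy as np

def distribution_normalise(bad_scores, ideal_scores):
    """Map bad_scores so that their distribution matches that of ideal_scores."""
    bad = np.asarray(bad_scores, dtype=float)
    ideal = np.sort(np.asarray(ideal_scores, dtype=float))
    # Empirical CDF position of each value within the "bad" distribution...
    ranks = np.argsort(np.argsort(bad))
    cdf = (ranks + 1) / len(bad)
    # ...mapped through the inverse empirical CDF of the ideal distribution.
    return np.quantile(ideal, cdf)
```
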
Application
Transform “bad” similarity metrics into “better performing” ones
• Adjusting their values according to the correlations found
Transform their distributions
• Using a distribution-based normalisation [Fernández, Vallet, Castells, ECIR 06]
• Take as ideal distribution (F) the best performing similarity (Cosine Full0)
Results
• The rest of the characteristics are not (necessarily) inherited
30

Conclusions
We have found similarity features that correlate with final recommendation performance
• They are global properties, in contrast with query performance predictors
• The results are compatible with those in databases: the stability of a metric is related to its ability to discriminate between good and bad neighbours
Results are not yet conclusive when transforming badly performing similarities based on distribution normalisations
• We want to explore (and adapt to) other features, e.g., graph distance
• We aim to develop other applications based on these results, e.g., hybrid recommendation
31

Thank you
Understanding Similarity Metrics in Neighbour-based Recommender Systems
Alejandro Bellogín, Arjen de Vries
Information Access
CWI
ICTIR, October 2013
32

Different similarity metrics – all the results
Performance results for variations of two metrics
• Cosine
• Pearson
Variations
• Thresholding: threshold to filter out similarities (no observed difference)
• Imputation: default value for unrated items
33

Beyer’s “quality”
34


Related content

Similar to Understanding Similarity Metrics in Neighbour-based Recommender Systems (20)

Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Revisiting Offline Evaluation for Implicit-Feedback Recommender Systems (Doct...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Correcting Popularity Bias by Enhancing Recommendation Neutrality
Efficiency Improvement of Neutrality-Enhanced Recommendation
Experiments on Generalizability of User-Oriented Fairness in Recommender Systems
Overview of recommender system
How Social Institutions Impact E-Accessibility Policy
The International Journal of Engineering and Science (The IJES)
User Personality and the New User Problem in a Context-Aware POI Recommende...
Social Recommender Systems Tutorial - WWW 2011
Social Recommender Systems
Contribution to proactivity in mobile context-aware recommender systems
[WI 2017] Affective Prediction By Collaborative Chains In Movie Recommendation
Law Enforcement Analytic Standards Department
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In R...
Information Retrieval and User-centric Recommender System Evaluation
PennDOT Planning Partners Conference 2009

More from Alejandro Bellogin (19)

Recommender Systems and Misinformation: The Problem or the Solution?
Revisiting neighborhood-based recommenders for temporal scenarios
Evaluating decision-aware recommender systems
Replicable Evaluation of Recommender Systems
Implicit vs Explicit trust in Social Matrix Factorization
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
CWI @ Contextual Suggestion track - TREC 2013
CWI @ Federated Web Track - TREC 2013
Probabilistic Collaborative Filtering with Negative Cross Entropy
Artist popularity: do web and social music services agree?
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
Performance prediction and evaluation in Recommender Systems: an Information ...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Predicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...

