5th ACM International Conference on
                     Recommender Systems – RecSys 2011




                        Rank and Relevance
                 in Novelty and Diversity Metrics
                    for Recommender Systems

                          Saúl Vargas and Pablo Castells
                        Universidad Autónoma de Madrid
                                http://ir.ii.uam.es



IRG
                      Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems
                        5th ACM International Conference on Recommender Systems (RecSys 2011)
IR Group @ UAM                               Chicago, IL, 23-27 October 2011
Beyond accuracy: novelty and diversity
         You bought (or browsed)
                  Revolver, Abbey Road

         So you are recommended…
                  Rubber Soul, With The Beatles, Beatles for Sale, Let it be, Help!,
                  A Hard Day’s Night, Sgt. Pepper’s Lonely Hearts Club Band, Yellow Submarine,
                  Magical Mystery Tour, The White Album, Please Please Me, 1967-1970 (Blue),
                  1962-1966 (Red), Past Masters, Past Masters Vol 2, … more Beatles’ albums

         The recommended items are…
                  Very similar to each other
                  Very similar to what the user has already seen
                  Very widely known

         Other albums shown on the slide: Dark Side of the Moon, Some Girls, Bob Dylan


Novelty and diversity in Recommender Systems

     Algorithms to enhance novelty and diversity
          Greedy optimization of objective functions (accuracy + diversity), promotion of long-tail items, etc.
                 (Ziegler 2005, Zhang 2008, Celma 2008)

     Metrics and methodologies to measure and evaluate novelty and diversity
          Inverse popularity: mean self-information (Zhou 2010) → recommend in the long tail (novelty)

                 $\mathrm{MSI} = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i)$

          Intra-list diversity: average pairwise distance (Ziegler 2005, Zhang 2008) (diversity)

                 $\mathrm{ILD} = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\ k<l} d(i_k, i_l)$


          Other: temporal diversity (Lathia 2010), diversity relative to other users & to other systems
                 (Bellogín 2010), aggregate diversity (Adomavicius 2011), unexpectedness (Adamopoulos 2011), etc.
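
As a rough illustration of the two metrics above, the following Python sketch computes MSI and ILD on a toy list. The item names, popularity values and genre sets are made up for the example, and the Jaccard-complement distance is only one possible choice for d.

```python
import math
from itertools import combinations

def mean_self_information(recommended, popularity):
    """MSI = -(1/|R|) * sum over i in R of log2 p(i)."""
    return -sum(math.log2(popularity[i]) for i in recommended) / len(recommended)

def intra_list_diversity(recommended, distance):
    """ILD = 2 / (|R| (|R| - 1)) * sum over pairs k < l of d(i_k, i_l)."""
    n = len(recommended)
    total = sum(distance(a, b) for a, b in combinations(recommended, 2))
    return 2.0 * total / (n * (n - 1))

# Hypothetical toy data: item popularity p(i) and genre sets.
popularity = {"a": 0.5, "b": 0.25, "c": 0.125}
genres = {"a": {"rock"}, "b": {"rock", "pop"}, "c": {"jazz"}}

def jaccard_distance(i, j):
    """Complement of the Jaccard similarity between two genre sets."""
    return 1.0 - len(genres[i] & genres[j]) / len(genres[i] | genres[j])

R = ["a", "b", "c"]
msi = mean_self_information(R, popularity)
ild = intra_list_diversity(R, jaccard_distance)

# Both metrics depend only on the item set, not on the order of the list.
msi_reversed = mean_self_information(R[::-1], popularity)
ild_reversed = intra_list_diversity(R[::-1], jaccard_distance)
print(msi, ild, msi_reversed, ild_reversed)
```

Reversing the list leaves both values unchanged, since neither metric looks at rank positions.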



Some limitations

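     Metrics are insensitive to the order of recommended items
          Same item sets → same measured diversity/novelty

     [Figure: two rankings, R1 and R2, of the same items. R1 places the diverse items at the top
      and the non-diverse items at the bottom; R2 places the non-diverse items at the top and the
      diverse items at the bottom.]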



Some limitations



     Accuracy and diversity/novelty measured independently

     [Figure: accuracy (vertical axis) against diversity (horizontal axis) for Method A and Method B,
      with two cases captioned "Method A is better than B" and "Which one is better?"]




Our research goals


           1. Further formalize recommendation novelty and diversity
                 metrics based on a few basic fundamental principles

           2. Build a unified metric framework where:
                  –   As many state of the art novelty and diversity metrics as possible
                      are related and generalized

                  –   New metrics can be defined

           3. Enhance the novelty and diversity metrics with rank
                 sensitivity and relevance awareness


Basic fundamental principles to build metrics upon

        Our approach: define and formalize novelty and diversity metrics
                 based on models of how users interact with items

        Three basic fundamental principles in user-item interaction
                  – Discovery – an item is seen by a user
                  – Relevance – an item would be liked by (or useful for, etc.) a user
                  – Choice – an item is actually accepted (bought, consumed, etc.) by a user

        Formalized as binary random variables
                  – seen, rel, choose taking values in {true, false}

        Simplifying assumptions:
                  – seen and rel are mutually independent
                  – If a user sees an item that is relevant for her, she chooses it

                  $p(\mathit{choose}) = p(\mathit{seen})\, p(\mathit{rel})$

        [Diagram relating the three variables: seen, choose, rel]
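
One way to spell out how the two simplifying assumptions lead to the product form is sketched below. The first equality reads the second assumption as an equivalence (an item is chosen exactly when it is both seen and relevant); the converse direction is an assumption of this sketch, not something the slide states. The second equality uses the independence of seen and rel.

```latex
p(\mathit{choose})
  = p(\mathit{seen} \wedge \mathit{rel})
  = p(\mathit{seen})\, p(\mathit{rel})
```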


Proposed metric framework

     Expected effective novelty of items when a user interacts
     with a ranked list R of recommended items in a context θ

          $m(R \mid \theta) = C \sum_{i \in R} p(\mathit{choose} \mid i, u, R)\, \mathrm{nov}(i \mid \theta)$

     Novelty is relative: item novelty context θ
     To (what we know about) what someone has seen sometime somewhere
          Someone → the target user, a set of users, all users…
          Sometime → a specific past time period, an ongoing session, “ever”…
          Somewhere → past recommendations, the current recommendation R,
           recommendations by other systems, “anywhere”…
          “What we know about that” → context of observation: available observations




Metric framework components




          $m(R \mid \theta) = C \sum_{i \in R} p(\mathit{choose} \mid i, u, R)\, \mathrm{nov}(i \mid \theta)$

     Item novelty model        $\mathrm{nov}(i \mid \theta)$
     Choice model              $p(\mathit{choose} \mid i, u, R)$




Item novelty models

     Item novelty model nov(i|θ)

     Discovery-based (negative popularity)
          – Popularity complement (forced discovery)
               $\mathrm{nov}(i \mid \theta) = 1 - p(\mathit{seen} \mid i, \theta)$
          – Self-information, or surprisal (free discovery)
               $\mathrm{nov}(i \mid \theta) = -\log_2 p(i \mid \mathit{seen}, \theta)$

     Distance-based (θ here represents a set of items)
          – Expected item distance
               $\mathrm{nov}(i \mid \theta) = \sum_{j \in \theta} p(j \mid \mathit{choose}, i, \theta)\, d(i, j)$
          – Minimum item distance
               $\mathrm{nov}(i \mid \theta) = \min_{j \in \theta} d(i, j)$
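
The four item novelty models can be sketched as small Python functions. This is a minimal illustration under stated assumptions: the function names are hypothetical, the probabilities are passed in as plain numbers, and `weight(j)` is a stand-in for p(j|choose,i,θ), which the slide does not specify how to estimate.

```python
import math

def popularity_complement(p_seen_i):
    """Forced discovery: nov(i|theta) = 1 - p(seen|i, theta)."""
    return 1.0 - p_seen_i

def self_information(p_i_given_seen):
    """Free discovery: nov(i|theta) = -log2 p(i|seen, theta)."""
    return -math.log2(p_i_given_seen)

def expected_item_distance(i, context_items, distance, weight):
    """nov(i|theta) = sum over j in theta of p(j|choose, i, theta) * d(i, j).

    weight(j) stands in for p(j|choose, i, theta) (an assumption of this sketch).
    """
    return sum(weight(j) * distance(i, j) for j in context_items)

def minimum_item_distance(i, context_items, distance):
    """nov(i|theta) = min over j in theta of d(i, j)."""
    return min(distance(i, j) for j in context_items)

# Hypothetical toy setting: items are points on a line, d is a scaled absolute difference.
def toy_distance(i, j):
    return abs(i - j) / 10.0

context = [2, 4]  # theta as a set of items
uniform = lambda j: 1.0 / len(context)

nov_pc = popularity_complement(0.25)
nov_si = self_information(0.25)
nov_exp = expected_item_distance(7, context, toy_distance, uniform)
nov_min = minimum_item_distance(7, context, toy_distance)
print(nov_pc, nov_si, nov_exp, nov_min)
```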




Choice model


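     Choice model p(choose|i,u,R)

          $p(\mathit{choose}) = p(\mathit{seen})\, p(\mathit{rel})$

          $p(\mathit{choose} \mid i, u, R) = p(\mathit{seen} \mid i, u, R)\, p(\mathit{rel} \mid i, u)$

          – p(seen|i,u,R): browsing model
          – p(rel|i,u): relevance model, independent from R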




Browsing model

     Browsing model where p(seen|i_k,u,R) should decrease with k
          Can be formalized as different probabilistic discount functions
           (see e.g. Carterette 2011)
          In general, p(seen|i_k,u,R) = disc(k)

          disc(k)
               $p^{k-1}$            exponential, as in RBP (Moffat 2008)
               $1/\log(k+1)$        as in nDCG
               $1/k$                Zipfian, as in MRR, MAP, etc.
               $1$                  no discount
               ...                  many others...

     [Figure: ranked list R with positions 1, 2, 3, …; position k = 6 is marked with a question mark]
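
The discount functions listed above can be written down directly. The sketch below is illustrative only: the function names are hypothetical, p = 0.85 is taken from the disc(k) = 0.85^(k−1) setting used in the results table of this deck, and the base-2 logarithm for the nDCG-style discount is an assumption, since the slide does not state the base.

```python
import math

def disc_exponential(k, p=0.85):
    """p^(k-1), as in RBP; p = 0.85 matches the setting in the results table."""
    return p ** (k - 1)

def disc_logarithmic(k):
    """1 / log2(k + 1), as in nDCG (base 2 is an assumption of this sketch)."""
    return 1.0 / math.log2(k + 1)

def disc_zipfian(k):
    """1 / k, as in MRR, MAP, etc."""
    return 1.0 / k

def disc_none(k):
    """No rank discount: every position counts the same."""
    return 1.0

# Compare how quickly each discount decays over the first five positions.
for name, disc in [("exponential", disc_exponential),
                   ("logarithmic", disc_logarithmic),
                   ("zipfian", disc_zipfian),
                   ("none", disc_none)]:
    print(name, [round(disc(k), 3) for k in range(1, 6)])
```

All four return 1 at k = 1, so they differ only in how fast the probability of reaching deeper positions falls off.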




Wrapping up: resulting metric scheme




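          $m(R \mid \theta) = C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathit{rel} \mid i_k, u)\, \mathrm{nov}(i_k \mid \theta)$

          – disc(k): rank discount
          – p(rel|i_k,u): item relevance
          – nov(i_k|θ): item novelty

     Normalization – to get the novelty ratio by expected number of browsed items

          $C = \frac{1}{\sum_{i_k \in R} \mathrm{disc}(k)}$

          where the denominator is the expected browsing depth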
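
A minimal Python sketch of the resulting scheme follows, assuming the relevance and novelty values are already available as dictionaries. The item names and numbers are invented for illustration. It scores two orderings of the same item set, echoing the earlier limitation that order-insensitive metrics cannot tell them apart.

```python
def novelty_metric(ranking, p_rel, nov, disc):
    """m(R|theta) = C * sum_k disc(k) * p(rel|i_k,u) * nov(i_k|theta),
    with C = 1 / sum_k disc(k) (the expected browsing depth)."""
    weights = [disc(k) for k in range(1, len(ranking) + 1)]
    total = sum(w * p_rel[i] * nov[i] for w, i in zip(weights, ranking))
    return total / sum(weights)

# Hypothetical toy values: two novel items (a, b) and two non-novel ones (c, d).
nov = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2}
p_rel = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}  # relevance switched off

r1 = ["a", "b", "c", "d"]  # novel items at the top
r2 = ["c", "d", "a", "b"]  # novel items at the bottom

no_discount = lambda k: 1.0
exponential = lambda k: 0.85 ** (k - 1)

flat_r1 = novelty_metric(r1, p_rel, nov, no_discount)
flat_r2 = novelty_metric(r2, p_rel, nov, no_discount)
ranked_r1 = novelty_metric(r1, p_rel, nov, exponential)
ranked_r2 = novelty_metric(r2, p_rel, nov, exponential)
print(flat_r1, flat_r2, ranked_r1, ranked_r2)
```

Without a discount both orderings get the same score; with the exponential discount the list that puts novel items first scores higher.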




Implementation

       Ground model estimates

         θ ≡ observed interaction between all users and items in the system

        Discovery distributions can be estimated from rating data or access records

                  – Forced discovery: p(seen|i,θ) ∼ IUF (ratio of users who have interacted with i)

                  – Free discovery: p(i|seen,θ) ∼ ICF (ratio of interactions involving i)

        Relevance distribution p(rel|i,u) is estimated by a mapping from ratings
        to relevance (see definition of ERR in Chapelle 2009)
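
The estimates above could be computed from an interaction log roughly as follows. This is a sketch under assumptions: the log format, user and item names, and the 5-point rating scale are hypothetical. The relevance mapping uses the graded-relevance formula from the ERR definition, (2^g − 1) / 2^g_max, applied directly to the rating; how ratings are turned into grades is not stated on the slide.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) pairs.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "a"), ("u3", "c")]

users = {u for u, _ in interactions}

# Forced discovery: p(seen|i, theta) ~ ratio of users who have interacted with i.
users_per_item = Counter(i for _, i in set(interactions))
p_seen = {i: n / len(users) for i, n in users_per_item.items()}

# Free discovery: p(i|seen, theta) ~ ratio of interactions involving i.
item_counts = Counter(i for _, i in interactions)
p_item = {i: n / len(interactions) for i, n in item_counts.items()}

def p_rel_from_rating(rating, max_rating=5):
    """ERR-style mapping from a grade g to a relevance probability:
    (2^g - 1) / 2^g_max. Using the raw rating as the grade is an assumption."""
    return (2 ** rating - 1) / 2 ** max_rating

print(p_seen, p_item, p_rel_from_rating(5), p_rel_from_rating(1))
```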




Novelty and diversity metrics




                                 Putting all together
                 Some metric framework instantiations




Putting all together: metric framework instantiations


       Discovery-based metrics (novelty)

         θ ≡ observed interaction between all users and items in the system

        Expected popularity complement

             $\mathrm{EPC}(R) = C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathit{rel} \mid i_k, u)\, \bigl(1 - p(\mathit{seen} \mid i_k)\bigr)$

        Expected free discovery

             $\mathrm{EFD}(R) = -C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathit{rel} \mid i_k, u)\, \log_2 p(i_k \mid \mathit{seen})$

             Without rank and relevance reduces to  $\mathrm{MSI}(R) = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i \mid \mathit{seen})$
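
The reduction of EFD to MSI can be checked in one step. The sketch below assumes that "without rank and relevance" means disc(k) = 1 and p(rel|i_k,u) = 1 for every recommended item, and uses the normalization C = 1 / Σ disc(k) from the metric scheme.

```latex
% Assumption: disc(k) = 1 and p(rel | i_k, u) = 1 for all i_k in R.
C = \frac{1}{\sum_{i_k \in R} \mathrm{disc}(k)} = \frac{1}{|R|}
\quad\Longrightarrow\quad
\mathrm{EFD}(R)
  = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i \mid \mathit{seen})
  = \mathrm{MSI}(R)
```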




Putting all together: metric framework instantiations

   Distance-based metrics

     θ ≡ the observed interaction of the target user only
     Expected profile distance: unexpectedness (user-specific)

          $\mathrm{EPD}(R) = C_u \sum_{i_k \in R,\ j \in u} \mathrm{disc}(k)\, p(\mathit{rel} \mid i_k, u)\, p(\mathit{rel} \mid j, u)\, d(i_k, j)$

     θ ≡ the recommended items the target user can see in R
     Expected intra-list diversity: diversity

          $\mathrm{EILD}(R) = \sum_{i_k \in R,\ i_l \in R,\ k \neq l} C_k\, \mathrm{disc}(k)\, \mathrm{disc}(l \mid k)\, p(\mathit{rel} \mid i_k, u)\, p(\mathit{rel} \mid i_l, u)\, d(i_k, i_l)$

          Without rank and relevance reduces to  $\mathrm{ILD}(R) = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\ k<l} d(i_k, i_l)$
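
The slide does not define C_k or disc(l|k). The Python sketch below fills them in with assumptions that should be checked against the paper: disc(l|k) is taken as disc(max(1, l − k)), and C_k as C divided by the discounted relevance mass of the other items in the list. Under those assumptions, switching off rank and relevance gives back ILD, which the toy example checks numerically. Item names and genre sets are invented.

```python
from itertools import combinations

def eild(ranking, p_rel, distance, disc):
    """Expected intra-list diversity under the assumptions named above."""
    n = len(ranking)
    c = 1.0 / sum(disc(k) for k in range(1, n + 1))
    total = 0.0
    for k, i_k in enumerate(ranking, start=1):
        # Relative discount disc(l|k), assumed to be disc(max(1, l - k)).
        rel_disc = {l: disc(max(1, l - k)) for l in range(1, n + 1) if l != k}
        # Assumed per-position normalization C_k.
        norm = sum(d_l * p_rel(ranking[l - 1]) for l, d_l in rel_disc.items())
        if norm == 0:
            continue
        c_k = c / norm
        for l, d_l in rel_disc.items():
            i_l = ranking[l - 1]
            total += c_k * disc(k) * d_l * p_rel(i_k) * p_rel(i_l) * distance(i_k, i_l)
    return total

def ild(ranking, distance):
    """Plain average pairwise distance, for comparison."""
    n = len(ranking)
    return 2.0 * sum(distance(a, b) for a, b in combinations(ranking, 2)) / (n * (n - 1))

# Hypothetical toy data.
genres = {"a": {"rock"}, "b": {"rock", "pop"}, "c": {"jazz"}}

def jaccard_distance(i, j):
    return 1.0 - len(genres[i] & genres[j]) / len(genres[i] | genres[j])

R = ["a", "b", "c"]
plain = ild(R, jaccard_distance)
reduced = eild(R, lambda i: 1.0, jaccard_distance, lambda k: 1.0)
discounted = eild(R, lambda i: 1.0, jaccard_distance, lambda k: 0.85 ** (k - 1))
print(plain, reduced, discounted)
```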



Novelty and diversity metrics




                              Some experiments




Experiments

      Datasets
                 – MovieLens 1M
                 – Last.fm data by Òscar Celma

      Experiment design
                 – Run baseline recommenders
                 – Rerank top 500 recommended items by diversification algorithms
                 – Measure metrics on top 50 items

      Metrics
                 – EPC@50          Novelty (popularity complement)
                 – EPD@50          Unexpectedness (profile distance)
                 – EILD@50         Intra-list diversity
                 Distance function: complement of Jaccard (MovieLens genres) and Pearson (Last.fm)

      Recommender algorithms
                 – CB              Content-based (ML only)
                 – UB              User-based kNN
                 – MF              Matrix factorization
                 – AVG             Average rating
                 – RND             Random

      Diversification algorithms
                 – MMR             Greedy optimization of relevance + diversity (Zhang 2008)
                 – IA-Select       Adaptation of IR diversity algorithm (Agrawal 2008)
                 – NGD             Greedy optimization of relevance + novelty
                 – Random
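
To make the rerank-then-measure design concrete, here is a generic greedy reranker in the spirit of the relevance + diversity row above. It is a sketch, not the formulation of Zhang 2008 or of the authors' experiments: the objective (a λ-weighted mix of baseline score and minimum Jaccard-complement distance to the items already picked), the value of λ, and the toy scores and genres are all assumptions.

```python
def jaccard_distance(a, b):
    """Complement of the Jaccard similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def greedy_rerank(candidates, score, features, lam=0.5, top_n=50):
    """Pick items one at a time, maximizing
    (1 - lam) * score + lam * (min distance to the items selected so far)."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_n:
        def objective(i):
            if not selected:
                return score[i]  # first pick: baseline score only
            diversity = min(jaccard_distance(features[i], features[j]) for j in selected)
            return (1 - lam) * score[i] + lam * diversity
        best = max(remaining, key=objective)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical baseline scores and genre sets.
score = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.3}
features = {"a": {"rock"}, "b": {"rock"}, "c": {"jazz"}, "d": {"rock", "pop"}}

baseline = sorted(score, key=score.get, reverse=True)
reranked = greedy_rerank(baseline, score, features, lam=0.5, top_n=4)
print(baseline, reranked)
```

In the toy run the jazz item moves ahead of the second rock item, which is the kind of change a rank-sensitive diversity metric is meant to register.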

Experimental results on baseline recommenders (no rank discount)

   [Figure: bar charts of EPC@50, EPD@50 and EILD@50 for the CB, MF, UB, AVG and RND recommenders,
    on MovieLens 1M and Last.fm. Top row: no relevance. Bottom row: relevance-aware.]

   Without relevance
        – CB is good for long-tail, not so good at unexpectedness and diversity
        – AVG rating and RND stand out, especially on Last.fm

   With relevance
        – MF stands out on MovieLens
        – UB stands out on Last.fm
        – AVG rating and RND drop drastically




Experimental results with diversification algorithms
   Wilcoxon p < 0.001
   Each metric is reported without rank discount, disc(k) = 1, and with disc(k) = 0.85^(k-1).

   MovieLens 1M
                                 EPC@50                  EPD@50                  EILD@50
                 disc(k)         1         0.85^(k-1)    1         0.85^(k-1)    1         0.85^(k-1)
   No relevance  MF              0.9124    0.8876        0.7632    0.7466        0.7164    0.6191
                 IA-Select       0.9045    0.8886        0.8080    0.7577        0.8289    0.7483
                 MMR             0.9063    0.8769        0.7605    0.7428        0.7191    0.6247
                 NGD             0.9851    0.9795        0.7725    0.7551        0.6563    0.5430
                 Random          0.9525    0.9527        0.7699    0.7699        0.7283    0.6719
   Relevance     MF              0.0671    0.1043        0.0580    0.0944        0.0471    0.0551
                 IA-Select       0.0705    0.1161        0.0639    0.1032        0.0537    0.0648
                 MMR             0.0719    0.1131        0.0620    0.1020        0.0510    0.0610
                 NGD             0.0155    0.0223        0.0128    0.0200        0.0067    0.0017
                 Random          0.0222    0.0218        0.0182    0.0179        0.0117    0.0058

   Last.fm
                                 EPC@50                  EPD@50                  EILD@50
                 disc(k)         1         0.85^(k-1)    1         0.85^(k-1)    1         0.85^(k-1)
   No relevance  MF              0.8754    0.8481        0.8949    0.8895        0.8862    0.7954
                 IA-Select       0.8840    0.9089        0.8912    0.8909        0.8878    0.8274
                 MMR             0.9068    0.8903        0.9133    0.9107        0.9166    0.8398
                 NGD             0.9722    0.9571        0.9423    0.9398        0.9485    0.8784
                 Random          0.9359    0.9357        0.9278    0.9279        0.9318    0.8619
   Relevance     MF              0.2501    0.2115        0.2671    0.2587        0.2518    0.1900
                 IA-Select       0.3343    0.4752        0.3462    0.3994        0.3343    0.4154
                 MMR             0.2351    0.1936        0.2439    0.2340        0.2360    0.1759
                 NGD             0.2286    0.3077        0.2212    0.2593        0.2165    0.2656
                 Random          0.1362    0.1368        0.1407    0.1405        0.1342    0.1113

   [Legend in the original slide (cell highlighting, not preserved here): best, > random, < baseline]

  Improvement w.r.t. random reranking is clearer with relevance
  Rank sensitivity uncovers further improvements by diversification algorithms
  Different metrics appreciate different diversification algorithms consistently


Experimental results

          The metrics behave consistently
                  – E.g. content-based recommender scores high on novelty (long-tail) but low on
                     unexpectedness and diversity

                  – Diversified recommendations score higher than baselines

                  – Different diversification strategies met their specific target

          Relevance makes a large difference
                  – Probe recommenders such as random and average rating score high without
                     relevance and rank discount, and they drop with relevance

                  – Same effect for random diversification

          Rank sensitivity uncovers further improvements by diversification
                 algorithms which otherwise go unnoticed



Conclusion

    General metric framework for recommendation novelty and diversity evaluation

    Flexible and configurable, supports a fair range of variants and configurations
                 – Key configuration components: item novelty models, context θ, rank and relevance

    Unifies and generalizes state of the art metrics
                 – Further metrics can be unified taking alternative θ: temporal novelty/diversity,
                   inter-system diversity, inter-user diversity

    Provides for rank sensitivity and relevance awareness (as an option)

    Provides for a single metric assessing accuracy and diversity/novelty

    Further ongoing empirical testing, wide space for further exploration!




IRG
                                    Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems
                                      5th ACM International Conference on Recommender Systems (RecSys 2011)
IR Group @ UAM                                             Chicago, IL, 23-27 October 2011


ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems

  • 1. 5th ACM International Conference on Recommender Systems (RecSys 2011), Chicago, IL, 23-27 October 2011. Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. Saúl Vargas and Pablo Castells, IR Group @ UAM, Universidad Autónoma de Madrid. http://ir.ii.uam.es
  • 2. Beyond accuracy: novelty and diversity. You bought (or browsed): Revolver, Abbey Road. So you are recommended: Rubber Soul, With The Beatles, Beatles for Sale, Let It Be, Help!, A Hard Day's Night, Sgt. Pepper's Lonely Hearts Club Band, Yellow Submarine, Magical Mystery Tour, The White Album, Please Please Me, 1967-1970 (Blue), 1962-1966 (Red), Past Masters, Past Masters Vol 2, … more Beatles' albums. The recommended items are: very similar to each other; very similar to what the user has already seen; very widely known. Also pictured: Dark Side of the Moon, Some Girls, Bob Dylan.
  • 3. Novelty and diversity in Recommender Systems.
    Algorithms to enhance novelty and diversity: greedy optimization of objective functions (accuracy + diversity), promotion of long-tail items, etc. (Ziegler 2005, Zhang 2008, Celma 2008).
    Metrics and methodologies to measure and evaluate novelty and diversity:
    – Novelty: inverse popularity, i.e. mean self-information (Zhou 2010), which rewards recommending in the long tail: $\mathrm{MSI} = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i)$
    – Diversity: intra-list diversity, i.e. average pairwise distance (Ziegler 2005, Zhang 2008): $\mathrm{ILD} = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\ k<l} d(i_k, i_l)$
    – Other: temporal diversity (Lathia 2010), diversity relative to other users and to other systems (Bellogín 2010), aggregate diversity (Adomavicius 2011), unexpectedness (Adamopoulos 2011), etc.
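The two baseline metrics on the slide above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy popularity values and the genre-based Jaccard distance are assumptions made for the example.

```python
import math
from itertools import combinations

def mean_self_information(rec, popularity):
    """MSI = -(1/|R|) * sum over i in R of log2 p(i)."""
    return -sum(math.log2(popularity[i]) for i in rec) / len(rec)

def intra_list_diversity(rec, dist):
    """ILD = 2 / (|R| (|R| - 1)) * sum over pairs k < l of d(i_k, i_l)."""
    n = len(rec)
    total = sum(dist(a, b) for a, b in combinations(rec, 2))
    return 2.0 * total / (n * (n - 1))

# Hypothetical toy data: item popularity and item genres.
popularity = {"a": 0.5, "b": 0.25, "c": 0.125}
genres = {"a": {"rock"}, "b": {"rock", "pop"}, "c": {"jazz"}}

def jaccard_distance(i, j):
    """Complement of the Jaccard similarity between two genre sets."""
    gi, gj = genres[i], genres[j]
    return 1.0 - len(gi & gj) / len(gi | gj)

rec = ["a", "b", "c"]
msi = mean_self_information(rec, popularity)
ild = intra_list_diversity(rec, jaccard_distance)
```

Both functions only look at the set of recommended items, so reordering `rec` leaves the values unchanged. This is the order insensitivity that the following slides identify as a limitation.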
  • 4. Some limitations. Metrics are insensitive to the order of recommended items: the same item set yields the same measured diversity/novelty. [Figure: two rankings R1 and R2 of the same items, with the diverse and non-diverse segments in opposite order.]
  • 5. Some limitations. Accuracy and diversity/novelty are measured independently. [Figure: accuracy and diversity of Method A and Method B in two cases: one where Method A is better than B, and one where it is unclear which one is better.]
  • 6. Our research goals. 1. Further formalize recommendation novelty and diversity metrics based on a few fundamental principles. 2. Build a unified metric framework where (a) as many state-of-the-art novelty and diversity metrics as possible are related and generalized, and (b) new metrics can be defined. 3. Enhance the novelty and diversity metrics with rank sensitivity and relevance awareness.
  • 7. Basic fundamental principles to build metrics upon.
    Our approach: define and formalize novelty and diversity metrics based on models of how users interact with items.
    Three fundamental principles in user-item interaction:
    – Discovery: an item is seen by a user
    – Relevance: an item would be liked by (or useful for, etc.) a user
    – Choice: an item is actually accepted (bought, consumed, etc.) by a user
    Formalized as binary random variables seen, rel, choose taking values in {true, false}.
    Simplifying assumptions:
    – seen and rel are mutually independent
    – If a user sees an item that is relevant for her, she chooses it
    Hence $p(\mathrm{choose}) = p(\mathrm{seen})\, p(\mathrm{rel})$.
  • 8. Proposed metric framework. Expected effective novelty of items when a user interacts with a ranked list R of recommended items in a context θ:
    $m(R \mid \theta) = C \sum_{i \in R} p(\mathrm{choose} \mid i, u, R)\, \mathrm{nov}(i \mid \theta)$
    Novelty is relative: the item novelty context θ refers to (what we know about) what someone has seen sometime somewhere.
    – Someone: the target user, a set of users, all users…
    – Sometime: a specific past time period, an ongoing session, "ever"…
    – Somewhere: past recommendations, the current recommendation R, recommendations by other systems, "anywhere"…
    – "What we know about that": context of observation, i.e. the available observations
  • 9. Metric framework components. $m(R \mid \theta) = C \sum_{i \in R} p(\mathrm{choose} \mid i, u, R)\, \mathrm{nov}(i \mid \theta)$, built from an item novelty model $\mathrm{nov}(i \mid \theta)$ and a choice model $p(\mathrm{choose} \mid i, u, R)$.
  • 10. Item novelty models. Item novelty model $\mathrm{nov}(i \mid \theta)$:
    Discovery-based (negative popularity)
    – Popularity complement (forced discovery): $\mathrm{nov}(i \mid \theta) = 1 - p(\mathrm{seen} \mid i, \theta)$
    – Self-information, or surprisal (free discovery): $\mathrm{nov}(i \mid \theta) = -\log_2 p(i \mid \mathrm{seen}, \theta)$
    Distance-based (θ here represents a set of items)
    – Expected item distance: $\mathrm{nov}(i \mid \theta) = \sum_{j \in \theta} p(j \mid \mathrm{choose}, i, \theta)\, d(i, j)$
    – Minimum item distance: $\mathrm{nov}(i \mid \theta) = \min_{j \in \theta} d(i, j)$
  • 11. Metric framework components. $m(R \mid \theta) = C \sum_{i \in R} p(\mathrm{choose} \mid i, u, R)\, \mathrm{nov}(i \mid \theta)$, built from an item novelty model $\mathrm{nov}(i \mid \theta)$ and a choice model $p(\mathrm{choose} \mid i, u, R)$.
  • 12. Choice model $p(\mathrm{choose} \mid i, u, R)$. From $p(\mathrm{choose}) = p(\mathrm{seen})\, p(\mathrm{rel})$ it follows that $p(\mathrm{choose} \mid i, u, R) = p(\mathrm{seen} \mid i, u, R)\, p(\mathrm{rel} \mid i, u)$, where $p(\mathrm{seen} \mid i, u, R)$ is the browsing model and $p(\mathrm{rel} \mid i, u)$ is the relevance model, which is independent from R.
  • 13. Browsing model, where $p(\mathrm{seen} \mid i_k, u, R)$ should decrease with the rank position k. It can be formalized as different probabilistic discount functions (see e.g. Carterette 2011). In general, $p(\mathrm{seen} \mid i_k, u, R) = \mathrm{disc}(k)$, with several options for disc(k):
    – $p^{k-1}$: exponential, as in RBP (Moffat 2008)
    – $1 / \log(k+1)$: as in nDCG
    – $1 / k$: Zipfian, as in MRR, MAP, etc.
    – $1$: no discount
    – ... many others ...
    [Figure: a ranked list R with positions 1 to 9, highlighting position k = 6.]
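The discount functions listed on the slide above can be written down directly. This is a sketch under stated assumptions: the function names are hypothetical, the default p = 0.85 is borrowed from the 0.85^(k-1) setting reported in the experiments table, and the logarithm is taken in base 2 (the usual nDCG convention), which the slide does not specify.

```python
import math

def disc_exponential(k, p=0.85):
    """Exponential discount p^(k-1), as in RBP."""
    return p ** (k - 1)

def disc_log(k):
    """Logarithmic discount 1 / log2(k + 1), as in nDCG (base 2 is an assumption)."""
    return 1.0 / math.log2(k + 1)

def disc_zipf(k):
    """Zipfian discount 1 / k, as in MRR or MAP."""
    return 1.0 / k

def disc_none(k):
    """No discount: every rank position is equally likely to be seen."""
    return 1.0

# Browsing probability at the first five rank positions under each model.
positions = range(1, 6)
table = {
    f.__name__: [f(k) for k in positions]
    for f in (disc_exponential, disc_log, disc_zipf, disc_none)
}
```

All four functions give probability 1 at rank 1 and never increase with k, matching the requirement that p(seen) should decrease with rank position.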
  • 14. Wrapping up: resulting metric scheme.
    $m(R \mid \theta) = C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathrm{rel} \mid i_k, u)\, \mathrm{nov}(i_k \mid \theta)$
    The three factors are the rank discount, the item relevance and the item novelty.
    Normalization, to get the novelty ratio by expected number of browsed items: $C = 1 / \sum_{i_k \in R} \mathrm{disc}(k)$, where the sum is the expected browsing depth.
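The metric scheme above translates almost literally into code. The sketch below is an illustration rather than the authors' implementation: `expected_novelty`, its parameter names and the two-item toy example are hypothetical.

```python
def expected_novelty(rec, nov, p_rel=lambda i: 1.0, disc=lambda k: 1.0):
    """m(R|theta) = C * sum_k disc(k) * p(rel|i_k,u) * nov(i_k|theta),
    with C = 1 / sum_k disc(k) (the expected browsing depth)."""
    weights = [disc(k) for k in range(1, len(rec) + 1)]
    norm = sum(weights)
    total = sum(w * p_rel(i) * nov(i) for w, i in zip(weights, rec))
    return total / norm

# Hypothetical toy example: one fully novel item and one non-novel item.
novelty = {"x": 1.0, "y": 0.0}
nov = lambda i: novelty[i]
half = lambda k: 0.5 ** (k - 1)  # exponential rank discount with p = 0.5

flat_xy = expected_novelty(["x", "y"], nov)
flat_yx = expected_novelty(["y", "x"], nov)
ranked_xy = expected_novelty(["x", "y"], nov, disc=half)
ranked_yx = expected_novelty(["y", "x"], nov, disc=half)
```

Without a discount both orderings score 0.5. With the discount, the ranking that puts the novel item first scores 2/3 and the reversed ranking scores 1/3, which is the rank sensitivity the scheme is designed to provide.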
  • 15. Implementation: ground model estimates. θ: observed interaction between all users and items in the system.
    Discovery distributions can be estimated from rating data or access records:
    – Forced discovery: $p(\mathrm{seen} \mid i, \theta)$ corresponds to IUF (ratio of users who have interacted with i)
    – Free discovery: $p(i \mid \mathrm{seen}, \theta)$ corresponds to ICF (ratio of interactions involving i)
    The relevance distribution $p(\mathrm{rel} \mid i, u)$ is estimated by a mapping from ratings to relevance (see definition of ERR in Chapelle 2009).
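A possible way to compute these estimates from a list of (user, item, rating) triples is sketched below. The slide does not give the rating-to-relevance mapping in full, so the sketch assumes the ERR-style form (2^g - 1) / 2^g_max with grade g = max(0, r - tau). The threshold `tau`, the rating scale `r_max`, the function names and the toy ratings are all hypothetical.

```python
from collections import defaultdict

def discovery_estimates(ratings):
    """Estimate p(seen|i) as the ratio of users who interacted with i,
    and p(i|seen) as the ratio of interactions that involve i."""
    users = {u for u, _, _ in ratings}
    users_per_item = defaultdict(set)
    count_per_item = defaultdict(int)
    for u, i, _ in ratings:
        users_per_item[i].add(u)
        count_per_item[i] += 1
    p_seen = {i: len(us) / len(users) for i, us in users_per_item.items()}
    p_free = {i: c / len(ratings) for i, c in count_per_item.items()}
    return p_seen, p_free

def p_rel_from_rating(r, tau=2, r_max=5):
    """ERR-style mapping (2^g - 1) / 2^g_max, with g = max(0, r - tau).
    tau and r_max are assumptions made for this example."""
    g = max(0, r - tau)
    g_max = r_max - tau
    return (2 ** g - 1) / 2 ** g_max

# Hypothetical toy rating data: (user, item, rating on a 1-5 scale).
ratings = [
    ("u1", "a", 5), ("u2", "a", 4), ("u3", "a", 3),
    ("u1", "b", 2), ("u2", "c", 5),
]
p_seen, p_free = discovery_estimates(ratings)
```

In this toy data every user has rated item "a", so its forced-discovery probability is 1 and its popularity complement is 0, while "b" and "c" sit in the long tail.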
  • 16. Novelty and diversity metrics. Putting all together: some metric framework instantiations.
  • 17. Putting all together: metric framework instantiations. Discovery-based metrics (novelty). θ: observed interaction between all users and items in the system.
    – Expected popularity complement: $\mathrm{EPC}(R) = C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathrm{rel} \mid i_k, u)\, (1 - p(\mathrm{seen} \mid i_k))$
    – Expected free discovery: $\mathrm{EFD}(R) = -C \sum_{i_k \in R} \mathrm{disc}(k)\, p(\mathrm{rel} \mid i_k, u)\, \log_2 p(i_k \mid \mathrm{seen})$
    Without rank and relevance, EFD reduces to $\mathrm{MSI}(R) = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i \mid \mathrm{seen})$.
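The reduction of EFD to MSI stated above can be checked numerically. The following is a small sketch with hypothetical function names and toy probabilities, using base-2 logarithms throughout.

```python
import math

def _weights(rec, disc):
    """Rank discounts for positions 1..|R| and the normalizer C = 1 / sum."""
    w = [disc(k) for k in range(1, len(rec) + 1)]
    return w, 1.0 / sum(w)

def epc(rec, p_seen, p_rel=lambda i: 1.0, disc=lambda k: 1.0):
    """Expected popularity complement."""
    w, c = _weights(rec, disc)
    return c * sum(wk * p_rel(i) * (1.0 - p_seen[i]) for wk, i in zip(w, rec))

def efd(rec, p_free, p_rel=lambda i: 1.0, disc=lambda k: 1.0):
    """Expected free discovery."""
    w, c = _weights(rec, disc)
    return -c * sum(wk * p_rel(i) * math.log2(p_free[i]) for wk, i in zip(w, rec))

def msi(rec, p_free):
    """Mean self-information, the rank- and relevance-unaware baseline."""
    return -sum(math.log2(p_free[i]) for i in rec) / len(rec)

# Hypothetical toy distributions.
p_free = {"a": 0.5, "b": 0.25, "c": 0.25}
p_seen = {"a": 0.9, "b": 0.1, "c": 0.4}
rec = ["a", "b", "c"]
rbp = lambda k: 0.85 ** (k - 1)

efd_flat = efd(rec, p_free)                              # equals MSI
efd_popular_first = efd(["a", "b", "c"], p_free, disc=rbp)
efd_novel_first = efd(["b", "c", "a"], p_free, disc=rbp)
```

With no discount and p(rel) = 1 the two metrics coincide. Once the exponential discount is switched on, the ranking that places the less popular items first scores higher.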
  • 18. Putting all together: metric framework instantiations. Distance-based metrics.
    θ: the observed interaction of the target user only.
    – Expected profile distance (unexpectedness, user-specific): $\mathrm{EPD}(R) = C_u \sum_{i_k \in R,\ j \in u} \mathrm{disc}(k)\, p(\mathrm{rel} \mid i_k, u)\, p(\mathrm{rel} \mid j, u)\, d(i_k, j)$
    θ: the recommended items the target user can see in R.
    – Expected intra-list diversity (diversity): $\mathrm{EILD}(R) = \sum_{i_k \in R,\ i_l \in R,\ k \neq l} C_k\, \mathrm{disc}(k)\, \mathrm{disc}(l \mid k)\, p(\mathrm{rel} \mid i_k, u)\, p(\mathrm{rel} \mid i_l, u)\, d(i_k, i_l)$
    Without rank and relevance, EILD reduces to $\mathrm{ILD}(R) = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\ k<l} d(i_k, i_l)$.
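A sketch of EILD follows. The slide does not define the relative discount disc(l|k) or the normalizer C_k, so the code assumes disc(l|k) = disc(max(1, l - k)) and C_k = C / Σ_{l≠k} disc(l|k) p(rel|i_l,u). Under these assumptions the metric reduces to ILD when there is no discount and p(rel) = 1, as the slide states. Function names and toy data are hypothetical.

```python
def eild(rec, dist, p_rel=lambda i: 1.0, disc=lambda k: 1.0):
    """Expected intra-list diversity under the assumed disc(l|k) and C_k."""
    n = len(rec)
    c = 1.0 / sum(disc(k) for k in range(1, n + 1))
    total = 0.0
    for k, ik in enumerate(rec, start=1):
        # Assumed relative discount of position l given that k was reached.
        rel_disc = {l: disc(max(1, l - k)) for l in range(1, n + 1) if l != k}
        norm_k = sum(rel_disc[l] * p_rel(rec[l - 1]) for l in rel_disc)
        if norm_k == 0:
            continue  # no other item can be relevant, nothing to compare with
        c_k = c / norm_k
        inner = sum(
            rel_disc[l] * p_rel(rec[l - 1]) * dist(ik, rec[l - 1])
            for l in rel_disc
        )
        total += c_k * disc(k) * p_rel(ik) * inner
    return total

# Hypothetical toy data: genre sets and complement-of-Jaccard distance.
genres = {"a": {"rock"}, "b": {"rock", "pop"}, "c": {"jazz"}}

def jaccard_distance(i, j):
    gi, gj = genres[i], genres[j]
    return 1.0 - len(gi & gj) / len(gi | gj)

half = lambda k: 0.5 ** (k - 1)
flat = eild(["a", "b", "c"], jaccard_distance)             # plain ILD
similar_on_top = eild(["a", "b", "c"], jaccard_distance, disc=half)
diverse_on_top = eild(["a", "c", "b"], jaccard_distance, disc=half)
```

With the discount enabled, the ranking that places the two most distant items at the top receives the higher score, although both rankings contain the same items and therefore the same ILD.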
  • 19. Novelty and diversity metrics. Some experiments.
  • 20. Experiments.
    Datasets:
    – MovieLens 1M
    – Last.fm data by Òscar Celma
    Recommender algorithms:
    – CB: content-based (MovieLens only)
    – UB: user-based kNN
    – MF: matrix factorization
    – AVG: average rating
    – RND: random
    Experiment design:
    – Run baseline recommenders
    – Rerank top 500 recommended items by diversification algorithms
    – Measure metrics on top 50 items
    Diversification algorithms:
    – MMR: greedy optimization of relevance + diversity (Zhang 2008)
    – IA-Select: adaptation of IR diversity algorithm (Agrawal 2008)
    – NGD: greedy optimization of relevance + novelty
    – Random
    Metrics:
    – EPC@50: novelty (popularity complement)
    – EPD@50: unexpectedness (profile distance)
    – EILD@50: intra-list diversity
    Distance function: complement of Jaccard (MovieLens genres) and Pearson (Last.fm).
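The slide names greedy rerankers (MMR, NGD) without giving their objective functions. The sketch below is therefore a generic greedy reranker, not the authors' or the cited papers' implementation. It assumes a linear trade-off `lam` between relevance and the mean distance to the items already selected, and uses the complement of Jaccard on genre sets as the distance. All names and toy values are hypothetical.

```python
def jaccard_complement(genres_i, genres_j):
    """Distance between two items as 1 - Jaccard similarity of their genre sets."""
    union = genres_i | genres_j
    if not union:
        return 0.0
    return 1.0 - len(genres_i & genres_j) / len(union)

def greedy_rerank(candidates, rel, dist, lam=0.5, n=None):
    """Repeatedly pick the candidate maximizing
    lam * relevance + (1 - lam) * mean distance to the items selected so far."""
    remaining = list(candidates)
    selected = []
    n = len(remaining) if n is None else n
    while remaining and len(selected) < n:
        def score(i):
            if selected:
                div = sum(dist(i, j) for j in selected) / len(selected)
            else:
                div = 0.0
            return lam * rel[i] + (1.0 - lam) * div
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: baseline scores and genres for three items.
rel = {"a": 0.9, "b": 0.8, "c": 0.7}
genres = {"a": {"rock"}, "b": {"rock"}, "c": {"jazz"}}
dist = lambda i, j: jaccard_complement(genres[i], genres[j])

baseline = greedy_rerank(["a", "b", "c"], rel, dist, lam=1.0)
diversified = greedy_rerank(["a", "b", "c"], rel, dist, lam=0.5)
```

With `lam=1.0` the baseline order is preserved. With `lam=0.5` the jazz item moves up to second place because it is maximally distant from the rock item already selected.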
  • 21. Experimental results on baseline recommenders (no rank discount).
    [Figure: bar charts of EPC@50, EPD@50 and EILD@50 for CB, MF, UB, AVG and RND on MovieLens 1M and Last.fm, in two rows: without relevance and relevance-aware.]
    Without relevance:
    – CB is good for long-tail, not so good at unexpectedness and diversity
    – AVG rating and RND stand out, especially on Last.fm
    With relevance:
    – MF stands out on MovieLens
    – UB stands out on Last.fm
    – AVG rating and RND drop drastically
  • 22. Experimental results with diversification algorithms. Significance: Wilcoxon, p < 0.001. Legend: best, > random, < baseline.

    MovieLens 1M
                             EPC@50                  EPD@50                  EILD@50
    disc(k)                  1           0.85^(k-1)  1           0.85^(k-1)  1           0.85^(k-1)
    No relevance  MF         0.9124      0.8876      0.7632      0.7466      0.7164      0.6191
                  IA-Select  0.9045      0.8886      0.8080      0.7577      0.8289      0.7483
                  MMR        0.9063      0.8769      0.7605      0.7428      0.7191      0.6247
                  NGD        0.9851      0.9795      0.7725      0.7551      0.6563      0.5430
                  Random     0.9525      0.9527      0.7699      0.7699      0.7283      0.6719
    Relevance     MF         0.0671      0.1043      0.0580      0.0944      0.0471      0.0551
                  IA-Select  0.0705      0.1161      0.0639      0.1032      0.0537      0.0648
                  MMR        0.0719      0.1131      0.0620      0.1020      0.0510      0.0610
                  NGD        0.0155      0.0223      0.0128      0.0200      0.0067      0.0017
                  Random     0.0222      0.0218      0.0182      0.0179      0.0117      0.0058

    Last.fm
                             EPC@50                  EPD@50                  EILD@50
    disc(k)                  1           0.85^(k-1)  1           0.85^(k-1)  1           0.85^(k-1)
    No relevance  MF         0.8754      0.8481      0.8949      0.8895      0.8862      0.7954
                  IA-Select  0.8840      0.9089      0.8912      0.8909      0.8878      0.8274
                  MMR        0.9068      0.8903      0.9133      0.9107      0.9166      0.8398
                  NGD        0.9722      0.9571      0.9423      0.9398      0.9485      0.8784
                  Random     0.9359      0.9357      0.9278      0.9279      0.9318      0.8619
    Relevance     MF         0.2501      0.2115      0.2671      0.2587      0.2518      0.1900
                  IA-Select  0.3343      0.4752      0.3462      0.3994      0.3343      0.4154
                  MMR        0.2351      0.1936      0.2439      0.2340      0.2360      0.1759
                  NGD        0.2286      0.3077      0.2212      0.2593      0.2165      0.2656
                  Random     0.1362      0.1368      0.1407      0.1405      0.1342      0.1113

    – Improvement w.r.t. random reranking is clearer with relevance
    – Rank sensitivity uncovers further improvements by diversification algorithms
    – Different metrics appreciate different diversification algorithms consistently
  • 23. Experimental results.
    The metrics behave consistently:
    – E.g. the content-based recommender scores high on novelty (long-tail) but low on unexpectedness and diversity
    – Diversified recommendations score higher than baselines
    – Different diversification strategies meet their specific target
    Relevance makes a large difference:
    – Probe recommenders such as random and average rating score high without relevance and rank discount, and they drop with relevance
    – Same effect for random diversification
    Rank sensitivity uncovers further improvements by diversification algorithms which would otherwise go unnoticed.
  • 24. Conclusion.
    General metric framework for recommendation novelty and diversity evaluation.
    Flexible and configurable, supports a fair range of variants and configurations:
    – Key configuration components: item novelty models, context θ, rank and relevance
    Unifies and generalizes state-of-the-art metrics:
    – Further metrics can be unified by taking alternative contexts θ: temporal novelty/diversity, inter-system diversity, inter-user diversity
    Provides for rank sensitivity and relevance awareness (as an option).
    Provides a single metric assessing both accuracy and diversity/novelty.
    Further ongoing empirical testing, wide space for further exploration!