Presentation given at the Workshop on Recommendation Utility Evaluation: Beyond RMSE, held in conjunction with the ACM Conference on Recommender Systems (RecSys) on September 9, 2012
1. Modeling Difficulty in Recommender Systems
Benjamin Kille (@bennykille)
Competence Center Information Retrieval & Machine Learning
2. Outline
► Recommender System Evaluation
► Problem definition
► Difficulty in Recommender Systems
► Future work
► Conclusions
3. Recommender Systems Evaluation
► Definition of an evaluation measure:
RMSE (rating prediction scenario)
nDCG (ranking scenario)
Precision@N (top-N scenario)
► Splitting data into training and test partitions
► Reporting results as an average over the full set of users
► Is recommending to all users equally difficult?
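A minimal sketch of why this question matters (simulated data, not from the talk; assumes numpy): computing RMSE once over all users versus once per user shows how the global average can hide users who are served far worse than the mean.

```python
# Sketch: global RMSE vs. per-user RMSE on simulated data.
import numpy as np

rng = np.random.default_rng(0)

# Simulated test set: user ids, true ratings, and predictions whose
# error grows with the user id (so users are *not* equally difficult).
user_ids = rng.integers(0, 100, size=5000)
true_ratings = rng.integers(1, 6, size=5000).astype(float)
predicted = true_ratings + rng.normal(0.0, 0.3 + user_ids / 50.0)

# Global RMSE, as typically reported.
global_rmse = np.sqrt(np.mean((true_ratings - predicted) ** 2))

# RMSE computed separately for each user.
per_user = [
    np.sqrt(np.mean((true_ratings[user_ids == u] - predicted[user_ids == u]) ** 2))
    for u in np.unique(user_ids)
]

print(f"global RMSE: {global_rmse:.3f}")
print(f"per-user RMSE: {min(per_user):.3f} (best) to {max(per_user):.3f} (worst)")
```

The single global figure sits near the middle of a wide per-user range, which is exactly the information the usual evaluation protocol averages away.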
4. Observed Differences
► Users differ with respect to
Demographics (e.g., age, gender and location)
Taste
Needs
Expectations
Consumption patterns
…
► Recommendation algorithms do not perform equally well for each
single user
→ users should not all be evaluated in the same way!
5. Risks of Disregarding Users' Differences
► A subset of users receives worse recommendations than
necessary
► Recommendation algorithm optimization targets all users
equally:
"easy" users → optimization effort could be saved
"difficult" users → insufficient optimization
Control optimization towards those users who really require it!
How to determine difficulty?
6. Problem Formulation
► Measuring how difficult it will be to recommend items to a
user
► Ideally: deriving difficulty directly from user attributes
► Problem: unknown correlation between (combinations of)
attributes and difficulty
► We need a method to estimate the correlation between user
attributes and recommendation difficulty
7. Difficulty in Information Retrieval
► Target object: query
► Method: the query is submitted to several IR systems, each of which
returns a ranked list of documents:

  IR-System 1   IR-System 2   IR-System 3   IR-System 4   IR-System 5
  Doc 1         Doc 1         Doc 1         Doc 2         Doc 1
  Doc 2         Doc 2         Doc 3         Doc 1         Doc 2
  Doc 3         Doc 3         Doc 2         Doc 4         Doc 4
  …             …             …             …             …

► Difficulty = diversity of the returned document lists
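A minimal sketch of this measurement (my own illustration, not code from the talk): diversity is taken as one minus the average pairwise Jaccard overlap of the systems' top-k result lists, so identical lists give difficulty 0 and fully disjoint lists give difficulty 1.

```python
# Sketch: query difficulty as the diversity of ranked lists returned
# by several IR systems. Jaccard overlap on top-k sets is an assumption.
from itertools import combinations

def jaccard(a, b):
    """Set overlap of two top-k result lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def query_difficulty(result_lists):
    """1 - mean pairwise overlap: 0 = full agreement, 1 = disjoint lists."""
    pairs = list(combinations(result_lists, 2))
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# The five hypothetical top-3 lists from the table above.
lists = [
    ["Doc 1", "Doc 2", "Doc 3"],
    ["Doc 1", "Doc 2", "Doc 3"],
    ["Doc 1", "Doc 3", "Doc 2"],
    ["Doc 2", "Doc 1", "Doc 4"],
    ["Doc 1", "Doc 2", "Doc 4"],
]
print(f"difficulty: {query_difficulty(lists):.2f}")
```

Jaccard ignores rank order; a rank-aware list comparison (e.g., Kendall's tau) could be substituted without changing the overall scheme.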
8. Difficulty in Recommender Systems
► Select several (state-of-the-art) recommendation methods
► Measure the diversity of their outputs for a specific user
► Based on the methods' agreement with respect to predicted
rating / ranking / top-N items, we conclude:
high agreement → low difficulty
low agreement → high difficulty
► Target correlation (user attributes ~ difficulty) can be
estimated using the observed difficulties for a sufficiently
large set of users
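The sketch below (simulated data and stand-in recommenders; the talk proposes the scheme, not this code) scores each user's difficulty as the disagreement among several methods' top-N lists and then estimates the correlation between a user attribute and the observed difficulty scores.

```python
# Sketch: per-user difficulty from top-N disagreement across methods,
# then the attribute ~ difficulty correlation. All data is simulated.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_users, n_items, top_n, n_methods = 200, 50, 5, 4

# Hypothetical user attribute, e.g., the size of the rating profile.
profile_size = rng.integers(5, 200, size=n_users)

def top_n_set(user, method):
    """Stand-in for a real recommender: item scores get noisier for
    users with small profiles, so their lists diverge more."""
    noise = (50.0 / profile_size[user]) * (method + 1)
    scores = np.arange(n_items) + rng.normal(0.0, noise, n_items)
    return set(np.argsort(scores)[-top_n:])

def difficulty(user):
    lists = [top_n_set(user, m) for m in range(n_methods)]
    overlaps = [len(a & b) / top_n for a, b in combinations(lists, 2)]
    return 1.0 - np.mean(overlaps)  # low agreement -> high difficulty

scores = np.array([difficulty(u) for u in range(n_users)])
r = np.corrcoef(profile_size, scores)[0, 1]
print(f"correlation(profile size, difficulty) = {r:.2f}")
```

Given enough users, such correlations are what would allow difficulty to be predicted from attributes alone, without running the whole ensemble for every new user.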
9. Future Work
► Experimentally verify the feasibility of difficulty estimation
► Evaluate observed correlation (user attributes ~ difficulty) on
data sets
► Investigate business rationale (reduced costs through
controlled optimization efforts)
► How to deal with sparsity / cold-start issues?
10. Conclusions
► Users should not be treated equally when evaluating
recommender systems
► Difficulty of recommendation tasks varies between users
► Difficulty will allow us to direct optimization towards those users
who require it
► Diversity metrics could be used to estimate difficulty scores
(analogously to information retrieval)
► Proposed method needs to be evaluated
11. Thank you for your attention!
Questions?