Modeling Difficulty in Recommender Systems


Benjamin Kille (@bennykille)
Competence Center Information Retrieval & Machine Learning




September 9, 2012, Recommendation Utility Evaluation: Beyond RMSE (2012)
Outline


►   Recommender System Evaluation

►   Problem definition

►   Difficulty in Recommender Systems

►   Future work

►   Conclusions




Recommender Systems Evaluation


►   Definition of an evaluation measure:
        – RMSE (rating prediction scenario)
        – nDCG (ranking scenario)
        – Precision@N (top-N scenario)

►   Splitting data into training and test partitions

►   Reporting results as an average over the full set of users (see the
    per-user sketch below)

►   Is recommending to all users equally difficult?
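To make the concern concrete, here is a minimal sketch (hypothetical users, ratings, and predictions, not data from the talk) of how an average RMSE over all users can hide a large gap between an "easy" and a "difficult" user:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over one user's held-out ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical held-out ratings and predictions, grouped by user.
test_data = {
    "easy_user":      ([4, 5, 3], [3.9, 4.7, 3.2]),
    "difficult_user": ([2, 5, 1], [4.0, 2.5, 3.8]),
}

per_user_rmse = {user: rmse(true, pred)
                 for user, (true, pred) in test_data.items()}
print("per-user RMSE:", per_user_rmse)
print("average RMSE :", np.mean(list(per_user_rmse.values())))
```

The reported average looks acceptable even though one user receives clearly worse predictions, which is exactly the question the slide raises.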




Observed Differences

► Users differ with respect to
      – Demographics (e.g., age, gender, and location)
      – Taste
      – Needs
      – Expectations
      – Consumption patterns
      – …
► Recommendation algorithms do not perform equally well for every
  single user
⇒ Users should not all be evaluated in the same way!




Risks of disregarding users' differences

► A subset of users receives worse recommendations than would be
  possible
► Recommendation algorithm optimization targets all users equally:
      – "easy" users → costs could be saved
      – "difficult" users → insufficient optimization
⇒ Control optimization towards those users who really require it!




   How to determine difficulty?




Problem Formulation

► Measure how difficult it will be to recommend items to a given user

► Ideally: derive difficulty directly from user attributes

► Problem: the correlation between (combinations of) attributes and
  difficulty is unknown

►   We need a method to estimate the correlation between user
    attributes and recommendation difficulty (see the sketch below)
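As a rough illustration of how such a method could look once per-user difficulty scores exist, the snippet below correlates user attributes with observed difficulty using Spearman rank correlation. All values are made up for the example; the slides do not prescribe a specific correlation measure.

```python
import numpy as np
from scipy import stats

# Hypothetical per-user data: two attributes and an observed difficulty
# score per user (how difficulty is observed is sketched further below).
age        = np.array([19, 25, 31, 42, 55, 63, 70])
n_ratings  = np.array([250, 180, 90, 40, 35, 20, 10])
difficulty = np.array([0.12, 0.15, 0.30, 0.45, 0.40, 0.55, 0.60])

for name, attribute in [("age", age), ("number of ratings", n_ratings)]:
    rho, p_value = stats.spearmanr(attribute, difficulty)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```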




Difficulty in Information Retrieval

► Target object: query
► Method: submit the query to several IR systems; each system returns its
  own ranked list of documents

  [Figure: one query fanned out to five IR systems, each returning a
  slightly different ranked document list (Doc 1, Doc 2, Doc 3, Doc 4, …)]

            Difficulty = Diversity of the returned document lists
            (a sketch follows below)
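One plausible way to turn that idea into a number is sketched below, under the assumption that diversity is measured as one minus the average pairwise top-k overlap of the returned lists; the slides do not fix a particular diversity metric.

```python
from itertools import combinations

def top_k_overlap(list_a, list_b, k=10):
    """Jaccard overlap between the top-k documents of two result lists."""
    set_a, set_b = set(list_a[:k]), set(list_b[:k])
    return len(set_a & set_b) / len(set_a | set_b)

def query_difficulty(result_lists, k=10):
    """1 - mean pairwise overlap: identical lists -> 0, disjoint lists -> 1."""
    pairs = list(combinations(result_lists, 2))
    mean_overlap = sum(top_k_overlap(a, b, k) for a, b in pairs) / len(pairs)
    return 1.0 - mean_overlap

# Toy version of the figure above: five IR systems, mostly agreeing.
returned = [
    ["Doc 1", "Doc 2", "Doc 3"],
    ["Doc 1", "Doc 2", "Doc 3"],
    ["Doc 1", "Doc 3", "Doc 2"],
    ["Doc 2", "Doc 1", "Doc 4"],
    ["Doc 1", "Doc 2", "Doc 4"],
]
print("estimated query difficulty:", query_difficulty(returned, k=3))
```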



Difficulty in Recommender Systems

► Select several state-of-the-art recommendation methods
► Measure the diversity of their output for a specific user

► Based on the methods' agreement with respect to the predicted
  rating / ranking / top-N items, we conclude:
      – high agreement → low difficulty
      – low agreement → high difficulty

►   The target correlation (user attributes ~ difficulty) can be
    estimated from the observed difficulties for a sufficiently
    large set of users (a per-user difficulty sketch follows below)
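For the rating-prediction scenario, a minimal sketch of such a per-user difficulty score is shown below, assuming disagreement is summarized as the mean per-item standard deviation of the methods' predictions; the predictions are hypothetical, not results from the talk.

```python
import numpy as np

def user_difficulty(predictions):
    """predictions: shape (n_methods, n_items), predicted ratings for one
    user's held-out items.  Returns the mean per-item standard deviation
    across methods: large spread = low agreement = high difficulty."""
    predictions = np.asarray(predictions, dtype=float)
    return float(np.mean(np.std(predictions, axis=0)))

# Three hypothetical recommenders scoring the same four items for two users.
agreeing_methods = [[4.1, 2.0, 3.5, 5.0],
                    [4.0, 2.2, 3.4, 4.8],
                    [3.9, 2.1, 3.6, 4.9]]
disagreeing_methods = [[4.5, 1.0, 3.0, 5.0],
                       [2.0, 4.5, 1.5, 3.0],
                       [3.5, 2.5, 4.5, 1.0]]

print("easy user difficulty     :", user_difficulty(agreeing_methods))
print("difficult user difficulty:", user_difficulty(disagreeing_methods))
```

Repeating this for many users yields the observed difficulty scores that the attribute correlation on the previous slide can be estimated from.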




Future Work


►   Experimentally verify the feasibility of difficulty estimation

►   Evaluate the observed correlation (user attributes ~ difficulty) on
    data sets

►   Investigate the business rationale (reduced costs through
    controlled optimization efforts)

►   Determine how to deal with sparsity / cold-start issues




Conclusions


►   Users should not all be treated equally when evaluating
    recommender systems

►   The difficulty of recommendation tasks varies between users

►   Difficulty estimates will make it possible to direct optimization
    towards those users who require it

►   Diversity metrics could be used to estimate difficulty scores
    (analogously to query difficulty in information retrieval)

►   The proposed method still needs to be evaluated


Thank you for your attention!



Questions?




References

[He2008]        J. He, M. Larson, and M. de Rijke. Using coherence-based
                measures to predict query difficulty. ECIR 2008.
[Herlocker2004] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl.
                Evaluating collaborative filtering recommender systems.
                ACM TOIS 22(1), 2004.
[Kuncheva2003]  L. Kuncheva and C. Whitaker. Measures of diversity in
                classifier ensembles and their relationship with the
                ensemble accuracy. Machine Learning 51, 2003.
[Vargas2011]    S. Vargas and P. Castells. Rank and relevance in novelty
                and diversity metrics for recommender systems. RecSys 2011.


