A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
Alan Said, Alejandro Bellogín, Arjen De Vries
CWI
@alansaid, @abellogin, @arjenpdevries

Outline
•  Evaluation
   –  Real world
   –  Offline
•  Not algorithmic comparison!
•  Comparison of evaluation
•  Protocol
•  Experiments & Results
•  Conclusions

2013-10-13, LSRS'13

EVALUATION

Evaluation
•  Does p@10 in [Smith,2010a] measure the same quality as p@10 in [Smith,2012b]?
   –  Even if it does
      •  is the underlying data the same?
      •  was cross-validation performed similarly?
      •  etc.

Evaluation
•  What metrics should we use?
•  How should we evaluate?
   –  Relevance criteria for test items
   –  Cross validation (n-fold, random)
•  Should all users and items be treated the same way?
   –  Do certain users and items reflect different evaluation qualities?

Offline Evaluation
Recommender System accuracy evaluation is currently based on methods from IR/ML
   –  One training set
   –  One test set
   –  (One validation set)
   –  Algorithms are trained on the training set
   –  Evaluate using metric@N (e.g. p@N – a page size)
      •  Even when N is larger than the number of test items
      •  p@N = 1.0 is (almost) impossible
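
The ceiling on p@N described in the last two bullets can be illustrated with a minimal sketch. The function names `precision_at_n` and `max_precision_at_n` and the toy item IDs are assumptions made for this example; they do not come from the slides.

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n recommended items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant_items)
    return hits / n


def max_precision_at_n(num_relevant, n):
    """Best p@N any ranking can reach when only num_relevant test items exist."""
    return min(num_relevant, n) / n


# Toy data (hypothetical): 10 recommended items, only 3 relevant test items.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "c", "e"}

print(precision_at_n(ranked, relevant, 10))    # 0.3
print(max_precision_at_n(len(relevant), 10))   # 0.3
```

With three relevant test items and N = 10, even a ranking that retrieves every relevant item cannot exceed 0.3, which is why p@N = 1.0 is out of reach whenever N is larger than the number of test items.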
  

Evaluation in production
•  One dynamic training set
   –  All of the available data at a certain point in time
   –  Continuously updated
•  No test set
   –  Only live user interactions
•  Clicked/purchased items are good recommendations
Can we simulate this offline?

Evaluation Protocol
•  Based on “real world” concepts
•  Uses as much available data as possible
•  Trains algorithms once per user and evaluation setting (e.g. N)
•  Evaluates p@N when there are exactly N correct items in the test set
   –  possible p@N = 1 (gold standard)
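
When the test set holds exactly N correct items, p@N coincides with R-precision, and a perfect ranking scores 1. The sketch below is one way to compute it; the function name `r_precision` and the toy rankings are assumptions made for this example.

```python
def r_precision(ranked_items, relevant_items):
    """Precision at cutoff R, where R is the number of relevant test items."""
    r = len(relevant_items)
    if r == 0:
        raise ValueError("need at least one relevant item")
    hits = sum(1 for item in ranked_items[:r] if item in relevant_items)
    return hits / r


# Toy data (hypothetical): three relevant items, so the cutoff is 3.
relevant = {"a", "c", "e"}
perfect = r_precision(["a", "c", "e", "b", "d"], relevant)  # all 3 hits in the top 3
partial = r_precision(["a", "b", "c", "d", "e"], relevant)  # 2 hits in the top 3

print(perfect)  # 1.0
print(partial)
```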
  

Evaluation Protocol
Three concepts:
1.  Personalized training & test sets
    –  Use all available information about the system for the candidate user
    –  Different test/training sets for different levels of N
2.  Candidate item selection (items in test sets)
    –  Only “good” items go in test sets (no random 80%-20% splits)
    –  How “good” an item is depends on each user’s personal preference
3.  Candidate user selection (users in test sets)
    –  Candidate users must have items in the training set
    –  When evaluating p@N, each user in the test set should have N items in the test set
       •  Effectively precision becomes R-precision
Train each algorithm once for each user in the test set and once for each N.
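
The three concepts can be sketched as a single split function for one candidate user and one N. This is an illustrative reading of the slide, not the authors' code: the name `personalized_split`, the nested-dict rating format, the fixed `threshold` argument, and the choice to hold out the N highest-rated good items are all assumptions.

```python
def personalized_split(ratings, user, n, threshold):
    """Build one training/test pair for a single candidate user and one N.

    ratings:   dict mapping user -> dict mapping item -> rating.
    threshold: rating that an item must exceed to count as "good" for this user.
    Returns (train, test_items), or None if the user does not qualify.
    """
    profile = ratings[user]
    good = [item for item, r in profile.items() if r > threshold]
    # Candidate user selection: N good items, plus something left to train on.
    if len(good) < n or len(profile) <= n:
        return None
    # Assumption: hold out the n highest-rated good items (ties broken by item id).
    good.sort(key=lambda item: (-profile[item], item))
    test_items = set(good[:n])
    # Training data: everything the system knows except the held-out items.
    train = {u: dict(items) for u, items in ratings.items()}
    for item in test_items:
        del train[user][item]
    return train, test_items


# Toy data (hypothetical).
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 2, "i4": 5},
    "u2": {"i1": 3, "i2": 5},
}
train, test_items = personalized_split(ratings, "u1", 2, 3.0)
print(sorted(test_items))  # ['i1', 'i4']
print(train["u1"])         # {'i2': 4, 'i3': 2}
```

Calling the function once per candidate user and once per N yields the separate training runs that the last line of the slide describes.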
  	
  
	
  
Evaluation Protocol
[Figure not recoverable from the extracted text]

EXPERIMENTS

Experiments
•  Datasets:
   –  MovieLens 100k
      •  Minimum 20 ratings per user
      •  943 users
      •  6.43% density
      •  Not realistic
   –  MovieLens 1M sample
      •  100k ratings
      •  1000 users
      •  3.0% density
•  Algorithms
   –  SVD
   –  User-based CF (kNN)
   –  Item-based CF
[Figures: number of users vs. number of ratings (two plots)]

Experimental Settings
According to proposed protocol:
•  Evaluate R-precision for N=[1,5,10,20,50,100]
•  Users evaluated at N must have at least N items rated above the relevance threshold (RT)
•  RT depends on the user's mean rating and standard deviation
•  Number of runs: |N|*|users|
Baseline
•  Evaluate p@N for N=[1,5,10,20,50,100]
•  80%-20% training-test split
   –  Items in test set rated at least 3
•  Number of runs: 1
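
The slide says only that RT depends on the user's mean rating and standard deviation; it does not give the formula. The sketch below assumes one plausible form, mean + k · (population standard deviation), with a hypothetical weight `k`, and uses it to select the users who can be evaluated at a given N. The function names and toy ratings are likewise assumptions.

```python
from statistics import mean, pstdev


def relevance_threshold(user_ratings, k=0.0):
    """Per-user RT, ASSUMED here to be mean + k * population std deviation."""
    values = list(user_ratings.values())
    return mean(values) + k * pstdev(values)


def eligible_users(ratings, n, k=0.0):
    """Users with at least n items rated strictly above their own RT."""
    eligible = []
    for user, profile in ratings.items():
        rt = relevance_threshold(profile, k)
        if sum(1 for r in profile.values() if r > rt) >= n:
            eligible.append(user)
    return eligible


# Toy data (hypothetical).
ratings = {
    "u1": {"a": 5, "b": 5, "c": 1, "d": 1},  # mean 3, std 2
    "u2": {"a": 3, "b": 3},                  # mean 3, std 0
}
print(eligible_users(ratings, 2))  # ['u1']
print(eligible_users(ratings, 3))  # []
```

Because the threshold is computed per user, a rating of 3 can be "good" for a harsh rater and below threshold for a generous one, unlike the fixed cut-off of 3 used in the baseline.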
  
Results
[Figure: User-based CF, ML1M sample]

Results
[Figures: User-based CF, ML1M sample; User-based CF, ML100k; SVD, ML1M sample; SVD, ML1M sample]

Results
What about time?
   –  |N|*|users| vs. 1?
   –  Trade-off between a realistic evaluation and complexity?
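
To put the |N|*|users| figure in perspective, here is a rough count based on the user numbers and cutoffs given in the Experiments slides. It assumes every user qualifies at every N, so it is an upper bound; the function name `number_of_runs` is an assumption made for this example.

```python
def number_of_runs(cutoffs, num_users):
    """Training runs when each algorithm is trained once per user and per N."""
    return len(cutoffs) * num_users


CUTOFFS = [1, 5, 10, 20, 50, 100]

protocol_runs_ml100k = number_of_runs(CUTOFFS, 943)        # MovieLens 100k
protocol_runs_ml1m_sample = number_of_runs(CUTOFFS, 1000)  # MovieLens 1M sample
baseline_runs = 1                                          # single 80%-20% split

print(protocol_runs_ml100k)       # 5658
print(protocol_runs_ml1m_sample)  # 6000
```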
  

Conclusions
•  We can emulate a realistic production scenario by creating personalized training/test sets and evaluating them for each candidate user separately
•  We can see how well a recommender performs at different levels of recall (page size)
•  We can compare against a gold standard
•  We can reduce evaluation time

Questions?
•  Thanks!
•  Also: check out
   –  ACM TIST Special Issue on RecSys Benchmarking – bit.ly/RecSysBe
   –  The ACM RecSys Wiki – www.recsyswiki.com

2013-­‐10-­‐13	
  

LSRS'13	
  

18	
  

Contenu connexe

Tendances

Introduction to Computerized Adaptive Testing (CAT)
Introduction to Computerized Adaptive Testing (CAT)Introduction to Computerized Adaptive Testing (CAT)
Introduction to Computerized Adaptive Testing (CAT)Nathan Thompson
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...Alejandro Bellogin
 
Chapter7 abm book_noha_nagi
Chapter7 abm book_noha_nagiChapter7 abm book_noha_nagi
Chapter7 abm book_noha_nagiNoha Nagi
 
yelp data challenge
yelp data challengeyelp data challenge
yelp data challengeAMR koura
 
Testify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedTestify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedMinh Nguyen
 
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Matthias Braunhofer
 
Orthogonal array testing course
Orthogonal array testing courseOrthogonal array testing course
Orthogonal array testing courseNarayanan Palani
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Abdel Salam Sayyad
 
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010TEST Huddle
 
Developing a Computerized Adaptive Test
Developing a Computerized Adaptive TestDeveloping a Computerized Adaptive Test
Developing a Computerized Adaptive TestNathan Thompson
 
Senior TestAnalyst_Anjana Manoharan
Senior TestAnalyst_Anjana ManoharanSenior TestAnalyst_Anjana Manoharan
Senior TestAnalyst_Anjana ManoharanAnjana Manoharan
 
Instance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringInstance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringAldeida Aleti
 
Odin2018_Minh_ML_Risk_Prediction
Odin2018_Minh_ML_Risk_PredictionOdin2018_Minh_ML_Risk_Prediction
Odin2018_Minh_ML_Risk_PredictionMinh Nguyen
 
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyPareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyAbdel Salam Sayyad
 
Predicting best classifier using properties of data sets
Predicting best classifier using properties of data setsPredicting best classifier using properties of data sets
Predicting best classifier using properties of data setsAbhishek Vijayvargia
 
Orthogonal Array Testing
Orthogonal Array TestingOrthogonal Array Testing
Orthogonal Array TestingHiraQureshi22
 
Test Case Naming 02
Test Case Naming 02Test Case Naming 02
Test Case Naming 02SriluBalla
 
Test Driven Simulation Modelling
Test Driven Simulation Modelling Test Driven Simulation Modelling
Test Driven Simulation Modelling stephanong
 
Countries’ presentation on internal quality control: Malaysia
Countries’ presentation on internal quality control: MalaysiaCountries’ presentation on internal quality control: Malaysia
Countries’ presentation on internal quality control: MalaysiaExternalEvents
 

Tendances (20)

Introduction to Computerized Adaptive Testing (CAT)
Introduction to Computerized Adaptive Testing (CAT)Introduction to Computerized Adaptive Testing (CAT)
Introduction to Computerized Adaptive Testing (CAT)
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
Chapter7 abm book_noha_nagi
Chapter7 abm book_noha_nagiChapter7 abm book_noha_nagi
Chapter7 abm book_noha_nagi
 
yelp data challenge
yelp data challengeyelp data challenge
yelp data challenge
 
Testify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeedTestify smart testoptimization-ecfeed
Testify smart testoptimization-ecfeed
 
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
 
Orthogonal array testing course
Orthogonal array testing courseOrthogonal array testing course
Orthogonal array testing course
 
STRICT-SANER2017
STRICT-SANER2017STRICT-SANER2017
STRICT-SANER2017
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
 
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010
Doron Reuveni - The Mobile App Quality Challenge - EuroSTAR 2010
 
Developing a Computerized Adaptive Test
Developing a Computerized Adaptive TestDeveloping a Computerized Adaptive Test
Developing a Computerized Adaptive Test
 
Senior TestAnalyst_Anjana Manoharan
Senior TestAnalyst_Anjana ManoharanSenior TestAnalyst_Anjana Manoharan
Senior TestAnalyst_Anjana Manoharan
 
Instance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringInstance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software Engineering
 
Odin2018_Minh_ML_Risk_Prediction
Odin2018_Minh_ML_Risk_PredictionOdin2018_Minh_ML_Risk_Prediction
Odin2018_Minh_ML_Risk_Prediction
 
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature SurveyPareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
Pareto-Optimal Search-Based Software Engineering (POSBSE): A Literature Survey
 
Predicting best classifier using properties of data sets
Predicting best classifier using properties of data setsPredicting best classifier using properties of data sets
Predicting best classifier using properties of data sets
 
Orthogonal Array Testing
Orthogonal Array TestingOrthogonal Array Testing
Orthogonal Array Testing
 
Test Case Naming 02
Test Case Naming 02Test Case Naming 02
Test Case Naming 02
 
Test Driven Simulation Modelling
Test Driven Simulation Modelling Test Driven Simulation Modelling
Test Driven Simulation Modelling
 
Countries’ presentation on internal quality control: Malaysia
Countries’ presentation on internal quality control: MalaysiaCountries’ presentation on internal quality control: Malaysia
Countries’ presentation on internal quality control: Malaysia
 

Similaire à A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems

Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Tamas Jambor
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Dr. Cornelius Ludmann
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsAravindharamanan S
 
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAminaRepo
 
[Vu Van Nguyen] Value-based Software Testing an Approach to Prioritizing Tests
[Vu Van Nguyen]  Value-based Software Testing an Approach to Prioritizing Tests[Vu Van Nguyen]  Value-based Software Testing an Approach to Prioritizing Tests
[Vu Van Nguyen] Value-based Software Testing an Approach to Prioritizing TestsHo Chi Minh City Software Testing Club
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4Khadija Atiya
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET Journal
 
LSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationLSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationMaruf Aytekin
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenPoo Kuan Hoong
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
Recommend Products To Intsacart Customers
Recommend Products To Intsacart CustomersRecommend Products To Intsacart Customers
Recommend Products To Intsacart CustomersOindrila Sen
 
Ensemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized RecommendationEnsemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized RecommendationLiang Tang
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags
Cold-Start Management with Cross-Domain Collaborative Filtering and TagsCold-Start Management with Cross-Domain Collaborative Filtering and Tags
Cold-Start Management with Cross-Domain Collaborative Filtering and TagsMatthias Braunhofer
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlightsSandra Garcia
 
20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptx20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptxMinh Nguyen
 

Similaire à A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems (20)

Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
User Personality and the New User Problem in a Context-­‐Aware POI Recommende...
User Personality and the New User Problem in a Context-­‐Aware POI Recommende...User Personality and the New User Problem in a Context-­‐Aware POI Recommende...
User Personality and the New User Problem in a Context-­‐Aware POI Recommende...
 
Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithms
 
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
 
[Vu Van Nguyen] Value-based Software Testing an Approach to Prioritizing Tests
[Vu Van Nguyen]  Value-based Software Testing an Approach to Prioritizing Tests[Vu Van Nguyen]  Value-based Software Testing an Approach to Prioritizing Tests
[Vu Van Nguyen] Value-based Software Testing an Approach to Prioritizing Tests
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation System
 
LSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationLSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in Recommendation
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
SLALOM Project Technical Webinar 20151111
SLALOM Project Technical Webinar 20151111 SLALOM Project Technical Webinar 20151111
SLALOM Project Technical Webinar 20151111
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
Recommend Products To Intsacart Customers
Recommend Products To Intsacart CustomersRecommend Products To Intsacart Customers
Recommend Products To Intsacart Customers
 
Ensemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized RecommendationEnsemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized Recommendation
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags
Cold-Start Management with Cross-Domain Collaborative Filtering and TagsCold-Start Management with Cross-Domain Collaborative Filtering and Tags
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlights
 
20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptx20220914-MBT-Experiences-SB1-final.pptx
20220914-MBT-Experiences-SB1-final.pptx
 

Plus de Alan Said

Replication of Recommender Systems Research
Replication of Recommender Systems ResearchReplication of Recommender Systems Research
Replication of Recommender Systems ResearchAlan Said
 
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just RatingsThe Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just RatingsAlan Said
 
Information Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System EvaluationInformation Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System EvaluationAlan Said
 
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...Alan Said
 
A 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System EvaluationA 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System EvaluationAlan Said
 
State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012Alan Said
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge OpeningAlan Said
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesAlan Said
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyAlan Said
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsAlan Said
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Alan Said
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationAlan Said
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesAlan Said
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceAlan Said
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityAlan Said
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsAlan Said
 

Plus de Alan Said (16)

Replication of Recommender Systems Research
Replication of Recommender Systems ResearchReplication of Recommender Systems Research
Replication of Recommender Systems Research
 
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just RatingsThe Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
 
Information Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System EvaluationInformation Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System Evaluation
 
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
 
A 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System EvaluationA 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System Evaluation
 
State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge Opening
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System Challenges
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User Study
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender Systems
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening Presentation
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender Performance
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 

A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems

  • 1. A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
    Alan Said, Alejandro Bellogín, Arjen De Vries
    CWI
    @alansaid, @abellogin, @arjenpdevries
  • 2. Outline
    •  Evaluation
       –  Real world
       –  Offline
    •  Not algorithmic comparison!
    •  Comparison of evaluation
    •  Protocol
    •  Experiments & Results
    •  Conclusions
    2013-10-13, LSRS'13
  • 4. Evaluation
    •  Does p@10 in [Smith, 2010a] measure the same quality as p@10 in [Smith, 2012b]?
       –  Even if it does
          •  is the underlying data the same?
          •  was cross-validation performed similarly?
          •  etc.
  • 5. Evaluation
    •  What metrics should we use?
    •  How should we evaluate?
       –  Relevance criteria for test items
       –  Cross-validation (n-fold, random)
    •  Should all users and items be treated the same way?
       –  Do certain users and items reflect different evaluation qualities?
  • 6. Offline Evaluation
    Recommender System accuracy evaluation is currently based on methods from IR/ML
       –  One training set
       –  One test set
       –  (One validation set)
       –  Algorithms are trained on the training set
       –  Evaluate using metric@N (e.g. p@N, where N is a page size)
          •  Even when N is larger than the number of test items
          •  p@N = 1.0 is (almost) impossible
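The last two bullets can be made concrete with a small sketch. It is not taken from the talk: the function names and the toy data are assumptions made for illustration. It shows that when a user has fewer than N test items, even a perfect ranking cannot reach p@N = 1.0.

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n ranked items that appear in the relevant (test) set."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant_items)
    return hits / n


def max_precision_at_n(num_test_items, n):
    """Best achievable p@n for a user with num_test_items relevant test items."""
    return min(num_test_items, n) / n


# Hypothetical user with only 3 test items, evaluated at N = 10.
test_items = {"a", "b", "c"}
# A perfect ranking: all three test items come first.
ranking = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

perfect = precision_at_n(ranking, test_items, 10)
ceiling = max_precision_at_n(len(test_items), 10)
```

Here both `perfect` and `ceiling` equal 3/10, so scores from users with few test items are capped well below 1.0 regardless of the algorithm.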
  • 7. Evaluation in production
    •  One dynamic training set
       –  All of the available data at a certain point in time
       –  Continuously updated
    •  No test set
       –  Only live user interactions
          •  Clicked/purchased items are good recommendations
    Can we simulate this offline?
  • 8. Evaluation Protocol
    •  Based on “real world” concepts
    •  Uses as much available data as possible
    •  Trains algorithms once per user and evaluation setting (e.g. N)
    •  Evaluates p@N when there are exactly N correct items in the test set
       –  possible p@N = 1 (gold standard)
  • 9. Evaluation Protocol
    Three concepts:
    1.  Personalized training & test sets
       –  Use all available information about the system for the candidate user
       –  Different test/training sets for different levels of N
    2.  Candidate item selection (items in test sets)
       –  Only “good” items go in test sets (no random 80%-20% splits)
       –  How “good” an item is depends on each user’s personal preference
    3.  Candidate user selection (users in test sets)
       –  Candidate users must have items in the training set
       –  When evaluating p@N, each user in the test set should have N items in the test set
          •  Effectively precision becomes R-precision
    Train each algorithm once for each user in the test set and once for each N.
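The three concepts can be sketched for a single candidate user as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nested-dict rating format, the function names, and the rule for choosing which N good items enter the test set (highest rated first, ties broken by item id) are all hypothetical, since the slides do not specify them.

```python
def personalized_split(ratings, user, n, threshold):
    """Build a training/test split for one candidate user at level n.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (train, test) or None if the user is not a candidate at this n.
    """
    # Candidate item selection: only items the user rated above the threshold.
    good = [item for item, r in ratings[user].items() if r > threshold]
    if len(good) < n:
        return None  # not enough good items to fill a test set of size n
    if len(ratings[user]) <= n:
        return None  # candidate users must keep some items in training
    # Assumption: take the n highest-rated good items, ties broken by item id.
    good.sort(key=lambda item: (-ratings[user][item], item))
    test = set(good[:n])
    # Training set: everything known to the system except this user's test items.
    train = {
        u: {i: r for i, r in items.items() if not (u == user and i in test)}
        for u, items in ratings.items()
    }
    return train, test


def r_precision(ranked_items, test_items):
    """Precision at n where n equals the number of test items."""
    n = len(test_items)
    return sum(1 for item in ranked_items[:n] if item in test_items) / n


# Hypothetical toy data.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 2, "d": 5},
    "u2": {"a": 3, "b": 5},
}
train, test = personalized_split(ratings, "u1", n=2, threshold=3.5)
perfect_score = r_precision(["a", "d", "b"], test)
partial_score = r_precision(["a", "b", "d"], test)
```

Because the test set holds exactly N items, a perfect ranking scores 1.0, which is the gold standard the protocol refers to. An algorithm would be retrained on `train` for every (user, N) pair.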
  • 12. Experiments
    •  Datasets:
       –  Movielens 100k
          •  Minimum 20 ratings per user
          •  943 users
          •  6.43% density
          •  Not realistic
       –  Movielens 1M sample
          •  100k ratings
          •  1000 users
          •  3.0% density
    •  Algorithms
       –  SVD
       –  User-based CF (kNN)
       –  Item-based CF
    [Figures: number of users vs. number of ratings, one plot per dataset]
  • 13. Experimental Settings
    According to proposed protocol:
    •  Evaluate R-precision for N=[1,5,10,20,50,100]
    •  Users evaluated at N must have at least N items rated above the relevance threshold (RT)
    •  RT depends on the user's mean rating and standard deviation
    •  Number of runs: |N|*|users|
    Baseline:
    •  Evaluate p@N for N=[1,5,10,20,50,100]
    •  80%-20% training-test split
       –  Items in test set rated at least 3
    •  Number of runs: 1
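A hedged sketch of the candidate-user rule follows. The slides say only that RT depends on the user's mean rating and standard deviation; the form RT = mean + k * std, the parameter `k`, and the function names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def relevance_threshold(user_ratings, k=0.0):
    """Assumed form of RT: the user's mean rating plus k standard deviations."""
    values = list(user_ratings.values())
    return mean(values) + k * pstdev(values)


def candidate_users(ratings, n, k=0.0):
    """Users with at least n items rated above their personal threshold."""
    eligible = []
    for user, user_ratings in ratings.items():
        rt = relevance_threshold(user_ratings, k)
        n_good = sum(1 for r in user_ratings.values() if r > rt)
        # The user must also keep at least one item for training.
        if n_good >= n and len(user_ratings) > n:
            eligible.append(user)
    return eligible


# Hypothetical toy data.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 2, "d": 5, "e": 1},
    "u2": {"a": 3, "b": 3, "c": 3},
    "u3": {"a": 5, "b": 1},
}
by_n = {n: candidate_users(ratings, n) for n in (1, 2, 5)}
```

With k = 0, `u1` has three items above its mean of 3.4, `u3` has one, and `u2` has none, so the set of evaluated users shrinks as N grows.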
  • 14. Results
    [Figure: User-based CF, ML1M sample]
  • 15. Results
    [Figure panels: User-based CF, ML1M sample; User-based CF, ML100k; SVD, ML1M sample; SVD, ML1M sample]
  • 16. Results
    What about time?
       –  |N|*|users| vs. 1?
       –  Trade-off between a realistic evaluation and complexity?
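To give a sense of scale, the run counts can be worked out from the dataset sizes and N levels given on the earlier slides. The function name is hypothetical, and the figures are upper bounds: a user is only evaluated at a given N if they have at least N relevant items, so the real number of runs would be lower.

```python
def runs_upper_bound(num_users, n_levels):
    """Upper bound on training runs: one per user per evaluated level of N."""
    return num_users * len(n_levels)


N_LEVELS = [1, 5, 10, 20, 50, 100]  # from the experimental settings

protocol_runs_ml100k = runs_upper_bound(943, N_LEVELS)   # Movielens 100k
protocol_runs_ml1m = runs_upper_bound(1000, N_LEVELS)    # Movielens 1M sample
baseline_runs = 1  # single 80%-20% split, trained once
```

That is at most 5658 runs on Movielens 100k and 6000 on the Movielens 1M sample, against a single run for the baseline.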
  • 17. Conclusions
    •  We can emulate a realistic production scenario by creating personalized training/test sets and evaluating them for each candidate user separately
    •  We can see how well a recommender performs at different levels of recall (page size)
    •  We can compare against a gold standard
    •  We can reduce evaluation time
  • 18. Questions?
    •  Thanks!
    •  Also: check out
       –  ACM TIST Special Issue on RecSys Benchmarking: bit.ly/RecSysBe
       –  The ACM RecSys Wiki: www.recsyswiki.com