
Linked Open Data-enabled Strategies for Top-N Recommendations

Linked Open Data-enabled Strategies for Top-N Recommendations - Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro - 1st Workshop on New Trends in Content-based Recommender Systems, co-located with ACM Recommender Systems 2014

  1. CBRecSys 2014: Workshop on New Trends in Content-based Recommender Systems, Foster City (CA, United States), October 6, 2014. Linked Open Data-enabled Strategies for Top-N Recommendations. Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
  2. Outline • Background • Content-based RecSys (CBRS) • Limitations • Linked Open Data • What is it? • Introducing LOD in CBRS • Experiments • Conclusions. Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis. Linked Open Data-enabled Strategies for Top-N Recommendation. CBRecSys 2014 Workshop, Silicon Valley (US), 6.10.2014
  3. Content-based Recommender Systems: suggest items similar to those the user liked in the past (I bought Converse shoes, so I'll keep buying similar sport shoes).
  4. Content-based Recommender Systems, limitations: limited content (in several domains).
  5. Content-based Recommender Systems, limitations: poor semantics.
  6. Problem: how can we boost content-based recommender systems with semantics (and with more content)?
  7. Semantics in CBRS, state of the art: ontologies, folksonomies, distributional semantics, encyclopedic knowledge, Linked Open Data.
  8. Top-down approaches: what is the difference? (formal semantics / large scale) Folksonomies: no / no. Ontologies: yes / no. Encyclopedic knowledge: no / yes. Distributional semantics: no / yes. Linked Open Data: yes / yes.
  9. Same comparison, adding: Linked Open Data merges the vastness of encyclopedic knowledge with the formal semantics typical of ontologies.
  10. Same comparison, adding: we focus on the introduction of Linked Open Data in content-based recommender systems.
  11. Linked Open Data: what are we talking about?
  12. Linked Open Data, definition: a methodology to publish, share and link structured data on the Web.
  13. Linked Open Data (cloud): what is it? A (large) set of interconnected semantic datasets.
  14. Linked Open Data (cloud): what kind of datasets?
  15. Linked Open Data (cloud): DBpedia, http://dbpedia.org
  16. Linked Open Data (cloud): DBpedia (http://dbpedia.org) is the structured mapping of Wikipedia. It is the core of the LOD cloud.
  17. Linked Open Data (cloud), example of unstructured content from Wikipedia: “Foster City is a town in the United States located in California” (from the Wikipedia page).
  18. Linked Open Data (cloud): how are these data represented? Semantic Web cake: information from the LOD cloud is represented in RDF.
  19. “Foster City is a town in the United States located in California” (from the Wikipedia page). Linked Open Data (cloud): how are these data represented? The same facts as DBpedia triples:
    http://dbpedia.org/resource/Foster_City,_California  dbpedia-owl:country  http://dbpedia.org/resource/United_States
    http://dbpedia.org/resource/Foster_City,_California  dbpedia-owl:isPartOf  http://dbpedia.org/resource/California
  20. Same example, adding: data coming from the LOD cloud have a formal semantics, represented in RDF.
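To make the RDF representation concrete, here is a minimal Python sketch that stores the two Foster City facts as plain (subject, predicate, object) tuples and serializes them in N-Triples syntax. It assumes that the `dbpedia-owl:` prefix expands to `http://dbpedia.org/ontology/`; no RDF library is used, so this only illustrates the triple data model, not a full RDF toolkit.

```python
# Minimal illustration of the RDF data model: facts as (subject, predicate, object) triples.
# Assumption: the dbpedia-owl: prefix expands to http://dbpedia.org/ontology/.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

TRIPLES = [
    (DBR + "Foster_City,_California", DBO + "country", DBR + "United_States"),
    (DBR + "Foster_City,_California", DBO + "isPartOf", DBR + "California"),
]


def to_ntriples(triples):
    """Serialize URI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(objects(TRIPLES, DBR + "Foster_City,_California", DBO + "country"))
```

Because every node is a URI, the same `United_States` resource can be reached from any other dataset that links to it, which is what makes the data "linked".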
  21. Our checklist: can Linked Open Data boost content-based recommender systems? More semantics: yes. More content: ?
  22. Linked Open Data (cloud): how much data?
  23. Linked Open Data (cloud): how much data? 1,048 datasets and 58 billion triples (source: http://stats.lod2.eu).
  24. Our checklist: can Linked Open Data boost content-based recommender systems? More semantics: yes. More content: yes.
  25. Our checklist: more semantics: yes; more content: yes... but
  26. Research question
  27. Approach: we propose two methodologies to introduce LOD-based features into CBRS: (1) direct access to DBpedia; (2) entity linking algorithms.
  28. Introducing LOD-based features in CBRS. Methodology: direct access to DBpedia (we assume that each item to be recommended is already in the LOD cloud). The simplest way to introduce LOD-based features: 1. domain-dependent features are manually defined (e.g. book recommendation: genre, author, publisher, subject, etc.); 2. SPARQL queries extract the features' values.
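The direct-access step can be sketched as a small query builder: given a DBpedia resource and a manually defined mapping from feature names to properties, it produces one SPARQL query returning (feature, value) pairs. The property names (`dbo:author`, `dbo:literaryGenre`, `dbo:publisher`, `dbo:series`, `dcterms:subject`) and the resource URI are assumptions chosen for illustration, not necessarily the exact properties used by the authors; the query is only built here, not sent to an endpoint.

```python
# Sketch: build a SPARQL query extracting manually defined features for one item.
# Assumption: the DBpedia properties below are illustrative choices for book recommendation.
BOOK_PROPERTIES = {
    "author": "dbo:author",
    "genre": "dbo:literaryGenre",
    "publisher": "dbo:publisher",
    "series": "dbo:series",
    "subject": "dcterms:subject",
}

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
)


def build_feature_query(resource_uri, properties):
    """Return a SPARQL SELECT with one UNION branch per (feature name, property)."""
    branches = [
        f'  {{ <{resource_uri}> {prop} ?value . BIND("{name}" AS ?feature) }}'
        for name, prop in properties.items()
    ]
    body = "\n  UNION\n".join(branches)
    return f"{PREFIXES}SELECT ?feature ?value WHERE {{\n{body}\n}}"


if __name__ == "__main__":
    # Hypothetical resource URI for "The Great and Secret Show".
    uri = "http://dbpedia.org/resource/The_Great_and_Secret_Show"
    print(build_feature_query(uri, BOOK_PROPERTIES))
```

The resulting string could be submitted to any SPARQL 1.1 endpoint exposing DBpedia; changing domain (e.g. movies instead of books) only means swapping the property mapping, which is exactly why the approach is domain-dependent.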
  29. Introducing LOD-based features in CBRS. Methodology: direct access to DBpedia. Example: The Great and Secret Show (Clive Barker's book).
  30. Introducing LOD-based features in CBRS. Methodology: direct access to DBpedia, e.g. book recommendation: author, genre, publisher, subject.
  31. Introducing LOD-based features in CBRS. Methodology: direct access to DBpedia. Each item is represented through the set of (manually defined) features extracted from the LOD cloud.
  32. Introducing LOD-based features in CBRS. Methodology: direct access to DBpedia. 9 LOD-based features: author (Clive Barker), genre (Fantasy Literature), publisher (William Collins), series (Books of the Art), subject (1980s fantasy novels, William Collins books, Novels by Clive Barker, British Fantasy Novels).
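One simple way to turn the extracted values into an item representation is to flatten each (property, value) pair into a binary feature. The sketch below does this for the slide's example; the `property=value` naming scheme, the second (hypothetical) book, and the use of Jaccard overlap to compare items are illustrative assumptions, not the authors' weighting scheme.

```python
# Sketch: represent an item as the set of its LOD-based (property, value) features.
def to_feature_set(lod_values):
    """Flatten {property: [values]} into binary features named 'property=value'."""
    return {f"{prop}={value}" for prop, values in lod_values.items() for value in values}


def jaccard(a, b):
    """Overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


great_and_secret_show = {
    "author": ["Clive Barker"],
    "genre": ["Fantasy Literature"],
    "publisher": ["William Collins"],
    "series": ["Books of the Art"],
    "subject": ["1980s fantasy novels", "William Collins books",
                "Novels by Clive Barker", "British Fantasy Novels"],
}
# Hypothetical second item, used only to show a similarity computation.
other_book = {"author": ["Clive Barker"], "genre": ["Horror"]}

features = to_feature_set(great_and_secret_show)
print(sorted(features))
print(jaccard(features, to_feature_set(other_book)))
```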
  33. Direct access to DBpedia, analysis. Pros: very straightforward approach; SPARQL queries can be easily built. Cons: properties are manually defined; the approach is strongly domain-dependent; it does not exploit unstructured information.
  34. Introducing LOD-based features in CBRS. Methodology: entity linking algorithms. • Input: free text (item descriptions, in our setting). • Output: identification of the most relevant entities mentioned in the text. • State of the art: tag.me (1), DBpedia Spotlight (2), Wikipedia Miner (3). (1) http://tagme.di.unipi.it (2) http://spotlight.dbpedia.org (3) http://wikipedia-miner.cms.waikato.ac.nz
  36. Introducing LOD-based features in CBRS. Methodology: entity linking algorithms. • Input: free text (in this setting, the textual description of the items, e.g. the Wikipedia abstract). • Output: identification of the most relevant entities mentioned in the text (example output from Tagme).
  37. Methodology: entity linking algorithms. Entity linking output: a very human-readable representation! N-gram and entity recognition, as well as word sense disambiguation, come for free.
  38. Methodology: entity linking algorithms. Entity linking output: not a simple textual feature! Each entity is a reference to a DBpedia node, e.g. http://dbpedia.org/resource/Harry_D'Amour
  39. Methodology: entity linking algorithms. The LOD-based representation can be enriched with broader categories, encoded in the dcterms:subject property, by exploiting SPARQL queries.
  40. Methodology: entity linking algorithms. The final representation of each item is obtained by merging the DBpedia nodes identified in the text with the broader categories the dcterms:subject property refers to: Features = DBpedia nodes + broader categories.
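The entity-linking pipeline of slides 36-40 can be summarized as: keep the linked entities whose confidence passes a threshold, look up their `dcterms:subject` categories, and take the union. In the sketch below the entity-linker output and the category lookup are hard-coded, hypothetical stand-ins (no Tagme or DBpedia call is made), and the threshold value is an arbitrary assumption.

```python
# Sketch: entity-linking features = linked DBpedia nodes + their broader categories.
# All scores, category URIs and the threshold below are hypothetical illustrations.
def entity_features(linked_entities, subject_lookup, threshold):
    """linked_entities: (dbpedia_uri, confidence) pairs from an entity linker.
    subject_lookup: dbpedia_uri -> set of category URIs (what dcterms:subject would return)."""
    nodes = {uri for uri, score in linked_entities if score >= threshold}
    categories = set()
    for uri in nodes:
        categories |= subject_lookup.get(uri, set())
    return nodes | categories


DBR = "http://dbpedia.org/resource/"
linked = [
    (DBR + "Harry_D'Amour", 0.62),  # hypothetical confidence score
    (DBR + "Clive_Barker", 0.45),   # hypothetical confidence score
    (DBR + "Show", 0.05),           # hypothetical low-confidence (spurious) link
]
subjects = {  # hypothetical dcterms:subject values
    DBR + "Harry_D'Amour": {DBR + "Category:Fictional_detectives"},
    DBR + "Clive_Barker": {DBR + "Category:English_horror_writers"},
}

features = entity_features(linked, subjects, threshold=0.1)
print(sorted(features))
```

Raising or lowering `threshold` directly trades spurious entities against lost ones, which is the tuning difficulty listed among the cons on the next slide.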
  41. Entity linking algorithms, analysis. Pros: they exploit unstructured information; very general approach; they may introduce unexpected (but relevant) features. Cons: heavy feature engineering (which features are the best?); the threshold score of entity linking algorithms is difficult to set.
  42. LOD-based features in CBRS
  43. Experimental evaluation, research hypotheses: 1. What is the contribution of Linked Open Data features to the accuracy of recommendation algorithms? 2. Does the representation based on Linked Open Data outperform existing state-of-the-art recommendation algorithms?
  44. Experimental evaluation, description of the dataset: • Book recommendation • ESWC 2014 Challenge dataset (*) • 6,733 books • 6,181 users • 72,372 binary ratings • 11.71 ratings/user • Very sparse dataset! • Only 5.37 positive ratings/user! (*) http://challenges.2014.eswc-conferences.org/index.php/RecSys
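The sparsity claim can be checked with quick arithmetic on the figures above: density is the number of ratings divided by the number of possible user-item pairs.

```python
# Quick arithmetic on the ESWC 2014 Challenge dataset figures reported on the slide.
BOOKS = 6_733
USERS = 6_181
RATINGS = 72_372

ratings_per_user = RATINGS / USERS       # about 11.71, as reported
density = RATINGS / (BOOKS * USERS)      # fraction of user-item pairs that carry a rating
sparsity = 1.0 - density

print(f"ratings/user = {ratings_per_user:.2f}")
print(f"density = {density:.4%}, sparsity = {sparsity:.2%}")
```

Roughly 0.17% of the user-item matrix is filled, i.e. about 99.83% sparsity, which helps explain why purely collaborative methods struggle on this dataset.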
  45. Experimental evaluation, feature combinations: • Content (crawled from Wikipedia + NLP processing) • LOD (direct access to DBpedia) • Entity Linking (Tagme) • Content + LOD • Content + Entity Linking • LOD + Entity Linking • All. 7 combinations for each run.
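The seven configurations are exactly the non-empty subsets of the three base feature sets (2^3 - 1 = 7), which can be enumerated as follows.

```python
from itertools import combinations

BASE_FEATURE_SETS = ("Content", "LOD", "Entity Linking")


def feature_configurations(base):
    """All non-empty combinations of the base feature sets, smallest first."""
    return [combo for k in range(1, len(base) + 1) for combo in combinations(base, k)]


for config in feature_configurations(BASE_FEATURE_SETS):
    print(" + ".join(config))
```

The enumeration order matches the slide: the three single feature sets, the three pairs, then "All".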
  46. Experimental evaluation, setup: • Evaluation of the effectiveness of LOD-based features across six different recommendation algorithms • Vector space models: VSM, BM25, eVSM (*) • Classifiers: Random Forests, Linear Regression • Graph-based approaches: PageRank with Priors. (*) C. Musto: Enhanced vector space models for content-based recommender systems. RecSys 2010: 361-364
  47. Experimental evaluation, design of the experiment: vector space models. The user profile (built upon the features describing the items the user liked) is used as a query, and cosine similarity retrieves the most similar items.
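A minimal sketch of this retrieval-style scheme, assuming sparse feature vectors stored as dicts and a profile obtained by summing the vectors of the liked items. Plain weights are used here; the actual VSM, BM25 and eVSM variants would apply their own term weighting, which is not reproduced. The three books are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as {feature: weight}."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(liked_vectors):
    """User profile = sum of the feature vectors of the items the user liked."""
    profile = Counter()
    for vec in liked_vectors:
        profile.update(vec)  # Counter.update adds weights feature by feature
    return dict(profile)


def recommend(profile, candidates, n):
    """Rank candidate items {item_id: vector} by cosine similarity to the profile."""
    ranked = sorted(candidates, key=lambda item: cosine(profile, candidates[item]), reverse=True)
    return ranked[:n]


items = {  # hypothetical items
    "book_a": {"author=Clive Barker": 1.0, "genre=Fantasy": 1.0},
    "book_b": {"author=Clive Barker": 1.0, "genre=Horror": 1.0},
    "book_c": {"genre=Romance": 1.0},
}
profile = build_profile([items["book_a"]])
unseen = {k: v for k, v in items.items() if k != "book_a"}
print(recommend(profile, unseen, n=2))
```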
  48. Experimental evaluation, design of the experiment: classifiers. Random Forests learn a classification model, based on the features of the labeled items, which is used to predict the class (positive/negative) of unlabeled items. Linear Regression also uses “basic” features (e.g. positive and negative ratings, average rating of the user, ratio between positive and negative ratings, etc.) to learn the model.
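The “basic” user features mentioned for Linear Regression can be computed directly from a user's binary ratings. This is a sketch under assumptions: ratings are encoded as 1 (positive) and 0 (negative), and the positive/negative ratio uses add-one smoothing on the denominator to avoid division by zero, a choice the slides do not specify.

```python
def basic_user_features(ratings):
    """ratings: list of binary ratings (1 = positive, 0 = negative) for one user."""
    positives = sum(1 for r in ratings if r == 1)
    negatives = len(ratings) - positives
    return {
        "positive_ratings": positives,
        "negative_ratings": negatives,
        # With binary ratings, the average rating equals the share of positive ratings.
        "average_rating": positives / len(ratings) if ratings else 0.0,
        # Assumption: add-one smoothing so users without negative ratings are handled.
        "pos_neg_ratio": positives / (negatives + 1),
    }


print(basic_user_features([1, 0, 1, 1]))
```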
  49. Experimental evaluation, design of the experiment: PageRank with Priors (PRP). Graph-based representation: users and items are nodes, positive feedback forms the edges. PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections. By default, equal probability is assigned to all the nodes.
  50. PageRank with Priors (PRP), same graph-based representation: PageRank with Priors introduces a bias towards some nodes (in our setting, the items the user liked).
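A compact power-iteration sketch of PageRank with Priors on an undirected user-item graph. At each step a random walker follows an edge with probability `damping` and otherwise jumps back to a node drawn from the prior distribution; putting all prior mass on the items the target user liked biases the ranking towards their neighbourhood. The damping value of 0.85 and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def pagerank_with_priors(edges, priors, damping=0.85, max_iter=100, tol=1e-10):
    """edges: iterable of undirected (node, node) pairs.
    priors: {node: weight}; nodes not listed get prior 0. Weights are normalized."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = list(neighbours)
    total = sum(priors.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("priors must put positive mass on at least one graph node")
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbours[n])  # every node has >= 1 edge
            for m in neighbours[n]:
                new_rank[m] += share
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph: target user u1 liked i1; u2 liked i1 and i2; u3 liked i3 and i4.
edges = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"), ("u3", "i3"), ("u3", "i4")]
scores = pagerank_with_priors(edges, priors={"i1": 1.0})
print(sorted(["i2", "i3", "i4"], key=scores.get, reverse=True))
```

With a uniform prior this reduces to ordinary PageRank; the prior is what makes the scores personalized.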
  51. PageRank with Priors (PRP): several strategies to build the graph are compared. 1. no-LOD: the graph only models users and items. 2. small-LOD: the graph is expanded with new nodes for the basic properties of the items (subject, genre, publisher, author, etc.) and their relationships. 3. big-LOD: the graph is further expanded by introducing more nodes (e.g. other resources of the same genre, other resources written by the same authors, etc.) and their relationships. Rationale: the new nodes and connections coming from the LOD cloud can improve the effectiveness of PageRank.
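The difference between the no-LOD and small-LOD graphs can be sketched as an edge-list builder: the small-LOD variant adds one node per (property, value) pair and links each item to it, so items that share, say, an author become connected through that node. The `property:value` node naming and the example data are hypothetical, and the big-LOD expansion (pulling in further DBpedia resources) is not shown.

```python
def build_graph(positive_feedback, item_properties=None):
    """positive_feedback: (user, item) pairs -> user-item edges (no-LOD graph).
    item_properties: optional {item: [(property, value), ...]} from the LOD cloud;
    when given, each item is also linked to one node per property value (small-LOD)."""
    edges = set(positive_feedback)
    for item, props in (item_properties or {}).items():
        for prop, value in props:
            edges.add((item, f"{prop}:{value}"))
    return sorted(edges)


def node_set(edges):
    """All nodes touched by an edge list."""
    return {node for edge in edges for node in edge}


feedback = [("u1", "book_a"), ("u2", "book_b")]
properties = {  # hypothetical LOD values
    "book_a": [("author", "Clive_Barker"), ("genre", "Fantasy_literature")],
    "book_b": [("author", "Clive_Barker")],
}

no_lod = build_graph(feedback)
small_lod = build_graph(feedback, properties)
print(len(node_set(no_lod)), len(node_set(small_lod)))
```

In the no-LOD graph the two books live in separate components; in the small-LOD graph they are two hops apart through the shared author node, which is the kind of extra connectivity a PageRank with Priors run can exploit.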
  52. PageRank with Priors (PRP), graph construction strategies as in the previous slide (no-LOD, small-LOD, big-LOD): PRP is run and the items in the test set are ranked according to their PageRank score.
  53. Experimental evaluation, recap: 6 algorithms (VSM, BM25, eVSM, Linear Regression, Random Forests, PageRank with Priors) and 7 sets of features (Content, LOD, Entity Linking, Content + LOD, Content + Entity Linking, LOD + Entity Linking, All).
  54. Experiment 1: impact of LOD-based features.
  55. Experiment 1, impact of LOD-based features: vector space model (VSM). Configurations: Content, LOD, Entity, Content+LOD, Content+Entity, LOD+Entity, All. F1 values: 54.62, 54.42, 54.59, 54.47, 54.36, 54.69, 53.79 (annotated gains: +0.17, +0.05). LOD-based features improve the F1-measure.
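The results are reported as F1-measure. For a top-N list, a common definition, assumed here since the slides do not spell out the cutoff or the averaging, is the harmonic mean of precision@N and recall@N, computed per user and then averaged.

```python
def f1_at_n(ranked_items, relevant_items, n):
    """F1@N for a single user: harmonic mean of precision@N and recall@N."""
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant_items)
    if hits == 0:
        return 0.0
    precision = hits / n
    recall = hits / len(relevant_items)
    return 2 * precision * recall / (precision + recall)


def mean_f1(rankings, ground_truth, n):
    """Average F1@N over users; rankings and ground_truth are keyed by user id."""
    scores = [f1_at_n(rankings[u], ground_truth[u], n) for u in ground_truth]
    return sum(scores) / len(scores)


rankings = {"u1": ["a", "b", "c", "d", "e"], "u2": ["f", "g", "h", "i", "j"]}
ground_truth = {"u1": {"a", "c", "x"}, "u2": {"z"}}
print(mean_f1(rankings, ground_truth, n=5))
```

For user u1, 2 of the 5 recommended items are relevant, so precision is 0.4, recall is 2/3 and F1@5 is 0.5; user u2 has no hits, giving a mean of 0.25.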
  56. Same chart (VSM): statistically significant improvement (paired t-test, p<0.01).
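Significance is assessed with a paired t-test. As a sketch, assuming the pairs are per-user scores of two configurations on the same users, the t statistic is the mean paired difference divided by its standard error; turning t into a p-value needs the Student t distribution (for instance via `scipy.stats.ttest_rel`), which the Python standard library does not provide.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two equally long score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("paired differences have zero variance")
    return mean_diff / std_err, len(diffs) - 1


# Hypothetical per-user scores for two configurations.
t, df = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 3.0])
print(t, df)
```

The pairing matters: comparing the same users under both configurations removes per-user variability, so even the small absolute gains reported on these slides can reach significance.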
  57. Same chart (VSM): +0.27 (paired t-test, p<0.01). Best: LOD + Entity Linking (no content!).
  58. Experiment 1, impact of LOD-based features: BM25. Configurations: Content, LOD, Entity, Content+LOD, Content+Entity, LOD+Entity, All. F1 values: 54.43, 54.56, 54.51, 54.6, 53.9, 53.91, 53.43 (annotated: -1.00%). Worst (again): LOD alone.
  59. Same chart (BM25): +0.17 (paired t-test, p<0.01). Best (again): LOD + Entity Linking (with content!).
  60. Experiment 1, impact of LOD-based features: eVSM. Configurations: Content, LOD, Entity, Content+LOD, Content+Entity, LOD+Entity, All. F1 values: 52.9, 53.07, 52.8, 53.04, 53.02, 53.37, 52.06 (annotated gains: +0.47, +0.17, +0.14, +0.12; paired t-test, p<0.01). The introduction of LOD-based features leads to an improvement again.
  61. Experiment 1, impact of LOD-based features: lessons learned for VSMs (VSM, BM25, eVSM). 1. LOD features alone are always the worst configuration. 2. At least the LOD-based representation based on entity linking always improves on content alone.
  62. Experiment 1, impact of LOD-based features: Random Forests. Configurations: Content, LOD, Entity, Content+LOD, Content+Entity, LOD+Entity, All. F1 values: 53.86, 53.68, 53.75, 53.76, 53.77, 53.34, 53.52 (annotated gain: +0.36). Similar outcomes: all configurations but LOD alone lead to an improvement.
  63. Same chart (Random Forests): content does matter: LOD + Entity + Content is the best.
  64. Experiment 1, impact of LOD-based features: Linear Regression. Configurations: Content, LOD, Entity, Content+LOD, Content+Entity, LOD+Entity, All. F1 values: 55.59, 55.59, 55.67, 55.64, 55.61, 55.5, 55.57 (annotated gain: +0.08; paired t-test, p<0.01). The entity-based representation is the best one.
  65. Same chart (Linear Regression): note the smaller improvements (due to the basic features?).
  66. Experiment 1, impact of LOD-based features: lessons learned for classifiers (RF, LR). 1. LOD features alone never outperform content. 2. At least the LOD-based representation based on entity linking always improves on content alone.
  67. Same lessons for classifiers: RF and LR show the same outcomes (algorithm-independent behaviour).
  69. Experiment 1, impact of LOD-based features: PageRank with Priors. F1: NO-LOD 54.28, SMALL-LOD 54.73 (+0.45), BIG-LOD 55.44 (+1.16) (paired t-test, p<0.001). The more LOD-based data, the better the accuracy.
  70. Same chart (PageRank with Priors). Drawback: more nodes produce an exponential growth of computational costs (from 3 hours to 120 hours to run the experiment!).
  71. Experiment 2: comparison to the state of the art. Baselines: SPRANK (Semantic Path Ranking) [*], BPRMF (Bayesian Personalized Ranking) [+], U2U_CF (user-to-user CF), I2I_CF (item-to-item CF). [*] V. Ostuni, T. Di Noia, E. Di Sciascio, R. Mirizzi: Top-N recommendations from implicit feedback leveraging Linked Open Data. RecSys 2013. [+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.
  72. Experiment 2, comparison to the state of the art. Algorithms: VSM, LR, PRP, SPRANK, BPRMF, U2U_CF, I2I_CF. F1 values: 52.27, 52.28, 52.24, 54.12, 55.67, 55.44, 54.69. For our approaches (VSM, LR, PRP), the best-performing configurations are compared against the baselines.
  73. Same chart: classical CF techniques perform poorly (due to sparsity?).
  74. Same chart: +3.4% over the LOD-based state-of-the-art algorithm (SPRANK).
  75. Same chart (annotated gains: +0.57, +1.55, +0.32): our approaches outperform matrix factorization (BPRMF).
  76. Conclusions
  77. Lessons learned: investigation of the effectiveness of Linked Open Data in content-based recommendation tasks. Two solutions have been proposed: direct access to DBpedia and entity linking algorithms. Evaluation. Research question: what is the impact of LOD-based features on VSMs, classifiers and graph-based algorithms? All recommendation approaches significantly benefit from the introduction of LOD-based features. Our best-performing configurations outperform both collaborative and LOD-based state-of-the-art algorithms.
  78. Future research: evaluation against different datasets and stronger baselines; better (automatic) tuning of parameters and integration of more LOD-based data sources; evaluation of novelty, diversity and serendipity of LOD-based recommendations.
  79. Questions? Cataldo Musto, Ph.D., cataldo.musto@uniba.it
