Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-based Features
1. @cataldomusto
Semantics-aware Recommender Systems
Exploiting Linked Open Data
and Graph-based Features
CATALDO MUSTO, PASQUALE LOPS, MARCO DE GEMMIS, GIOVANNI SEMERARO
UNIVERSITÀ DEGLI STUDI DI BARI ‘ALDO MORO’ - ITALY
World Wide Web Conference 2018
Journal Track
Lyon, France
April 25, 2018
cataldo.musto@uniba.it
2. What are we going to talk about?
Cataldo Musto, Pasquale Lops, Marco de Gemmis, Giovanni Semeraro
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-based Features. WebConf 2018 – Journal Track. Lyon, France. April 25, 2018
3. Recommender Systems
Technology able to push relevant items (movies, news, books, etc.) to the user according to her preferences.
4. Recommender Systems
Largely adopted in industry
5. Recommendation Paradigms
Collaborative Filtering
Content-based RecSys
6. Recommendation Paradigms: Collaborative Filtering
Exploits the preferences of the community to generate recommendations.
Insight: suggest items liked by users similar to the target user.
7. Recommendation Paradigms: Content-based RecSys
Exploits descriptive features of the items (e.g. genre of a book, director of a movie) to generate recommendations.
Insight: suggest items similar to those the user already liked.
8. Recommendation Paradigms: Hybrid RecSys
Combines different recommendation paradigms to provide recommendations.
Advantage: merges the strengths of each paradigm in a single representation.
9. Hybrid Recommender Systems
• Several hybridization strategies presented in literature (*)
• «Feature Combination» strategy
• Learns a representation of the item that combines heterogeneous
groups of features
• Each representation can be used as a positive/negative example to
tackle the recommendation problem as a text classification one
(*) Burke, Robin. "Hybrid recommender systems: Survey and experiments." User Modeling and User-Adapted Interaction 12.4 (2002): 331-370.
11. Hybrid Recommender Systems
• «Feature Combination» strategy
• Learns a representation of the item that combines heterogeneous groups of features
[Figure: item vector split into Group 1, Group 2, Group 3, ..., Group N]
12. Hybrid Recommender Systems
• Several hybridization strategies presented in literature
• «Feature Combination» strategy
• Learns a representation of the item that combines heterogeneous groups of
features
• Groups of basic features
• Popularity-based features
• Collaborative features
• Content-based features
• … etc.
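As a rough illustration of the «Feature Combination» idea, the sketch below concatenates several feature groups into one flat item vector. The function name `combine_feature_groups`, the group names, and all values are hypothetical and chosen only for this example; the slides do not prescribe an implementation.

```python
from typing import Dict, List


def combine_feature_groups(groups: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Concatenate the selected feature groups, in the given order, into one item vector."""
    vector: List[float] = []
    for name in order:
        vector.extend(groups[name])
    return vector


# Hypothetical feature groups for a single item (illustrative values only).
item_groups = {
    "popularity": [10, 7, 0.7],
    "collaborative": [0, 1, 0],
    "content": [1, 0, 1],
}

item_vector = combine_feature_groups(item_groups, ["popularity", "collaborative", "content"])
print(item_vector)
```

Keeping the groups in a dictionary makes it easy to switch a group on or off by changing only the `order` list.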
13. Basic features: popularity-based
• Features based on the popularity of the item
• Examples
• number of users who rated the item
• number of users who rated the item positively
• ratio of positive ratings
• … etc.
[Figure: item vector whose popularity-based group holds 10, 7, 0.7; Group 2, Group 3, ..., Group N still empty]
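A minimal sketch of how the three example popularity features could be computed, assuming binary ratings (1 = positive, 0 = negative). The function name and the sample ratings are made up for illustration.

```python
from typing import List


def popularity_features(ratings: List[int]) -> List[float]:
    """Return [number of ratings, number of positive ratings, ratio of positive ratings]."""
    n_votes = len(ratings)
    n_positive = sum(ratings)
    ratio = n_positive / n_votes if n_votes else 0.0
    return [n_votes, n_positive, ratio]


# Hypothetical binary ratings received by one item.
item_ratings = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
features = popularity_features(item_ratings)
print(features)
```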
14. Basic features: collaborative
• Features based on the behavior of the users
• Based on the user-item rating matrix
• A column vector is extracted from the matrix and encoded in the representation of the item
User-item rating matrix:
     i1  i2  …   iN
u1   0   1   1   1
u2   1   1   0   1
..   …   …   …   …
uN   0   1   0   0
[Figure: item vector with the Popularity-based and Collaborative groups filled in; Group 3, ..., Group N still empty]
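The column extraction can be sketched as follows, using a small made-up rating matrix rather than the one on the slide (whose elided cells are unknown). Rows are users, columns are items; `item_column` is a hypothetical helper name.

```python
from typing import List


def item_column(matrix: List[List[int]], item_index: int) -> List[int]:
    """Return the column of the user-item matrix that corresponds to one item."""
    return [row[item_index] for row in matrix]


# Hypothetical 3 users x 3 items binary rating matrix.
ratings = [
    [0, 1, 1],  # user 1
    [1, 1, 0],  # user 2
    [0, 1, 0],  # user 3
]

collaborative_features = item_column(ratings, 0)
print(collaborative_features)
```

Each entry of the resulting vector says whether the corresponding user liked the item, so the vector has one dimension per user.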
16. Basic features: content-based
• Features based on the textual description of the item
• Based on natural language processing techniques
• Textual description is processed
• Typically stopwords are removed and lemmatization is performed
• Sometimes more sophisticated techniques (e.g. semantic analysis, entity linking, etc.) are implemented
• Output: a set of tokens describing the item, modeled as a vocabulary
[Figure: item vector with Popularity-based, Collaborative and Content-based groups; Group N still empty]
17. Basic features: content-based
Feature          Score
science-fiction  1
horror           0
programmer       1
lawyer           0
….               …
dystopian        1
[Figure: item vector whose Content-based group holds these scores]
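A simplified sketch of the content-based encoding: the description is tokenized, stopwords are dropped, and the item is scored against a fixed vocabulary. Lemmatization is deliberately left out to keep the example dependency-free; the stopword list, the sample description, and the function names are assumptions made for this illustration.

```python
import re
from typing import List, Set

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS: Set[str] = {"a", "an", "the", "in", "of", "is", "and", "which"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, keep (possibly hyphenated) words, and drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def encode(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Binary score per vocabulary term: 1 if the term occurs in the item description."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


vocabulary = ["science-fiction", "horror", "programmer", "lawyer", "dystopian"]
description = "A dystopian science-fiction movie in which a programmer discovers the truth"

tokens = tokenize(description)
content_vector = encode(tokens, vocabulary)
print(content_vector)
```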
18. Research Question
Is there any other information source that can be exploited to enrich item representation in a hybrid recommendation framework?
[Figure: item vector with Popularity-based, Collaborative and Content-based groups, and a question mark in place of Group N]
22. Linked Open Data cloud
This is the Linked Open Data cloud
It is a (huge) set of interconnected semantic datasets
Each bubble is a dataset!
How many datasets do we have?
149 billion triples and 9,960 datasets
(source: http://stats.lod2.eu)
23. Linked Open Data cloud
The core of the Linked Open Data cloud is DBpedia (http://www.dbpedia.org), an RDF mapping of Wikipedia
25. DBpedia
Wikipedia: Unstructured Content
DBpedia: Structured Data (in RDF)
26. DBpedia
Very fine-grained and interesting features!
27. Research Question (again)
How can we use Linked Open Data features for Recommender Systems?
28. Extended features: LOD-based
Such features can be extracted from the LOD cloud and encoded as a novel vocabulary of features that enriches the item representation
29. Extended features: LOD-based
Feature                 Score
Don Davis               1
Ennio Morricone         0
Dystopian Movie         1
Brad Pitt               0
….                      …
Films about Rebellions  1
[Figure: item vector with Popularity-based, Collaborative, Content-based and LOD-based groups; the LOD-based group holds these scores]
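The LOD-based encoding can be sketched with a handful of hand-written triples standing in for data that would normally come from the LOD cloud. The item identifiers (`movie_A`, `movie_B`), the predicate names, and the triples themselves are hypothetical; only the feature labels are borrowed from the example table.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hand-written illustrative triples, not fetched from any real dataset.
triples: List[Triple] = [
    ("movie_A", "composer", "Don Davis"),
    ("movie_A", "subject", "Dystopian Movie"),
    ("movie_A", "subject", "Films about Rebellions"),
    ("movie_B", "composer", "Ennio Morricone"),
    ("movie_B", "starring", "Brad Pitt"),
]


def build_vocabulary(data: List[Triple]) -> List[str]:
    """Every distinct object of a triple becomes one feature of the vocabulary."""
    return sorted({obj for _subj, _pred, obj in data})


def lod_vector(item: str, data: List[Triple], vocabulary: List[str]) -> List[int]:
    """Binary score per feature: 1 if the item is linked to that resource."""
    linked = {obj for subj, _pred, obj in data if subj == item}
    return [1 if feature in linked else 0 for feature in vocabulary]


vocabulary = build_vocabulary(triples)
vector_a = lod_vector("movie_A", triples, vocabulary)
print(vocabulary)
print(vector_a)
```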
30. Extended features: graph-based
Graph-based features describing the item can be inferred by mining the structure of the tripartite graph
• Average neighbor degree
• Degree centrality
• Node redundancy
• Clustering coefficient
31. Extended features: graph-based
Feature                 Score
Average Neigh. Degree   1.7
Degree Centrality       2
Clustering Coefficient  0
Node Redundancy         0.3
[Figure: item vector with Popularity-based, Collaborative, Content-based and Graph-based groups; the Graph-based group holds these scores]
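The four graph-based features can be sketched in plain Python on a toy tripartite graph (users, items, LOD entities). The slides give no formulas, so the definitions below are assumptions based on common conventions: degree centrality is normalized by n-1, and node redundancy is the fraction of neighbor pairs that share another common neighbor. The graph and node names are invented.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from a list of edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(g: Graph, node: str) -> float:
    """Degree divided by the number of other nodes in the graph."""
    return len(g[node]) / (len(g) - 1)


def average_neighbor_degree(g: Graph, node: str) -> float:
    """Mean degree of the node's neighbors."""
    return sum(len(g[n]) for n in g[node]) / len(g[node])


def clustering_coefficient(g: Graph, node: str) -> float:
    """Fraction of neighbor pairs that are directly connected to each other."""
    k = len(g[node])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(g[node], 2) if b in g[a])
    return 2 * links / (k * (k - 1))


def node_redundancy(g: Graph, node: str) -> float:
    """Fraction of neighbor pairs that share a common neighbor other than the node."""
    k = len(g[node])
    if k < 2:
        return 0.0
    shared = sum(1 for a, b in combinations(g[node], 2) if (g[a] & g[b]) - {node})
    return 2 * shared / (k * (k - 1))


# Toy tripartite graph: users (u*), items (i*), LOD entities (e*).
graph = build_graph([
    ("u1", "i1"), ("u2", "i1"), ("u1", "i2"),
    ("i1", "e1"), ("i2", "e1"), ("i1", "e2"),
])

graph_features = [
    average_neighbor_degree(graph, "i1"),
    degree_centrality(graph, "i1"),
    clustering_coefficient(graph, "i1"),
    node_redundancy(graph, "i1"),
]
print(graph_features)
```

Because users and entities connect only through items in this toy graph, no triangles exist and the clustering coefficient is zero.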
33. Hybrid RecSys & LOD-based features
Research Question: what is the impact of such features on the overall performance of a hybrid recommendation framework?
Insight: evaluate the performance of the recommender system while varying the groups of features it uses
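One way to organize such an evaluation is to enumerate every non-empty combination of feature groups and run the recommender once per configuration. The sketch below only generates the configuration labels; P, C and T are shorthand for the popularity-based, collaborative and content-based groups, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List


def feature_configurations(groups: List[str]) -> List[str]:
    """List every non-empty combination of feature groups, smallest first."""
    configs: List[str] = []
    for size in range(1, len(groups) + 1):
        for combo in combinations(groups, size):
            configs.append("+".join(combo))
    return configs


configs = feature_configurations(["P", "C", "T"])
print(configs)
```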
34. Datasets
MovieLens 1M (ML1M)
6,040 users
3,883 movies
1,000,209 ratings
57.51% positive ratings
165.59 ratings/user (avg.)
269.88 ratings/item (avg.)
99.4% sparsity
35. Datasets
DBbook
6,181 users
6,733 books
72,732 ratings
45.86% positive ratings
11.71 ratings/user (avg.)
10.74 ratings/item (avg.)
99.8% sparsity
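For reference, sparsity is commonly defined as the share of empty cells in the user-item matrix. The slides do not state the formula used, so the sketch below assumes that standard definition and applies it to the DBbook counts listed above.

```python
def sparsity(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of user-item pairs that have no rating."""
    return 1.0 - n_ratings / (n_users * n_items)


# DBbook counts as listed on the slide.
dbbook_sparsity = sparsity(n_users=6181, n_items=6733, n_ratings=72732)
print(f"{dbbook_sparsity:.1%}")
```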
36. Experimental Protocol: item representation
We first model basic features
[Figure: item vector with Popularity-based, Collaborative and Content-based groups]
37. Experimental Protocol: item representation
Then we introduce extended features based on the LOD cloud
[Figure: item vector with Popularity-based, Collaborative and Content-based groups, extended with LOD-based and Graph-based groups]
38. Experimental Protocol: item representation
We use them to feed a hybrid classification framework
[Figure: item vector with Popularity-based, Collaborative, Content-based, LOD-based and Graph-based groups]
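To make the classification view concrete, the sketch below treats each item vector as a positive or negative training example for one user and fits a classifier on them. The slides do not name the classifier in the framework; a plain perceptron is swapped in here purely for illustration, and the vectors and labels are invented.

```python
from typing import List, Tuple

Example = Tuple[List[float], int]  # (item vector, 1 = liked / 0 = disliked)


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples: List[Example], epochs: int = 20) -> Tuple[List[float], float]:
    """Classic perceptron rule: adjust the weights only on misclassified examples."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = label - predict(weights, bias, x)
            if error:
                weights = [w + error * v for w, v in zip(weights, x)]
                bias += error
    return weights, bias


# Hypothetical item vectors rated by one user.
examples: List[Example] = [
    ([1, 0, 1], 1),
    ([1, 1, 1], 1),
    ([0, 1, 0], 0),
    ([0, 0, 0], 0),
]

weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, x) for x, _ in examples]
print(predictions)
```

Items the user has not rated yet can then be scored with `predict` and recommended when the predicted class is positive.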
39. Experimental Protocol: experiments
Experiment 1: impact of basic features on the performance
40. Experimental Protocol: experiments
Experiment 2: impact of extended features on the performance
41. Experimental Protocol: experiments
Experiment 3: impact of both groups of features on the performance
42. Experimental Protocol: experiments
Experiment 4: comparison to state-of-the-art algorithms
43. Results of the Experiments: Experiment 1
Collaborative and popularity-based features got the best results
Configuration       MovieLens  DBbook
Popularity (P)      53.38      56.1
Collaborative (C)   56.18      54.21
Content-based (T)   49.13      55.32
P+C                 56.35      56.27
P+T                 50.51      55.67
C+T                 51.87      55.49
P+C+T               52.46      55.83
44. Results of the Experiments: Experiment 2
LOD-based and graph-based features alone did not outperform the basic features
Configuration             MovieLens  DBbook
Popularity+Collaborative  56.35      56.27
LOD-based                 50.65      55.44
Graph-based               50.54      54.67
45. Results of the Experiments: Experiment 3
The injection of LOD-based features improves the predictive accuracy
Configuration                        MovieLens  DBbook
Popularity+Collaborative (Baseline)  56.35      56.27
Baseline+LOD                         56.42      56.59
Baseline+Graph                       56.21      56.07
Baseline+LOD+Graph                   56.78      56.67
46. Results of the Experiments: Experiment 4
Our framework outperforms all the baselines
Algorithm         MovieLens  DBbook
LOD-based RecSys  56.78      56.67
U2U-KNN           42.7       51.93
I2I-KNN           43.2       51.11
BPRMF             52.18      52.9
BPRMF+LOD         52.15      53.04
PPR               53.97      55.02
PPR+LOD           54         55.4
U2U-KNN: User-to-User Collaborative Filtering
I2I-KNN: Item-to-Item Collaborative Filtering
BPRMF: Bayesian Personalized Ranking Matrix Factorization
PPR: Personalized PageRank
47. LOD-based RecSys: Take Home Messages
1. Linked Open Data is a huge, freely available data silo
2. It can feed machine learning models with new and relevant features
3. It improves the accuracy of recommender systems
Future Trends
• Linked Open Data && (Graph Embeddings || Word Embeddings)
• Linked Open Data && Different Metrics (Serendipity, Novelty, etc.)
48. Want to read more?
C. Musto, P. Lops, M. de Gemmis, G. Semeraro. Semantics-aware Recommender Systems exploiting Linked Open Data and graph-based features. Knowledge-Based Systems 136, 1-14.
cataldo.musto@uniba.it
@cataldomusto