An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

WWW 2015 – 7th International Workshop on Web Intelligence & Communities
May 18, 2015 in Florence, Italy
Phuong T. Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio
{phuong.nguyen, paolo.tomeo, tommaso.dinoia, eugenio.disciascio}@poliba.it
Polytechnic University of Bari - Bari (ITALY)

Outline
• Introduction
• Recommender systems
• Linked Open Data
• SimRank and Personalized PageRank
• Experimental Evaluation
• Conclusions

Introduction
• Content-based recommender systems rely on the notion of similarity between items, so they need content describing those items
• The Web of Data is an opportunity to foster data-intensive applications
• We investigated how effectively SimRank and Personalized PageRank work with Linked Data in the recommendation task

Recommender Systems
Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

• Content-based filtering
• Collaborative filtering

Content-based RSs
Content-based RSs try to recommend items similar to those a given user has liked in the past. The recommendations are based upon a description of the items and a profile of the user’s interests.
[Figure: the user profile and the item descriptions feed the recommender system, which produces personalized recommendations]

Main drawback: Limited Content Analysis
• Content-based RSs need domain knowledge and rich descriptions of the items
• Without enough information, no suitable suggestion can be made
• The quality of CB recommendations depends on the quantity and quality of the features explicitly associated with the items
[P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners, Chapter 3, Springer, 2010.]

Linked Open Data
• Initiative for publishing and connecting data on the Web using
Semantic Web technologies
• More than 30 billion RDF triples from hundreds of data sources
• Good resource to mine existing information and deduce new
knowledge

Florence in DBpedia
Example of facts in triple form (subject, predicate, object):
Florence  dcterms:subject  Former capitals of Italy
Florence  dbpedia-owl:region  Tuscany
Florence  dbpedia-owl:country  Italy
...
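A minimal sketch of how such facts could be held in memory and queried. It stores each fact as a plain (subject, predicate, object) tuple; the helper names `objects_of` and `describe` are hypothetical, and the category label is illustrative rather than copied from DBpedia.

```python
# Toy triples mirroring the Florence example; prefixes are kept as plain strings.
TRIPLES = [
    ("Florence", "dcterms:subject", "Former capitals of Italy"),
    ("Florence", "dbpedia-owl:region", "Tuscany"),
    ("Florence", "dbpedia-owl:country", "Italy"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, subject):
    """Group the outgoing facts of `subject` by predicate."""
    facts = {}
    for s, p, o in triples:
        if s == subject:
            facts.setdefault(p, []).append(o)
    return facts
```

For instance, `objects_of(TRIPLES, "Florence", "dbpedia-owl:region")` returns the region recorded for Florence.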

Enrich Data model
[Figure: Catalog Items and the Knowledge Graph]
1 – Mapping (Entity linking)
2 – Subgraph extraction

Enrich Data model
[Figure: Catalog Items and the Knowledge Graph]
Each item is represented by a part of this subgraph.
We need algorithms able to compute similarity between graphs.
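The subgraph extraction step can be sketched as a bounded breadth-first walk from an item that has already been mapped to a graph entity. The hop limit `max_hops`, the choice to follow edges in both directions, and the function name `extract_subgraph` are assumptions made for illustration; the slides do not state how the extraction was configured.

```python
from collections import deque


def extract_subgraph(triples, seed, max_hops=1):
    """Collect the triples reachable from `seed` within `max_hops` hops,
    following edges in both directions (inbound and outbound)."""
    adjacency = {}
    for triple in triples:
        s, _, o = triple
        adjacency.setdefault(s, []).append((o, triple))
        adjacency.setdefault(o, []).append((s, triple))

    seen = {seed}
    kept = set()
    queue = deque([(seed, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for other, triple in adjacency.get(node, []):
            kept.add(triple)
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return kept
```

With `max_hops=1` an item is described only by the triples it takes part in directly; larger values pull in the context of its neighbors as well.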

SimRank
SimRank computes similarity between nodes in a graph using the
structural context: two nodes are similar if they are referenced by
similar nodes.
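G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In ACM KDD ’02, pages 538–543, 2002.
Given k ≥ 0, R^(k)(α, β) = 1 if α = β, and R^(0)(α, β) = 0 if α ≠ β.
Otherwise, the general formula, as defined by Jeh and Widom, is
\[ R^{(k+1)}(\alpha, \beta) = \frac{C}{|I(\alpha)|\,|I(\beta)|} \sum_{i=1}^{|I(\alpha)|} \sum_{j=1}^{|I(\beta)|} R^{(k)}\big(I_i(\alpha), I_j(\beta)\big) \]
where I(α) is the set of nodes pointing to α, I_i(α) is its i-th element, and C ∈ (0, 1) is a decay factor. The score is 0 when either node has no incoming edges.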
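A minimal sketch of the iterative SimRank computation of Jeh and Widom: at each step, the score of a pair of distinct nodes is the decayed average of the previous scores over all pairs of their in-neighbors. The decay value 0.8, the iteration count and the dictionary-based graph layout are assumptions chosen for illustration, not the settings used in the evaluation.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=5):
    """Naive iterative SimRank.

    `in_neighbors` maps every node to the list of nodes pointing to it;
    every node that appears in a list must also be a key.
    """
    nodes = list(in_neighbors)
    # R(0): 1 for identical nodes, 0 for every other pair.
    scores = {(a, b): 1.0 if a == b else 0.0
              for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # No incoming edges on one side: similarity stays 0.
                updated[(a, b)] = 0.0
                continue
            total = sum(scores[(x, y)] for x in ia for y in ib)
            updated[(a, b)] = decay * total / (len(ia) * len(ib))
        scores = updated
    return scores
```

This version costs time quadratic in the number of nodes per iteration, so it only suits small graphs.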

Personalized PageRank
A node receives an amount of rank from every node which points to
it and in turn transfers an amount of its rank to the nodes it refers to.
T. H. Haveliwala. Topic-sensitive pagerank. In ACM WWW ’02, pages 517–526, 2002. ACM.
The similarity between two items α and β, represented by the vectors α = {a_i}, i = 1, ..., n and β = {b_i}, i = 1, ..., n, is computed as the inner product of the two vectors.
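A minimal sketch of Personalized PageRank by power iteration, followed by the inner product of two rank vectors. It assumes that the vector of an item is the rank distribution obtained by teleporting only to that item; the damping value 0.85, the handling of dangling nodes and the function names are illustrative assumptions as well.

```python
def personalized_pagerank(out_links, seed, damping=0.85, iterations=50):
    """Rank vector of a random walk that restarts at `seed`.

    `out_links` maps every node to the list of nodes it points to;
    every node that appears in a list must also be a key.
    """
    nodes = list(out_links)
    rank = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Spread the damped rank evenly over the outgoing links.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its damped rank back to the seed.
                nxt[seed] += damping * rank[n]
        # Teleport step: the remaining mass always returns to the seed.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def inner_product(rank_a, rank_b):
    """Inner product of two rank vectors stored as dictionaries."""
    return sum(value * rank_b.get(node, 0.0) for node, value in rank_a.items())
```

Normalizing the product by the vector norms, which gives cosine similarity, would be a possible variation.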

K-Nearest Neighbors
The K-Nearest Neighbors (k-NN) algorithm finds the set neighbors(α) containing the k entities β most similar to a given item α, using a similarity function sim(α, β).
A k-NN recommendation algorithm predicts the rating of item α for a user u, taking into account the neighbors of α and the profile of u.
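A minimal sketch of an item-based k-NN predictor. The slides do not give the prediction formula, so the similarity-weighted average below is an assumption (a common choice for item-based k-NN), and the names `neighbors` and `predict` are hypothetical.

```python
import heapq


def neighbors(item, candidates, sim, k):
    """Return the k candidates most similar to `item`, excluding `item` itself."""
    others = [c for c in candidates if c != item]
    return heapq.nlargest(k, others, key=lambda c: sim(item, c))


def predict(item, profile, candidates, sim, k):
    """Similarity-weighted average of the user's ratings over the neighbors
    of `item` that appear in the user's profile (a dict item -> rating)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other in neighbors(item, candidates, sim, k):
        if other in profile:
            weight = sim(item, other)
            weighted_sum += weight * profile[other]
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

Any similarity function can be plugged in as `sim`, for example a lookup into precomputed SimRank scores or an inner product of Personalized PageRank vectors.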

Evaluation Methodology
• Top-N item recommendation task, for different values of N (from top-1 to top-50)
• 60-40% holdout split for each user
• Baselines: Vector Space Model (VSM) and a simplified variation of
the algorithm proposed in [T. Di Noia, R. Mirizzi, V. C. Ostuni, D. Romito, and
M. Zanker. Linked open data to support content-based recommender systems. In
ACM I-SEMANTICS ’12. ACM]
• The results for the four algorithms were computed with 10, 20, 40, 60 and 80 neighbors
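A minimal sketch of a per-user holdout split. It assumes that the 60% part is used for training and the 40% part for testing, and that the split is random; the seed and the function name are illustrative.

```python
import random


def holdout_split(ratings_by_user, train_fraction=0.6, seed=0):
    """Split each user's rated items into a training and a test list."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for user, items in ratings_by_user.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train[user] = shuffled[:cut]
        test[user] = shuffled[cut:]
    return train, test
```

Splitting per user, rather than over the whole rating set, keeps every user represented in both parts.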

Evaluation Metrics
Accuracy: Precision (P@N), Recall (R@N)
Sales diversity: Catalog coverage, Entropy, Gini index
Novelty: Long-tail percentage, Expected Popularity Complement (EPC@N)
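A minimal sketch of the accuracy and sales-diversity metrics, computed over top-N recommendation lists. The slides do not give the formulas, so these are common textbook definitions taken as assumptions; entropy is measured in bits, and the Gini index counts never-recommended catalog items as zeros.

```python
import math
from collections import Counter


def precision_recall_at_n(recommended, relevant, n):
    """P@N and R@N for one user; `relevant` is the set of held-out items."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def catalog_coverage(rec_lists, catalog_size):
    """Fraction of the catalog that appears in at least one list."""
    distinct = {item for recs in rec_lists for item in recs}
    return len(distinct) / catalog_size


def entropy(rec_lists):
    """Shannon entropy (bits) of how often each item is recommended."""
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gini(rec_lists, catalog_size):
    """Gini index of recommendation counts: 0 is even, near 1 is concentrated."""
    counts = Counter(item for recs in rec_lists for item in recs)
    freq = sorted(list(counts.values()) + [0] * (catalog_size - len(counts)))
    n, total = len(freq), sum(freq)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(freq, start=1))
    return weighted / (n * total)
```

Higher coverage and entropy, and a lower Gini index, indicate recommendations spread over more of the catalog.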

Dataset
Subset of Last.fm mapped to DBpedia
• 1,867 users
• 700 most popular artists and bands
• 47,330 ratings
• 113,386 extracted triples
Mappings: http://sisinflab.poliba.it/semanticweb/lod/recsys/datasets/

Properties extracted from DBpedia (113,386 extracted triples)
Inbound:
• dbo:producer
• dbo:artist
• dbo:writer
• dbo:associatedBand
• dbo:associatedMusicalArtist
• dbo:musicalArtist
• dbo:musicComposer
• dbo:bandMember
• dbo:formerBandMember
• dbo:starring
• dbo:composer
Outbound:
• dcterms:subject
• dbo:genre
• dbo:associatedBand
• dbo:associatedMusicalArtist
• dbo:instrument
• dbo:occupation
• dbo:birthPlace
• dbo:background

Accuracy with 40 neighbors
Personalized PageRank and SimRank are more accurate than the two baselines

Catalog coverage with 40 neighbors
SimRank and Personalized PageRank have the worst coverage

Distribution with 40 neighbors
Recommendations are concentrated on a few items

Novelty Results

Conclusions
SimRank and Personalized PageRank obtained promising results compared to the two baselines in terms of precision, recall and novelty, although catalog coverage and the distribution of recommended items worsen
Future work:
• Same experiments on two more datasets: MovieLens and TheLibraryThing
• Same experiments using other graph similarity metrics

Thanks for your attention!
Q & A
