Recommender Systems and Linked Open Data

Tommaso Di Noia
Polytechnic University of Bari
ITALY
11th Reasoning Web Summer School – Berlin August 1, 2015
tommaso.dinoia@poliba.it
@TommasoDiNoia

Agenda
• A quick introduction to Linked Open Data
• Recommender systems
• Evaluation
• Recommender Systems and Linked Open Data

A quick introduction to
LINKED OPEN DATA

Linked Open Data
the Giant Global Graph

Linked (Open) Data
Some definitions:
– A method of publishing data on the Web
– (An instance of) the Web of Data
– A huge database distributed in the Web
– Linked Data is the Semantic Web done right

Web vs Linked Data
                 Web            Linked Data
Analogy          File System    Database
Designed for     Humans         Machines (Software Agents)
Main elements    Documents      Things
Links between    Documents      Things
Semantics        Implicit       Explicit
Courtesy of Prof. Enrico Motta, The Open University, Milton Keynes, UK. Semantic Web: Technologies and Applications.

Which technologies?
• Data Language
• Query Language
• Schema Languages

URI
• Every resource/entity/thing/relation is
identified by a (unique) URI
– URI: <http://dbpedia.org/resource/Berlin>
– CURIE: dbpedia:Berlin
– URI: <http://purl.org/dc/terms/subject>
– CURIE: dcterms:subject

Which vocabularies/ontologies?
• Most popular on http://prefix.cc (July 25, 2015)
– YAGO: http://yago-knowledge.org/resource/
– FOAF: http://xmlns.com/foaf/0.1/
– DBpedia Ontology: http://dbpedia.org/ontology/
– DBpedia Properties:
http://dbpedia.org/property/

Which vocabularies/ontologies?
• Most popular on http://lov.okfn.org (July 25,
2015)
– VANN: http://purl.org/vocab/vann/
– SKOS: http://www.w3.org/2004/02/skos/core#
– FOAF
– DCTERMS
– DCE: http://purl.org/dc/elements/1.1/

RDF – Resource Description Framework
• Basic element: triple
[subject] [predicate] [object]
– subject: URI
– predicate: URI
– object: URI | Literal
– Literal: "string"@lang | "string"^^datatype

dbpedia:Berlin dbo:country dbpedia:Germany .
dbpedia:Berlin rdfs:label "Berlin"@en .
dbpedia:Berlin rdfs:label "Berlino"@it .
dbpedia:Berlin dbo:populationTotal "3517424"^^xsd:integer .
dbpedia:Berlin dcterms:subject category:Capitals_in_Europe .
dbpedia:Berlin rdf:type yago:UrbanArea108675967 .
dbpedia:Germany dbo:language dbpedia:German_Language .
dbpedia:Germany dbo:firstDriverCountry dbpedia:2014_German_Grand_Prix .

[Figure: the triples above drawn as a graph. Nodes: Berlin, Germany, 2014_German_Grand_Prix, German_Language, Capitals_in_Europe, UrbanArea108675967 and the literals "Berlin"@en, "Berlino"@it, "3517424"^^xsd:integer. Edge labels: country, language, firstDriverCountry, type, subject, label, populationTotal]

RDFS and OWL in two statements
dbo:country rdfs:range dbo:Country .
dbpedia:Berlin owl:sameAs freebase:Berlin .

SPARQL
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?name ?city
WHERE {
?city dcterms:subject category:Capitals_in_Europe.
?city rdfs:label ?name .
?city dbo:populationTotal ?population .
FILTER (?population < 30000).
}

SPARQL
# No PREFIX declarations: the query relies on the prefixes predefined by the endpoint
curl -g -H 'Accept: application/json' \
'http://dbpedia.org/sparql?query=SELECT+DISTINCT+?name+?city+WHERE+{?city+dcterms:subject+category:Capitals_in_Europe+.+?city+rdfs:label+?name+.+?city+dbpedia-owl:populationTotal+?population+.+FILTER+(?population+<+30000)+.}'
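The same request can be prepared programmatically. The sketch below only builds the GET URL with the Python standard library, percent-encoding the query into the `query` parameter; the helper name `build_request_url` is an assumption made for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint taken from the curl call

# Same query as on the SPARQL slide, with explicit PREFIX declarations.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?name ?city WHERE {
  ?city dcterms:subject category:Capitals_in_Europe .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER (?population < 30000)
}
"""

def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Hypothetical helper: return a GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_request_url(QUERY)
# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can then be fetched with any HTTP client, sending the same Accept header as in the curl call.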

RECOMMENDER SYSTEMS
The information overload problem

Personalized Information Access
• Help the user in finding the information they
might be interested in
• Consider their preferences/past behaviour
• Filter irrelevant information

Recommender Systems
• Help users in dealing with Information/Choice Overload
• Help to match users with items

Some definitions
– In its most common formulation, the recommendation problem is
reduced to the problem of estimating ratings for the items that have
not been seen by a user.
[G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and
Possible Extensions. TKDE, 2005.]
– Recommender Systems (RSs) are software tools and techniques
providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

The problem
• Estimate a utility function to automatically
predict how much a user will like an item
which is unknown to them.
Input
– Set of users: $U = \{u_1, \dots, u_M\}$
– Set of items: $X = \{x_1, \dots, x_N\}$
– Utility function: $f: U \times X \to R$
Output
– $\forall u \in U,\; x'_u = \arg\max_{x \in X} f(u, x)$
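As a minimal sketch of the output rule, assuming the utility function is already available as a Python callable: the function `recommend` and the toy utility values are hypothetical and only illustrate the arg max, plus its top-N generalisation.

```python
from typing import Callable, Iterable, List

def recommend(user: str, items: Iterable[str],
              f: Callable[[str, str], float], n: int = 1) -> List[str]:
    """Return the n items with the highest estimated utility f(user, item)."""
    return sorted(items, key=lambda x: f(user, x), reverse=True)[:n]

# Hypothetical utility values, for illustration only.
toy_utility = {("u1", "x1"): 2.0, ("u1", "x2"): 4.5, ("u1", "x3"): 3.0}

def f(u: str, x: str) -> float:
    return toy_utility[(u, x)]

best = recommend("u1", ["x1", "x2", "x3"], f)        # arg max over X
top2 = recommend("u1", ["x1", "x2", "x3"], f, n=2)   # top-N variant
```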

The rating matrix
           The Matrix  Titanic  I Love Shopping  Argo  Love Actually  The Hangover
Tommaso    5           1        2                4     3              ??
Vito       2           4        5                3     5              2
Phuong     4           3        2                4     1              3
Jessica    3           5        1                5     2              4
Paolo      4           4        5                3     5              2

The rating matrix
(in the real world)
           The Matrix  Titanic  I Love Shopping  Argo  Love Actually  The Hangover
Tommaso    5           ?        ?                4     3              ?
Vito       2           4        5                ?     5              ?
Phuong     ?           3        ?                4     ?              3
Jessica    3           5        ?                5     2              ?
Paolo      4           4        5                ?     5              2

How sparse is a rating matrix?

Rating Prediction vs Ranking
Best Worst

Recommendation techniques
• Content-based
• Collaborative filtering
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommender systems

Collaborative Recommender Systems
Collaborative RSs recommend items to a user by identifying
other users with a similar profile
[Figure: the user profile (rated items such as Item1, 5; Item2, 1; Item5, 4; Item10, 5) and the profiles of the other users (e.g. Item1, 4; Item2, 2; Item5, 5; Item10, 3) are fed to the Recommender System, which returns the Top-N Recommendations (Item7, Item15, Item11, …)]
Content-based Recommender Systems
CB-RSs recommend items to a user based on their description
and on the profile of the user’s interests
[Figure: the user profile (rated items such as Item1, 5; Item2, 1; Item5, 4; Item10, 5) and the items’ descriptions (Item1, Item2, …, Item100) are fed to the Recommender System, which returns the recommendations (Item7, Item15, Item11, …)]
Knowledge-based Recommender Systems
KB-RSs recommend items to a user based on their description
and domain knowledge encoded in a knowledge base
[Figure: the items’ descriptions (Item1, Item2, …, Item100) and the Knowledge-base are fed to the Recommender System, which returns the recommendations (Item7, Item15, Item11, …)]
Collaborative Filtering
• Memory-based
– Mainly based on k-NN
– Does not require any preliminary model building
phase
• Model-based
– Learn a predictive model before computing
recommendations

User-based Collaborative Recommendation
           The Matrix  Titanic  I Love Shopping  Argo  Love Actually  The Hangover
Tommaso    5           1        2                4     3              ??
Vito       2           4        5                3     5              2
Phuong     4           3        2                4     1              3
Jessica    3           5        1                5     2              4
Paolo      4           4        5                3     5              2
• User similarity: Pearson’s correlation coefficient
• Rate prediction
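A minimal sketch of user-based prediction on the rating matrix of this slide. The exact formulas of the slide are not reproduced here: the mean-centred weighted average, the choice of k = 2 neighbours and the rule of ignoring non-positive correlations are assumptions made for this example.

```python
from math import sqrt

users = ["Tommaso", "Vito", "Phuong", "Jessica", "Paolo"]
items = ["The Matrix", "Titanic", "I Love Shopping", "Argo", "Love Actually", "The Hangover"]
# Ratings from the slide; None marks the rating to predict.
R = [
    [5, 1, 2, 4, 3, None],
    [2, 4, 5, 3, 5, 2],
    [4, 3, 2, 4, 1, 3],
    [3, 5, 1, 5, 2, 4],
    [4, 4, 5, 3, 5, 2],
]

def mean(a):
    """Average of all ratings given by the user with index a."""
    known = [r for r in R[a] if r is not None]
    return sum(known) / len(known)

def pearson(a, b):
    """Pearson correlation of users a and b over the items both have rated."""
    common = [j for j in range(len(items)) if R[a][j] is not None and R[b][j] is not None]
    if len(common) < 2:
        return 0.0
    ma = sum(R[a][j] for j in common) / len(common)
    mb = sum(R[b][j] for j in common) / len(common)
    num = sum((R[a][j] - ma) * (R[b][j] - mb) for j in common)
    den = sqrt(sum((R[a][j] - ma) ** 2 for j in common)) * \
          sqrt(sum((R[b][j] - mb) ** 2 for j in common))
    return num / den if den else 0.0

def predict(a, j, k=2):
    """Mean-centred prediction from the k most similar users who rated item j."""
    scored = sorted(((pearson(a, b), b) for b in range(len(users))
                     if b != a and R[b][j] is not None), reverse=True)[:k]
    scored = [(s, b) for s, b in scored if s > 0]  # ignore non-positive correlations
    if not scored:
        return mean(a)
    num = sum(s * (R[b][j] - mean(b)) for s, b in scored)
    return mean(a) + num / sum(s for s, _ in scored)

tommaso, hangover = users.index("Tommaso"), items.index("The Hangover")
prediction = predict(tommaso, hangover)
```

With these choices the only neighbour that effectively contributes is Phuong, the user whose ratings correlate positively with Tommaso's.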

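Item-based Collaborative Recommendation
           The Matrix  Titanic  I Love Shopping  Argo  Love Actually  The Hangover
Tommaso    5           1        2                4     3              ??
Vito       2           4        5                3     5              2
Phuong     4           3        2                4     1              3
Jessica    3           5        1                5     2              4
Paolo      4           4        5                3     5              2
Cosine Similarity
$sim(x_i, x_j) = \frac{\vec{x}_i \cdot \vec{x}_j}{|\vec{x}_i| * |\vec{x}_j|} = \frac{\sum_u r_{u,x_i} * r_{u,x_j}}{\sqrt{\sum_u r_{u,x_i}^2} * \sqrt{\sum_u r_{u,x_j}^2}}$
Adjusted Cosine Similarity
Rate prediction
$\tilde{r}(u_i, x') = \frac{\sum_{x \in X_{u_i}} sim(\vec{x}, \vec{x}') * r_{x,u_i}}{\sum_{x \in X_{u_i}} sim(\vec{x}, \vec{x}')}$
where $X_{u_i}$ is the set of items rated by $u_i$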
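The cosine similarity and the rate prediction of this slide can be sketched as follows, using the same rating matrix. Restricting each similarity to the users who rated both items is an assumption of this example; the adjusted cosine variant is not shown.

```python
from math import sqrt

users = ["Tommaso", "Vito", "Phuong", "Jessica", "Paolo"]
items = ["The Matrix", "Titanic", "I Love Shopping", "Argo", "Love Actually", "The Hangover"]
# Ratings from the slide; None marks the rating to predict.
R = [
    [5, 1, 2, 4, 3, None],
    [2, 4, 5, 3, 5, 2],
    [4, 3, 2, 4, 1, 3],
    [3, 5, 1, 5, 2, 4],
    [4, 4, 5, 3, 5, 2],
]

def cosine(i, j):
    """Cosine similarity of item columns i and j over the users who rated both."""
    rows = [r for r in R if r[i] is not None and r[j] is not None]
    num = sum(r[i] * r[j] for r in rows)
    den = sqrt(sum(r[i] ** 2 for r in rows)) * sqrt(sum(r[j] ** 2 for r in rows))
    return num / den if den else 0.0

def predict(a, target):
    """Similarity-weighted average of the ratings user a gave to the other items."""
    rated = [j for j in range(len(items)) if j != target and R[a][j] is not None]
    num = sum(cosine(j, target) * R[a][j] for j in rated)
    den = sum(cosine(j, target) for j in rated)
    return num / den if den else 0.0

prediction = predict(users.index("Tommaso"), items.index("The Hangover"))
```

Because all ratings are positive, the similarities are non-negative and the prediction always stays within the range of the ratings the user has already given.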

CF drawbacks
• Sparsity / Cold-start
– New user
– New item
• Grey sheep problem

Content-Based Recommender Systems
• Items are described in terms of
attributes/features
• A finite set of values is associated with each
feature
• Item representation is a (Boolean) vector

Content-based
CB-RSs try to recommend items similar* to
those a given user has liked in the past
[P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. Recommender
Systems Handbook. 2011]
• Heuristic-based
– Usually adopt techniques borrowed from IR
• Model-based
– Often we have a model for each user
(*) similar from a content-based perspective

CB drawbacks
• Content overspecialization
• Portfolio effect
• Sparsity / Cold-start
– New user

Knowledge-based
Recommender Systems
• Conversational approaches
• Reasoning techniques
– Case-based reasoning
– Constraint reasoning

Hybrid recommender systems
• Weighted
• Switching
• Mixed
• Feature combination
• Cascade
• Feature augmentation
• Meta-level
Robin D. Burke. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact., 12(4):331–370, 2002.

Dataset split
• hold-out: Training Set and Test Set (TS), e.g. 80% / 20%
• k-fold cross-validation
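Both splitting strategies can be sketched in a few lines. The function names, the random shuffling with a fixed seed and the toy ratings are assumptions of this example; the 80/20 proportion is used only as a default.

```python
import random

def hold_out(ratings, test_fraction=0.2, seed=0):
    """Shuffle the ratings and split them into (training set, test set)."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def k_fold(ratings, k=5, seed=0):
    """Yield k (training set, test set) pairs; each rating is tested exactly once."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]

# 100 toy (user, item, rating) triples, for illustration only.
data = [(u, x, 1 + (u + x) % 5) for u in range(10) for x in range(10)]
train, test = hold_out(data)
```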

Protocols
• Rated test-items
• All unrated items: compute a score for every
item not rated by the user (also items not
appearing in the user test set)

Accuracy metrics for rating prediction
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
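A minimal sketch of the two metrics in their standard form: MAE averages the absolute differences between predicted and actual ratings, RMSE takes the square root of the averaged squared differences. The (predicted, actual) pairs are toy values.

```python
from math import sqrt

def mae(pairs):
    """Mean Absolute Error over (predicted, actual) rating pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

def rmse(pairs):
    """Root Mean Squared Error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))

pairs = [(4.0, 5.0), (3.0, 3.0), (1.0, 4.0)]  # toy (predicted, actual) values
mae_value = mae(pairs)
rmse_value = rmse(pairs)
```

RMSE penalises the single large error (1.0 predicted against 4.0 actual) more heavily than MAE does.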

MAE and RMSE drawback
• Not very suitable for top-N recommendation
– Errors in the highest part of the recommendation
list are considered in the same way as the ones in
the lowest part

Accuracy metrics for top-N
recommendation
Precision @ N
$P_u@N = \frac{|L_u(N) \cap TS_u^+|}{N}$
Recall @ N
$R_u@N = \frac{|L_u(N) \cap TS_u^+|}{|TS_u^+|}$
normalized Discounted Cumulative Gain @ N
$L_u(N)$ is the recommendation list up to the N-th element
$TS_u^+$ is the set of relevant test items for $u$
$IDCG@N$ indicates the score obtained by an ideal ranking of $L_u(N)$
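The three metrics can be sketched as below. Precision and recall follow the definitions on this slide; for nDCG the binary gain and the log2 position discount are assumptions of this example, and the ideal score is computed by re-ranking the recommended list itself, as stated on the slide.

```python
from math import log2

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant test items."""
    return len(set(ranked[:n]) & relevant) / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant test items that appear in the top-n list."""
    return len(set(ranked[:n]) & relevant) / len(relevant) if relevant else 0.0

def ndcg_at_n(ranked, relevant, n):
    """DCG of the top-n list divided by the DCG of its ideal re-ranking."""
    gains = [1.0 if x in relevant else 0.0 for x in ranked[:n]]
    dcg = sum(g / log2(k + 2) for k, g in enumerate(gains))
    idcg = sum(g / log2(k + 2) for k, g in enumerate(sorted(gains, reverse=True)))
    return dcg / idcg if idcg else 0.0

ranked = ["x3", "x7", "x1", "x9", "x4"]   # toy recommendation list L_u(5)
relevant = {"x3", "x1", "x8"}             # toy set of relevant test items
p5 = precision_at_n(ranked, relevant, 5)
r5 = recall_at_n(ranked, relevant, 5)
ndcg5 = ndcg_at_n(ranked, relevant, 5)
```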

Is it all about precision?
• Diversity
– Avoid recommending only items in a small subset
of the catalog
– Suggest diverse items in the recommendation list
• Novelty
– Recommend items in the long tail
• Serendipity
– Suggest unexpected but interesting items

Novelty
Entropy-Based Novelty

RECOMMENDER SYSTEMS AND
LINKED OPEN DATA

Content-Based Recommender Systems
P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira,
editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners

Limited Content Analysis
Need for domain knowledge!
We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough
information to discriminate items the user might like from items the user
might not like.*
The quality of CB recommendations is correlated with the quality of the
features that are explicitly associated with the items.
(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira,
editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners

Traditional Content-based RecSys
• Based on keyword/attribute-based item
representations
• Rely on the quality of the content-analyzer to
extract expressive item features
• Lack of knowledge about the items

Semantic-aware approaches
Traditional Ontological/Semantic Recommender Systems
make use of limited domain ontologies.

What about Linked Data?
Use Linked Data to mitigate the limited content analysis issue
• Plenty of structured data available
• No Content Analyzer required
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Why RS + LOD
• Multi-Domain knowledge

Why RS + LOD
• Standardized access to data
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor WHERE {
dbpedia:Pulp_Fiction dbo:starring ?actor .
}
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT{
?book ?p ?o .
?book yago:linksTo ?yagolink .
}
WHERE{
SERVICE <http://live.dbpedia.org/sparql> {
?book rdf:type dbpedia-owl:Book .
?book ?p ?o .
?book owl:sameAs ?yago .
FILTER(regex(str(?yago),"http://yago-knowledge.org/resource/"))
.
}
SERVICE <http://lod2.openlinksw.com/sparql> {
?yago yago:linksTo ?yagolink .
}
}

Why RS + LOD
• Semantic Analysis

Item Linker
• Direct Item Linking
• Item Description Linking

Direct item Linking
dbpedia:I_Am_Legend_(film)

Direct item Linking
dbpedia:Troy_(film)
dbpedia:Troy

Direct item Linking
dbpedia:Troy_(film)
dbpedia:Scarface_(1983_film)
dbpedia:Scarface:_The_World_Is_Yours

Direct Item Linking
dbpedia:The_Da_Vinci_Code
dbpedia:Divine_Comedy

Direct Item Linking

Direct Item Linking
dbpedia:Divine_Comedy
???

Direct Item Linking
• The easy way
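PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?uri ?title WHERE {
?uri rdf:type dbpedia-owl:Film .
?uri rdfs:label ?title .
FILTER langMatches(lang(?title), "EN") .
FILTER regex(?title, "matrix", "i")
}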

Direct item Linking
• Other approaches
– DBpedia Lookup
https://github.com/dbpedia/lookup
– Silk Framework
http://silk-framework.com/

Item Graph Analyzer
• Build your own knowledge graph
– Select relevant properties. Possible solutions:
• Ontological properties
• Categorical properties
• Frequent properties
– Explore the graph up to a limited depth

Which LOD RSs?
• Content-based
– Heuristic-based
– Model-based
• Hybrid
• Knowledge-based

Linked Data as a structured
information source for item descriptions
Rich item descriptions

Different item features
representations
• Direct properties
• Property paths
• Node paths
• Neighborhoods
• …

Datasets
Subset of Movielens mapped to DBpedia
Subset of Last.fm mapped to DBpedia
Subset of The Library Thing mapped to DBpedia
Mappings
http://sisinflab.poliba.it/semanticweb/lod/recsys/datasets/

Vector Space Model for LOD
[Figure: Righteous Kill and Heat connected through the properties starring, director, subject/broader and genre to nodes such as Robert De Niro, Al Pacino, Brian Dennehy, John Avnet, Serial killer films, Heist films, Crime films and Drama; the starring property is then isolated for the two films]

Righteous Kill and Heat: the starring property

STARRING               Al Pacino (v1)   Robert De Niro (v2)   Brian Dennehy (v3)
Righteous Kill (m1)    X                X                     X
Heat (m2)              X                X

                       v1        v2        v3
Righteous Kill (x1)    wv1,x1    wv2,x1    wv3,x1
Heat (x2)              wv1,x2    wv2,x2    0

$w_{AlPacino,Heat} = tf_{AlPacino,Heat} * idf_{AlPacino}$
$tf \in \{0,1\}$

$sim_{starring}(x_i, x_j) = \frac{w_{v_1,x_i} * w_{v_1,x_j} + w_{v_2,x_i} * w_{v_2,x_j} + w_{v_3,x_i} * w_{v_3,x_j}}{\sqrt{w_{v_1,x_i}^2 + w_{v_2,x_i}^2 + w_{v_3,x_i}^2} * \sqrt{w_{v_1,x_j}^2 + w_{v_2,x_j}^2 + w_{v_3,x_j}^2}}$
$sim(x_i, x_j) = \alpha_{starring} * sim_{starring}(x_i, x_j) + \alpha_{director} * sim_{director}(x_i, x_j) + \alpha_{subject} * sim_{subject}(x_i, x_j) + \dots$
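A minimal sketch of the per-property similarity and of its linear combination. The cast of Righteous Kill and Heat follows the slides; the third item, the director values, the idf definition log(N / n_v) and the alpha values are hypothetical and serve only to make the example computable.

```python
from math import log, sqrt

# property -> item -> set of values. "Film Z" and all director values are
# placeholders: with only two items every shared actor would get idf = 0.
graph = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
        "Film Z": {"Actor Z"},
    },
    "director": {
        "Righteous Kill": {"Director A"},
        "Heat": {"Director B"},
        "Film Z": {"Director Z"},
    },
}

def tfidf_vector(prop, item):
    """tf-idf weights of an item for one property, with tf in {0, 1}."""
    per_item = graph[prop]
    n = len(per_item)
    return {v: log(n / sum(v in values for values in per_item.values()))
            for v in per_item[item]}

def sim_property(prop, xi, xj):
    """Cosine similarity between the tf-idf vectors of xi and xj for one property."""
    wi, wj = tfidf_vector(prop, xi), tfidf_vector(prop, xj)
    num = sum(w * wj[v] for v, w in wi.items() if v in wj)
    den = sqrt(sum(w * w for w in wi.values())) * sqrt(sum(w * w for w in wj.values()))
    return num / den if den else 0.0

def sim(xi, xj, alpha):
    """Linear combination of the property similarities, weighted by alpha."""
    return sum(a * sim_property(p, xi, xj) for p, a in alpha.items())

alpha = {"starring": 0.5, "director": 0.5}  # hypothetical weights
total = sim("Righteous Kill", "Heat", alpha)
```

In a model-based variant the alpha weights would be learned from the user's ratings rather than fixed by hand.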

VSM Content-based Recommender
We predict the rating using a Nearest Neighbor Classifier wherein the similarity
measure is a linear combination of local property similarities
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th
International Conference on Semantic Systems (I-SEMANTICS) - 2012

Selected properties

heuristic-based → model-based

Property subset evaluation
The subject+broader
solution is better than only
subject or subject+more
broaders.
The best solution is
achieved with
subject+broader+
genres.
Too many broaders
introduce noise.
Rated test items protocol

Evaluation against other
content-based approaches

Evaluation against other approaches

Path-based features
Analysis of complex relations between the user preferences and the
target item
Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from Implicit Feedback leveraging Linked Open Data.
7th Conference on Recommender Systems (RecSys) – 2013

Data model
Implicit Feedback Matrix $\hat{S}$ and Knowledge Graph
      i1   i2   i3   i4
u1    1    1    0    0
u2    1    0    1    0
u3    0    1    1    0
u4    0    1    0    1

Path-based features
Path: acyclic sequence of relations (s, …, r_l, …, r_L)
Frequency of the j-th path in the sub-graph
related to u and x
• The more paths, the more relevant the item.
• Different paths have different meanings.
• Not all types of paths are relevant.
Example: u3 -s- i2 -p2- e1 -p1- i1 → (s, p2, p1)

Problem formulation
Feature vector: $w_{ux} \in \mathbb{R}^D$
Set of relevant items for u: $X_u^+ = \{x \in X \mid \hat{s}_{ux} = 1\}$
Set of irrelevant items for u: $X_u^- = \{x \in X \mid \hat{s}_{ux} = 0\}$
Sample of irrelevant items for u: $X_u^{-*} \subseteq X_u^-$
Training Set: $TR = \bigcup_u \{\langle w_{ux}, \hat{s}_{ux} \rangle \mid x \in (X_u^+ \cup X_u^{-*})\}$

Path-based features
[Figure: example graph with users u1 to u4, items x1 to x4 and entities e1 to e5; the feature vector w_{u3 x1} is built step by step by counting the paths that connect u3 to x1]
path(1) (s, s, s) : 2
path(2) (s, p2, p1) : 2
path(3) (s, p2, p3, p1) : 1
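Counting path types between a user and an item can be sketched with a depth-first search over simple paths. The toy graph below is hypothetical (it is not the graph drawn on the slides), edges are traversed in both directions, and the maximum path length of 4 is an assumption of this example.

```python
from collections import Counter, defaultdict

# Hypothetical labelled edges: "s" links a user to an item in the profile,
# p1, p2 and p3 are knowledge-graph properties.
edges = [
    ("u1", "s", "x1"), ("u1", "s", "x2"),
    ("u3", "s", "x2"), ("u3", "s", "x3"),
    ("x2", "p2", "e1"), ("x1", "p1", "e1"),
    ("x3", "p2", "e2"), ("e2", "p3", "e1"),
]

adj = defaultdict(list)
for a, label, b in edges:
    adj[a].append((label, b))
    adj[b].append((label, a))  # traverse every edge in both directions

def path_features(user, item, max_len=4):
    """Count, per sequence of relation labels, the acyclic paths from user to item."""
    counts = Counter()

    def walk(node, visited, labels):
        if node == item:
            counts[tuple(labels)] += 1
            return
        if len(labels) == max_len:
            return
        for label, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, labels + [label])

    walk(user, {user}, [])
    return counts

features = path_features("u3", "x1")
```

Each distinct label sequence becomes one dimension of the feature vector, and its count is the value of that dimension.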

Evaluation of different ranking
functions
[Figure: recall@5 (0 to 0.6) against user profile size (given 5, 10, 20, 30, 50, All) on Movielens, for the ranking functions BagBoo, GBRT and Sum]

Evaluation of different ranking
functions
[Figure: recall@5 (0 to 0.6) against user profile size (given 5, 10, 20, All) on Last.fm, for the ranking functions BagBoo, GBRT and Sum]

Comparative approaches
• BPRMF, Bayesian Personalized Ranking for Matrix Factorization
• BPRLin, Linear Model optimized for BPR (Hybrid alg.)
• SLIM, Sparse Linear Methods for Top-N Recommender Systems
• SMRMF, Soft Margin Ranking Matrix Factorization
MyMediaLite

Comparison with other
approaches
[Figure: precision@5 (0 to 0.6) against user profile size (given 5, 10, 20, 30, 50, All) on Movielens, for SPrank, BPRMF, SLIM, BPRLin and SMRMF]

Comparison with other
approaches
[Figure: precision@5 (0 to 0.6) against user profile size (given 5, 10, 20, All) on Last.fm, for SPrank, BPRMF, SLIM, BPRLin and SMRMF]

Graph-based Item Representation
[Figure: The Godfather and American Gangster linked through subject to categories such as Mafia_films, Gangster_films, Films_about_organized_crime_in_the_United_States, Best_Picture_Academy_Award_winners, Best_Thriller_Empire_Award_winners and Films_shot_in_New_York_City]
Vito Claudio Ostuni, Tommaso Di Noia, Roberto Mirizzi, Eugenio Di Sciascio. A Linked Data Recommender System using a Neighborhood-based
Graph Kernel. The 15th International Conference on Electronic Commerce and Web Technologies – 2014

[Figure: the same graph extended with broader links among categories; new nodes include Films_about_organized_crime, Films_about_organized_crime_by_country and Awards_for_best_film]
Exploit entity descriptions

h-hop Item Neighborhood Graph
[Figure: neighborhood graph of The Godfather, reaching categories such as Gangster_films and …_Award_winners through subject, and more general categories such as Awards_for_best_film through broader]

Kernel Methods
Work by embedding data in a vector space and looking for linear
patterns in such space
$x \to \phi(x)$ (from the input space to the feature space)
[Kernel Methods for General Pattern Analysis. Nello Cristianini. http://www.kernel-methods.net/tutorials/KMtalk.pdf]
We can work in the new space F by specifying an inner product
function between points in it
$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$

h-hop Item Entity-based
Neighborhood Graph Kernel
Explicit computation of the feature map
$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$
$\phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \dots, w_{x_i,e_m}, \dots, w_{x_i,e_t})$
$w_{x_i,e_m}$: entity importance in the item neighborhood graph

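h-hop Item Entity-based
Neighborhood Graph Kernel
$\phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \dots, w_{x_i,e_m}, \dots, w_{x_i,e_t})$
Each weight $w_{x_i,e_m}$ combines:
• the number of edges involving $e_m$ at $l$ hops from $x_i$, a.k.a. the frequency of the entity in the item neighborhood graph
• a factor taking into account at which hop the entity appears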

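Weights computation example
[Figure: 2-hop neighborhood graph of item i with the entities e1, e2, e4 and e5, connected through the properties p2 and p3; h=2]
$c_{\hat{P}^1(x_i), e_1} = 2$
$c_{\hat{P}^1(x_i), e_2} = 1$
$c_{\hat{P}^2(x_i), e_4} = 1$
$c_{\hat{P}^2(x_i), e_5} = 2$
Informative entity about the item even if not directly related to it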
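A sketch of how the counts above could be turned into a feature map and a kernel value. It assumes that each weight is the sum over hops of a hop-dependent factor times the edge count; the factor values in `alpha` and the second item are hypothetical, and no normalization is applied.

```python
def feature_map(hop_counts, alpha):
    """hop_counts: {hop l: {entity: edge count}} -> {entity: sum_l alpha[l] * count}."""
    phi = {}
    for hop, counts in hop_counts.items():
        for entity, c in counts.items():
            phi[entity] = phi.get(entity, 0.0) + alpha[hop] * c
    return phi

def kernel(phi_i, phi_j):
    """Inner product of two explicit (sparse) feature maps."""
    return sum(w * phi_j.get(e, 0.0) for e, w in phi_i.items())

x_i = {1: {"e1": 2, "e2": 1}, 2: {"e4": 1, "e5": 2}}  # counts from the slide
x_j = {1: {"e1": 1}, 2: {"e5": 1, "e6": 3}}           # hypothetical second item
alpha = {1: 1.0, 2: 0.5}                              # hypothetical hop factors

phi_i = feature_map(x_i, alpha)
phi_j = feature_map(x_j, alpha)
k_ij = kernel(phi_i, phi_j)
```

With a smaller factor for the second hop, e5 still contributes to the kernel value although it is not directly connected to the item.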

Experimental Settings
• Trained an SVM Regression model for each user
• Accuracy Evaluation: Precision, Recall
• Novelty Evaluation: Entropy-based Novelty (All
Items protocol) [the lower the better]

Kernel calibration
impact of alpha params

Comparative approaches
• NB: 1-hop item neighborhood + Naive Bayes classifier
• VSM: 1-hop item neighborhood, Vector Space Model (tf-idf) + SVM regression
• WK: 2-hop item neighborhood, Walk-based kernel + SVM regression

Comparison with other approaches (i)
[Figure: Prec@10 (0 to 0.8) for the splits [20/80], [40/60] and [80/20], for NK-bestPrec, NK-bestEntr, NB, VSM and WK]

Comparison with other approaches (ii)
[Figure: EBN@10 (0 to 1.8) for the splits [20/80], [40/60] and [80/20], for NK-bestPrec, NK-bestEntr, NB, VSM and WK]

The FreeSound case study
Vito Claudio Ostuni, Sergio Oramas, Tommaso Di Noia, Xavier Serra, Eugenio Di Sciascio. A Semantic Hybrid Approach for Sound Recommendation. 24th
World Wide Web Conference - 2015

FreeSound Knowledge Graph
Item textual descriptions enrichment: Entity Linking tools can be used
to enrich item textual descriptions with LOD

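h-hop Item Node-Based
Neighborhood Graph Kernel
$\phi_{G^h}(x_i) = (w_{x_i,p^*_1}, \dots, w_{x_i,p^*_m}, \dots, w_{x_i,p^*_t})$
Each weight $w_{x_i,p^*_m}$ combines:
• the number of sequences and subsequences of nodes from $x_i$ to $e_m$
• a normalization factor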

Hybrid Recommendation via
Feature Combination
The hybridization is based on the combination of different data
sources
Final approach: collaborative + LOD + textual description + tags
Item Feature Vector: u1 u2 u3 … | entity1 entity2 … | keyw1 keyw2 … | tag1 …
• u1 u2 u3 …: users who rated the item
• entity1 entity2 …: entities from the knowledge graph (explicit feature mapping)
• keyw1 keyw2 …: keywords extracted from the textual description
• tag1 …: tags associated with the item
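Feature combination can be sketched as the concatenation of one binary block per data source. The function name `combine`, the source names and all feature values are hypothetical; the block order follows the item feature vector of this slide.

```python
def combine(sources):
    """sources: {source: {item: set of features}} -> (feature index, {item: binary vector})."""
    # One block of columns per source, in insertion order; features sorted within a block.
    index = [(src, f) for src, per_item in sources.items()
             for f in sorted(set().union(*per_item.values()))]
    items = sorted({i for per_item in sources.values() for i in per_item})
    vectors = {i: [1 if f in sources[src].get(i, set()) else 0 for src, f in index]
               for i in items}
    return index, vectors

# Toy data for two sounds, for illustration only.
sources = {
    "user":    {"s1": {"u1", "u3"}, "s2": {"u2", "u3"}},
    "entity":  {"s1": {"Rain"}, "s2": {"Guitar"}},
    "keyword": {"s1": {"storm"}, "s2": {"chord"}},
    "tag":     {"s1": {"field-recording"}, "s2": {"loop"}},
}
index, vectors = combine(sources)
```

The resulting vectors can be passed to any content-based learner, so the collaborative information enters the model simply as additional item features.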

Hybrid approaches:
some lessons learnt
• Feature combination hybrid approach
– adding collaborative features to item content feature vectors can considerably improve
recommendation accuracy
• Semantic Enrichment
– semantics can help improve performance dimensions beyond accuracy,
such as novelty and catalog coverage

Select the domain(s) of your RS
SELECT ?c (COUNT(?i) AS ?num)
WHERE {
?i a ?c .
FILTER(regex(str(?c), "^http://dbpedia.org/ontology")) .
}
GROUP BY ?c
ORDER BY DESC(?num)

Does the LOD dataset selection
matter?

A comparison between
DBpedia and Freebase
            Accuracy   Coverage   Diversity   Novelty
Freebase    +          +          -           -
DBpedia     -          -          +           +
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study
in the music domain. The 14th International Semantic Web Conference - ISWC 2015

A comparison between
DBpedia and Freebase
            Accuracy   Coverage   Diversity   Novelty
1-hop       -          -          -           +
2-hop       +          +          +           -
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study
in the music domain. The 14th International Semantic Web Conference - ISWC 2015

Conclusions
• Linked Open Data to enrich the content descriptions of
items
• Exploit different characteristics of the semantic network
to represent/learn features
• Improved accuracy
• Improved novelty
• Improved Aggregate Diversity
• Entity linking for a better exploitation of text-based
data
• Select the right approach, dataset, set of properties to
build your RS

Open issues
• Generalize to graph pattern extraction to
represent features
• Automatically select the triples related to the
domain of interest
• Automatically select meaningful properties to
represent items
• Analysis with respect to «knowledge
coverage» of the dataset
– What is the best approach?

Not covered here
• User profile
• Preferences
• Context-aware
• Knowledge-based approaches
• …

Many thanks to the
RecSys crew @ SisInf Lab
Roberto Mirizzi
now at Yahoo! CA
Vito Claudio Ostuni
now at
Jessica Rosati
Phd Fellowship Awardee @
Paolo Tomeo
Jindřich Mynarz
Phuong Nguyen
Sergio Oramas
Aleksandra Karpus
Visiting Students and PostDoc
