Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++
1. Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++
Matthew Rowe
School of Computing and Communications
@mrowebot | m.rowe@lancaster.ac.uk
International Semantic Web Conference 2014
Riva del Garda, Trento, Italy
3. Latent Factor Models: Factor Consistency Problem
[Figure: a user-item rating matrix (with missing entries) is approximated (≈) by user-factor and item-factor matrices, with F = #factors fixed a priori; repeating the factorisation at a later time point yields new factors that cannot be matched to the earlier ones.]
• Cannot ‘accurately’ align latent factors
• Cannot tell how users’ tastes have evolved (see the sketch below)
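To make this concrete, here is a minimal sketch (not from the talk; the toy matrix, library choice and seeds are illustrative) that factorises the same small rating matrix twice with different random initialisations and shows that the resulting factor columns do not line up, which is exactly why per-factor comparisons across retrained models (e.g. across time) are unreliable.

# Illustrative sketch: latent factors from two separately trained factorisations
# do not line up, so "factor 1 at time t" cannot be compared with "factor 1 at t+1".
import numpy as np
from sklearn.decomposition import NMF

# Toy user-item rating matrix (rows: users, cols: items), 0 = unknown.
R = np.array([[8., 8., 3.],
              [9., 0., 1.],
              [9., 8., 1.]])

# Train the same model twice with different random initialisations,
# mimicking two factorisations performed at different time points.
model_a = NMF(n_components=2, init="random", random_state=0, max_iter=500)
model_b = NMF(n_components=2, init="random", random_state=7, max_iter=500)
P_a = model_a.fit_transform(R)   # user factors, run A
P_b = model_b.fit_transform(R)   # user factors, run B

# Column-wise correlation between the two runs: close to 1 only if the
# factors happened to align, which in general they do not.
for f in range(2):
    corr = np.corrcoef(P_a[:, f], P_b[:, f])[0, 1]
    print(f"factor {f}: correlation between runs = {corr:.2f}")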
4. Solution: Semantic Categories
[Figure: each item i is mapped via its <URI> to a set of {<SKOS_CATEGORY>} labels; the rating matrix is then approximated (≈) using the user's preference for category c at time (stage) s, tracked over time.]
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings. M. Rowe. In the proceedings of the International Conference on Web Intelligence 2014. Warsaw, Poland. (2014)
8. Dataset: MovieTweetings + URI Alignment

Ratings were collected from the MovieTweetings dataset (https://github.com/sidooms/MovieTweetings), which is built from tweets of the form: "I rated Aliens 8/10 http://www.imdb.com/title/tt0133093/ #IMDb".

Such tweets provide no explicit factors describing the items. The web of linked data provided a resource for such information, where movies appear within the linked data cloud as Uniform Resource Identifiers (URIs) which, upon dereferencing, return information about the movie: director, year of release, actors, and the semantic categories in which the film has been placed. For instance, for the movie 'Aliens' (1986), which we shall now use as a running example, the following categories are found:

<http://dbpedia.org/resource/Aliens_(film)>
    dcterms:subject category:Alien_(franchise)_films ;
    dcterms:subject category:1986_horror_films .

In this work we use DBPedia URIs, given their relation to semantic (SKOS) categories. In order to provide such information, however, we require a link between a given item within one of our recommendation datasets and the URI that denotes that movie item, prompting the question: how can items be aligned to their semantic web URIs? Our method for semantic URI alignment functioned as follows: first, we queried (via SPARQL) for candidate URIs using each movie's title and retrieved the semantic categories (?category) of each candidate; if the year of the movie item appears within a mapped category (?category) then we identified the given semantic URI as denoting the item. This disambiguation was needed here as multiple films can share the same title (i.e. film remakes). The output of the alignment is a set of pairs {(ItemID, <URI>)}.

This approach achieved coverage (i.e. proportion of items mapped) of 69%; this reduced coverage is explained by the recency of the movies being reviewed and the lack of coverage of these titles on DBPedia at present. Table 1 presents the statistics of the dataset following URI alignment.

Table 1. Statistics of the MovieTweetings dataset with the reduction from the original dataset shown in parentheses.
#Users: 14,749 (-22.5%) | #Items: 7,913 (-30.8%) | #Ratings: 86,779 (-25.9%) | Ratings Period: [28-03-2013, 23-09-2013]

The aligned dataset was split by time into Training (first 80%), Validation (next 10%) and Testing (final 10%) sets.

From this point on we use the following notations to aid comprehension: u, v denote users; i, j denote items; r denotes a known rating value (where r ∈ [1, 10]); r̂ denotes a predicted rating value; c denotes a semantic category that an item has been mapped to; and cats(i) is a convenience function that returns the set of semantic categories of item i.
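A minimal sketch of the alignment step described on this slide, assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper Python library; the query shape, function names and the disambiguation-by-year rule follow the description above but are illustrative, not the authors' code.

# Sketch: find candidate DBpedia URIs for a film title, fetch their SKOS
# categories, and keep a candidate whose categories mention the film's year.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

def candidate_categories(title):
    """Return {candidate_uri: [category_label, ...]} for resources matching the title."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?film ?category WHERE {{
            ?film rdfs:label "{title}"@en ;
                  dcterms:subject ?cat .
            ?cat rdfs:label ?category .
        }}""")
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    cands = {}
    for row in results["results"]["bindings"]:
        cands.setdefault(row["film"]["value"], []).append(row["category"]["value"])
    return cands

def align(title, year):
    """Pick the candidate URI whose categories mention the film's release year."""
    for uri, cats in candidate_categories(title).items():
        if any(str(year) in c for c in cats):
            return uri
    return None

# e.g. align("Aliens (film)", 1986) would be expected to return
# "http://dbpedia.org/resource/Aliens_(film)" (illustrative usage).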
11. Category Transfer Function

Definition 2 (Vertex Kernel). Given graph vertices u and v from a linked data graph G', we define a vertex kernel (k) as a surjective function that maps the product of two vertices' attribute vectors into a real-valued space, where φ(u) is a convenience function that returns kernel-specific attributes to be used by the function (i.e. an n-dimensional attribute vector of node u: φ(u) ∈ R^n). Hence:

    k : V × V → R                (2)
    k(φ(u), φ(v)) ↦ x            (3)

Given this formulation, we can vary the kernel function k(.,.) to measure the similarity between arbitrary categories (e.g. k(φ(c2), φ(c4))) based on the topology of the linked data graph that surrounds them. All the kernels considered in this paper function over two nodes' feature vectors. Therefore, to derive the feature vector for a given category node (c), we include information about the objects that c is linked to within the linked data graph. Let <c ?p ?o> define a triple where c appears within the subject position. We can then populate a vector (x ∈ R^n, where n denotes the dimensionality of the vector space) based on the object concepts that c links to over 1-hop. This can also be extended to n hops away from c by traversing edges away from c and collecting the objects within the traversed triples. Each element in the vector is weighted by the out-degree of the node.

[Figure: triple-object vectors of categories. The categories dbpedia:c2 and dbpedia:c4 are each represented by a vector over the objects dbpedia:_1 ... dbpedia:_n they link to, e.g. (0, 0.5, 0.5, …, 0, 0) and (0, 0, 0.5, …, 0.5); the vertex kernel function compares the two vectors.]

Now, let C be the set of categories that a given user has rated previously (the rated categories) and D be the set of categories that a given item has been mapped to (the categories of the item); then we define the Category Transfer function as follows:

    f(C, D) = { argmax_{c ∈ C} k(c, d) : d ∈ D }        (1)

The codomain of the Category Transfer function is therefore a subset of the set of categories that the user has rated beforehand (i.e. C' ⊂ C).

Vary k(.,.): Cosine, Dice, Squared Euclidean, JS-Divergence.
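A minimal sketch of the triple-object vectors, a cosine vertex kernel and the Category Transfer function of Eq. 1; the sparse dictionary representation and the function names are assumptions made for illustration, not the paper's implementation.

# Sketch: categories represented as vectors over the objects they link to,
# compared with a cosine vertex kernel, then used to transfer categories.
import math

def triple_object_vector(category, triples):
    """1-hop triple-object vector: objects that `category` links to (subject
    position), each element weighted by 1 / out-degree of the category node."""
    objects = [o for (s, p, o) in triples if s == category]
    out_degree = len(objects)
    vec = {}
    for o in objects:
        vec[o] = vec.get(o, 0.0) + 1.0 / out_degree
    return vec

def cosine_kernel(u, v):
    """Vertex kernel k(phi(u), phi(v)): cosine similarity of two sparse vectors."""
    dot = sum(u.get(key, 0.0) * v.get(key, 0.0) for key in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def category_transfer(C, D, phi, kernel=cosine_kernel):
    """Eq. 1: for each item category d in D, pick the user's rated category
    c in C that the kernel deems most similar to d."""
    return {max(C, key=lambda c: kernel(phi[c], phi[d])) for d in D}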
12. User Profiling with Semantic Categories
Split the user's training ratings into 5 stages; then, for each stage s:
  1. Derive the user's average rating per semantic category
  2. Calculate the probability of the user rating the category highly, giving the stage profile P^u_s
(Notation, restated from the paper: u, v denote users and i, j denote items; r denotes a known rating value and r̂ a predicted rating value; ratings are provided as quadruples (u, i, r, t), where t denotes the time of the rating, and are segmented into training (Dtrain), validation and test (Dtest) sets by the above-mentioned splits; c denotes a semantic category that an item has been mapped to, and cats(i) is a convenience function that returns the set of semantic categories of item i.)

Taste profiles describe the preferences that a user holds at a given point in time for given semantic categories. By understanding how a profile at one point in time relates to the profile at an earlier point in time, we can assess whether taste evolution has taken place. In the work of McAuley and Leskovec [5], the assessment of user evolution in the context of review platforms (e.g. beer review sites) demonstrated the propensity of users to evolve based on their own 'personal clock'. This motivated us to segment a user's lifetime (i.e. the time over which they rated items) into stages.
From these definitions we then derived the discrete probability distribution of the user rating the category favourably as follows, defining the set C^{u,s}_train as containing all unique categories of items rated by u in stage s:

    Pr(c | D^{u,s}_train) = avrating(D^{u,s,c}_train) / Σ_{c' ∈ C^{u,s}_train} avrating(D^{u,s,c'}_train)        (4)
When implementing this approach, we only consider the categories that item URIs are directly mapped to; that is, only those categories that are connected to the URI by the dcterms:subject predicate. Prior work by Ostuni et al. [8] performed a mapping where grandparent categories were mapped to URIs; however, we chose the parent categories in this instance to open up the possibility of other mappings in the future - i.e. via linked data node vertex kernels.
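A minimal sketch of the stage profile of Eq. 4, assuming a user's training ratings are available as (item, rating, stage) tuples and a cats(item) lookup; the names are illustrative.

# Sketch: Eq. 4 - a user's stage-s taste profile over semantic categories,
# built from the average rating given to each category in that stage.
from collections import defaultdict

def stage_profile(ratings, cats, stage):
    """ratings: iterable of (item, rating, stage) tuples for one user.
    cats: dict item -> set of semantic categories.
    Returns Pr(c | D^{u,s}_train) for every category c rated in `stage`."""
    per_category = defaultdict(list)
    for item, rating, s in ratings:
        if s == stage:
            for c in cats[item]:
                per_category[c].append(rating)
    # avrating(D^{u,s,c}_train): mean rating of items mapped to category c
    avg = {c: sum(rs) / len(rs) for c, rs in per_category.items()}
    total = sum(avg.values())
    return {c: a / total for c, a in avg.items()} if total else {}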
B. User Taste Evolution: From Prior Taste Profiles

[Figure: a user's rating timeline segmented into lifecycle stages 1, 2, …, 5.]

We now turn to the evolution of users' tastes over time in order to understand how their preferences change. Given our use of probability distributions to model the lifecycle-stage-specific taste profile of each user, we can apply information-theoretic measures based on information entropy. One such measure is conditional entropy, which enables one to assess the information needed to describe the taste profile of a user at one time step (Q) using his taste profile from the previous stage.
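For reference, a sketch of the conditional entropy measure itself; how the joint distribution over the category variables of two consecutive stages is estimated from the stage profiles is not reproduced here and is an assumption of this sketch.

# Sketch: conditional entropy H(Q|P) from a joint distribution over the
# category variables of two consecutive stages.
import math

def conditional_entropy(joint):
    """joint: dict mapping (p_category, q_category) -> probability, summing to 1.
    Returns H(Q|P) = -sum Pr(p,q) * log2( Pr(p,q) / Pr(p) )."""
    # Marginal over the previous-stage variable P.
    marginal_p = {}
    for (p, _), pr in joint.items():
        marginal_p[p] = marginal_p.get(p, 0.0) + pr
    h = 0.0
    for (p, _), pr in joint.items():
        if pr > 0:
            h -= pr * math.log2(pr / marginal_p[p])
    return h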
13. Taste Evolution over Rated Categories
[Plot: conditional entropy between a user's taste profiles at consecutive lifecycle stages (1-5) rises from roughly 0.275 to 0.290. Takeaway: users diverge from prior tastes (increase).]
[Plot: transfer entropy between consecutive lifecycle stages (1-5) rises from roughly 0.112 to 0.116. Takeaway: users are not influenced by global tastes (increase).]
14. Putting it all together: SemanticSVD++!

[Fig. 3 (from the paper): transfer entropy between consecutive lifecycle stages (e.g. H(P2|P3)) across the datasets, together with the bounds of the 95% confidence interval for the derived means.]

Modified version of SVD++ with:
• User taste evolution captured in semantic category biases
• Semantic personalisation component

This model is named SemanticSVD++, an extension of Koren et al.'s earlier SVD++ model [2]. The predictive function of the model is shown in full in Eq. 8; we now explain each component in greater detail.

    r̂_ui =  μ + b_i + b_u                                                [Static Biases]
          + α_i·b_{i,cats(i)} + α_u·b_{u,cats(i)}                         [Category Biases]
          + q_i^T · ( p_u + |R(u)|^{-1/2} · Σ_{j ∈ R(u)} y_j
                          + |cats(R(u))|^{-1/2} · Σ_{c ∈ cats(R(u))} z_c )   [Personalisation Component]        (8)
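A minimal sketch of the prediction rule in Eq. 8, assuming the learnt parameters are held in plain dictionaries of numpy arrays; the container names are illustrative and the SGD training loop is omitted.

# Sketch: SemanticSVD++ prediction (Eq. 8) for one (user, item) pair.
import numpy as np

def predict(u, i, mu, b_u, b_i, b_u_cat, b_i_cat, alpha_u, alpha_i,
            p, q, y, z, R, cats):
    """mu: global mean; b_u/b_i: user/item biases; b_i_cat[i], b_u_cat[u, i]:
    item and user biases towards cats(i); alpha_u/alpha_i: category-bias weights;
    p, q, y, z: latent vectors (dicts of numpy arrays); R: dict user -> rated items;
    cats: dict item -> set of semantic categories."""
    # Static biases
    r_hat = mu + b_i[i] + b_u[u]
    # Category biases
    r_hat += alpha_i * b_i_cat[i] + alpha_u * b_u_cat[u, i]
    # Personalisation component: implicit feedback from rated items and
    # from the semantic categories of those items.
    rated = R[u]
    rated_cats = set().union(*(cats[j] for j in rated))
    implicit = (p[u]
                + sum(y[j] for j in rated) / np.sqrt(max(len(rated), 1))
                + sum(z[c] for c in rated_cats) / np.sqrt(max(len(rated_cats), 1)))
    r_hat += q[i].dot(implicit)
    return r_hat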
15. Transferring Categories using Vertex Kernels

Static Biases. The static biases include the general bias of the given dataset (μ), which is the mean rating score across all ratings within the training segment, the item bias (b_i), and the user bias (b_u).

General Category Biases. We capture global-influence signals on a per-user basis by assessing the change in transfer entropy for each user over time and modelling this as a global influence factor γ_u. We derive this as follows, based on measuring the proportional change in transfer entropy starting from the lifecycle period k that produced a monotonic increase or decrease in transfer entropy (i.e. the average change in the transfer entropy of the user):

    γ_u = (1 / (4 - k)) · Σ_{s=k}^{4} ( T^{s+1|s}_{Q→P} - T^{s|s-1}_{Q→P} ) / T^{s|s-1}_{Q→P}        (13)

By combining the average change rate (ω^u_c) of the user highly rating a given category c with the global influence factor (γ_u), we then derived the conditional probability of a user rating a given category highly as follows, where P^u_5 denotes the taste profile of the user observed for the final lifecycle stage (5):

    Pr(+|c, u) = P^u_5(c) + ω^u_c · P^u_5(c) + γ_u · Q_5(c)        (14)
                 [prior rating]  [change rate]  [global influence]

User Biases to Categories. The user bias towards the categories of item i combines the user's prior rated categories with categories transferred in by the vertex kernel k, which can be varied:

    b_{u,cats(i)} = ω_k · (1 / |cats(i) ∩ C|) · Σ_{c ∈ cats(i) ∩ C} Pr(+|c, u)                          [prior rated categories]
                  + (1 - ω_k) · (1 / |f_k(C, cats(i) \ C)|) · Σ_{c ∈ f_k(C, cats(i) \ C)} Pr(+|c, u)    [transferred categories]        (16)

Here we have ω_k-weighted the influence of the transferred categories on the bias in order to assess the effects of the transferred categories on recommendation accuracy. In essence, ω_k forms one of our hyperparameters that we optimise when tuning the model over the validation set for a given vertex kernel (k). As ω_k ∈ [0, 1] we can assess its effect: a larger ω_k places more emphasis on known information, while a lower ω_k places more emphasis on categories transferred by the given kernel (k). As the category biases are, in essence, static features we included two weights, one for each category bias, defined as α_i and α_u for the item biases to categories and the user biases to categories respectively - these weights are then learnt during the training phase of inducing the model.

Model Fitting. The model parameters (the biases b_*, the weights α_*, and the latent vectors p_*, q_*, y_*, z_*) are learnt over the training set using stochastic gradient descent (SGD).

β controls the transfer of category ratings (denoted ω_k above).
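A minimal sketch of the user-to-category bias of Eq. 16, reusing the category_transfer and kernel sketch from the earlier slide; Pr(+|c, u) (Eq. 14) is assumed to be precomputed and passed in as a dictionary, and the names are illustrative.

# Sketch: Eq. 16 - omega_k-weighted blend of the user's prior rated categories
# and categories transferred in by a vertex kernel for the item's unrated ones.
def user_category_bias(item_cats, rated_cats, pr_high, phi, kernel, omega_k):
    """item_cats: cats(i); rated_cats: C, categories the user has rated;
    pr_high: dict c -> Pr(+|c, u) (Eq. 14, precomputed); phi: c -> feature vector;
    kernel: vertex kernel k(.,.); omega_k in [0, 1] weights known vs transferred."""
    known = item_cats & rated_cats          # categories of i the user has rated
    unknown = item_cats - rated_cats        # categories of i never rated by u
    known_term = (sum(pr_high[c] for c in known) / len(known)) if known else 0.0
    transferred = category_transfer(rated_cats, unknown, phi, kernel) if unknown else set()
    transfer_term = (sum(pr_high[c] for c in transferred) / len(transferred)) if transferred else 0.0
    return omega_k * known_term + (1.0 - omega_k) * transfer_term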
16. Experiments

• Aim: predict users' ratings of items
  ◦ Minimise the Root Mean Square Error (RMSE)
• Research questions:
  1. How does a semantic model perform against existing MF-based approaches?
  2. Does transferring categories actually reduce error?
17. Experimental Setup

• Tested four models (trained using Stochastic Gradient Descent):
  ◦ SVD and SVD++ (baselines)
  ◦ SB-SVD++: SVD++ with semantic category biases
  ◦ S-SVD++: SB-SVD++ with the semantic personalisation component
• Tuned hyperparameters over the validation split:
  ◦ All models: learning rate, regularisation weight, #factors
  ◦ Semantic models: transfer parameter β
• Model testing:
  ◦ Trained models using both training + validation splits
  ◦ Applied to the held-out final 10% of reviews
• Evaluation measure: Root Mean Square Error (see the sketch below)
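A minimal sketch of the evaluation loop, assuming a trained model exposing predict(u, i) and the held-out (user, item, rating) triples from the final 10% split; the names are illustrative.

# Sketch: Root Mean Square Error over the held-out test split.
import math

def rmse(model, test_ratings):
    """test_ratings: iterable of (user, item, true_rating) from the final 10% split."""
    errors = [(model.predict(u, i) - r) ** 2 for u, i, r in test_ratings]
    return math.sqrt(sum(errors) / len(errors))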
18. Results: Ratings Prediction Error

The tuned ω_k places more emphasis on the categories that the user has previously rated, rather than transferring in ratings to cover the unreviewed categories. We find varying levels across the other kernels where, aside from the JS-Divergence kernel, the optimised ω_k places more emphasis on using rated semantic categories that the item is aligned to.

Table 4. Root Mean Square Error (RMSE) results for each model and kernel, with the p-value of the Mann-Whitney test against the baseline marked.

Model      | Kernel            | Tuned Parameters                              | RMSE
SVD        | -                 | λ = 0.001, η = 0.1, f = 50                    | 1.786
SVD++      | -                 | λ = 0.01, η = 0.05, f = 100                   | 1.591
SB-SVD++   | -                 | λ = 10^-5, η = 0.05, f = 100                  | 1.590*
           | Cosine            | λ = 10^-5, η = 0.05, f = 20, ω_k = 0.9        | 1.588***
           | Dice              | λ = 0.001, η = 0.05, f = 20, ω_k = 0.7        | 1.589**
           | Squared-Euclidean | λ = 10^-5, η = 0.05, f = 20, ω_k = 0.6        | 1.589**
           | JS-Divergence     | λ = 0.01, η = 0.05, f = 50, ω_k = 0.3         | 1.590*
S-SVD++    | -                 | λ = 0.001, η = 0.05, f = 20                   | 1.590*
           | Cosine            | λ = 0.01, η = 0.05, f = 5, ω_k = 0.8          | 1.588***
           | Dice              | λ = 0.001, η = 0.05, f = 20, ω_k = 0.9        | 1.590*
           | Squared-Euclidean | λ = 0.05, η = 0.05, f = 5, ω_k = 0.7          | 1.590*
           | JS-Divergence     | λ = 10^-4, η = 0.05, f = 10, ω_k = 0.8        | 1.589**
Significance codes: p-value ≤ 0.001 ***, ≤ 0.01 **, ≤ 0.05 *, ≤ 0.1 .
N.B. all tested models significantly outperformed SVD at p ≤ 0.001, so we do not report the different p-values here.

Tuned β gives preference to known rated categories over transferred ones.
19. Conclusions

• Users' taste evolution can be tracked with semantic categories
• Vertex kernels overcome the cold-start categories problem
• Significant reduction in error:
  ◦ over the baselines, and when transferring semantic categories
• Future work:
  1. Vertex kernels based on linked data graph traversals
  2. Ranked-loss objectives (i.e. top-k models)