PhD defense - Exploiting distributional semantics for content-based and context-aware recommendation
1. Exploiting distributional semantics for Content-Based and Context-Aware Recommendation
PhD in Artificial Intelligence
Victor Codina
Advisor: Luigi Ceccaroni
Universitat Politècnica de Catalunya
June, 2014
6. Main families of recommendation models
• Collaborative Filtering (CF): uses ratings
• Content-Based (CB) Filtering: uses item metadata
• Context-aware Recommendation (CARS): uses context in addition to ratings
[Diagram: target user and target item → recommendation model → predicted rating]
LIMITATION: low accuracy in data-sparsity scenarios
7. Exploitation of explicit semantic relationships to mitigate the data-sparsity problem
Existing solution: use the knowledge contained in domain ontologies
• Semantically-Enhanced CB Filtering: attribute similarities from an item ontology (e.g., castle and monastery are both is-a Historic building)
• Semantically-Enhanced CARS: condition similarities from a context ontology (e.g., sunny and cloudy are both is-a Weather)
8. Limitations of domain ontologies
• Building and maintaining ontologies is expensive
• Ontologies are bounded by fixed representations; they may not suit the data
[Diagram: ontology (defined by a domain expert) ≠ rating data]
9. Key idea: exploit distributional semantics derived from rating data
Similarities automatically derived from the data itself. Advantages:
• Collecting rating data is cheaper than building ontologies
• Not bounded by a fixed knowledge representation
• Fine-grained semantic similarities can be identified
[Diagram: rating data → semantic similarities]
10. Research questions
Question 1: Is it possible to enhance content-based recommendation by exploiting the distributional semantics of item attributes?
Question 2: Is it possible to enhance contextual recommendation by exploiting the distributional semantics of contextual conditions?
13. Distributional hypothesis
The meaning of a concept is captured by its usage.
Distributional Hypothesis: “concepts that share similar usages share similar meaning”
In Linguistics, usages are regions of text:
• document
• paragraph
• sentence
15. Distributional similarity measure
Cosine similarity is the most popular measure: good accuracy in high-dimensional vector spaces.
Advantage: it can be used in combination with dimensionality reduction techniques (SVD).
[Figure: 2D vector space with the concepts Glass, Wine and Spoon]
18. Content-based Recommendation
IDEA: “show me more of the same I’ve liked”
[Diagram: target user’s ratings and item metadata → Profile Learner → user profile; user profile and target item profile → Profile Matching → predicted rating]
19. Traditional item-to-user profile matching
Lack of semantics exploitation: syntactically different attribute pairs are not considered.
Hypothesis: profile matching can be enhanced by exploiting similarities between attributes.

              a1   a2   a3   a4   a5
Item profile  0.2  1    0.5  0    1
User profile  0    0.7  0    1    0

score = 1 x 0.7
21. Distributional semantics of item attributes derived from rating data
Assumption: two attributes are similar if several users are interested in them similarly.

Attribute      User1  User2  User3  User4  User5  User6  User7
action          1     -0.7    0      0.9    0.1   -1      0
Bruce Willis    0.7   -0.8    0.5    0.8    0.4   -0.2    0
comedy         -0.5    0.7    0.2   -1      0.9    0.8    0.5

Example: the -1 is User6’s degree of interest in action movies (“-1” = strong dislike, “1” = strong like).
22. Evaluation using the MovieLens data set
Rating data set statistics before and after pruning:

                   Original   Pruned
Users                 2,113    2,113
Movies               10,197    1,646
Attributes                6        4
Attribute values     13,367    3,105
Ratings per user        404      235
Sparsity                96%      86%
23. Best-pairs vs. All-pairs
[Chart: rating prediction and ranking prediction results for the Best-pairs and All-pairs strategies; % = improvement with respect to the traditional CB profile matching (the higher, the better)]
24. Distributional vs. Ontology semantics
[Chart: ranking prediction results; % = improvement with respect to the traditional CB profile matching (the higher, the better)]
25. SCB vs. State of the art
[Chart: rating prediction and ranking prediction results for SCB (proposed method), SVD++ and BPR-MF; % = improvement with respect to the traditional CB profile matching]
29. Context-aware recommendation
Context as an additional dimension for rating estimation.
Three main context-aware recommender families: pre-filtering, post-filtering and contextual modeling.
[Diagram: in-context ratings, target user, target item and target context → prediction model → predicted rating]
30. Traditional contextual pre-filtering
Main limitation: its lack of flexibility. Only ratings acquired in exactly the same context are used.
Hypothesis: ratings filtering can be enhanced by exploiting semantic similarities between contexts.
[Diagram: in-context ratings and target context → Ratings filtering → local ratings → Prediction model → predicted rating]
31. Semantic contextual pre-filtering
Key idea: reuse ratings acquired in similar contexts.
[Diagram: in-context ratings, target context and semantic similarities (with a global threshold) → Ratings filtering → local ratings → Prediction model → predicted rating]
32. Distributional semantics of contextual conditions derived from rating data
Assumption: two contexts are similar if their composing conditions influence ratings similarly.

Condition  User1  User2  User3  User4  User5  User6  User7
            1     -0.7    0      0.9    0.1   -0.6    0
            0.7   -0.8    0.5    0.8    0.4   -0.2    0
           -0.5    0.7    0.2   -1      0.9    0.8    0.5
(condition names were shown as icons in the original slide)

Example: the influence of the family condition on User6’s ratings (“<0” = negative, “0” = neutral, “>0” = positive).
33. Evaluation data sets
Six in-context rating data sets on diverse domains (UMAP – June 2013, Rome, Italy):

Dataset   Ratings  Conditions  Context granularity
Music       4,013      26        1
Tourism     1,358      57        3
Adom        1,464      14        3
Comoda      2,296      49       12
Movie       2,190      29        2
Library      609K     149        4
34. Semantic vs. traditional pre-filtering
[Chart: results for semantic and traditional pre-filtering; % = MAE reduction with respect to a context-free MF model (the higher, the better)]
35. SPF vs. State of the art
[Chart: results for SPF (proposed method), UI-Splitting and CAMF; % = MAE reduction with respect to the context-free MF model (the higher, the better)]
37. Main contributions (II): Semantic Content-Based filtering (SCB)
• Method for computing the distributional semantics of item attributes
• Two strategies for exploiting the semantic similarities during profile matching
38. Main contributions (III): Semantic Content-Based filtering (SCB)
Better accuracy than state of the art in new-user scenarios.
[Chart: rating prediction and ranking prediction results for SCB (proposed method)]
39. Main contributions (IV): Semantic Contextual Pre-filtering (SPF)
• Method for computing the distributional semantics of contextual conditions
• Novel semantic pre-filtering method that reuses ratings in semantically similar contexts
40. Main contributions (V): Semantic Contextual Pre-filtering (SPF)
Better accuracy than state of the art.
[Chart: results for SPF]
41. Conclusions
Question 1? YES. It is possible to enhance content-based recommendation by exploiting the distributional semantics of item attributes.
Question 2? YES. It is possible to enhance context-aware recommendation by exploiting the distributional semantics of contextual conditions.
42. Publications related to the thesis
Conference papers:
• CCIA 2010: Codina, V. & Ceccaroni, L. Taking advantage of semantics…
• DCAI 2010: Codina, V. & Ceccaroni, L. A Recommendation System for the…
• CCIA 2011: Codina, V. & Ceccaroni, L. Extending Recommendation Systems with…
• CCIA 2012: Codina, V. & Ceccaroni, L. Semantically-Enhanced Recommenders
• CARR 2013: Codina et al. Semantically-enhanced pre-filtering for…
• UMAP 2013: Codina et al. Exploiting the Semantic Similarity of Contextual…
• RecSys 2013: Codina et al. Local Context Modeling with Semantic Pre-filtering
Journal paper:
• UMUAI (User Modeling and User-Adapted Interaction): Codina et al. Distributional Semantic Pre-filtering in Context-Aware Recommender Systems. 2012 Impact Factor: 1.600 (current status: accepted)
Editor's notes
Today I’m going to present the main contributions of my research in the field of the RSs
This work has been carried out at the UPC, with the support of the KEMLG research group, and has been supervised by Dr. Luigi Ceccaroni.
We are living in an era of information and choice overload, with access to an overwhelming number of alternatives for almost every type of product or service we are interested in.
Although having such a variety of options is usually seen as beneficial, it also has the negative effect of making the decision-making process harder, leading us to make poor decisions when we lack the necessary knowledge.
A natural way for solving this information overload problem is to rely on the recommendations of other people, and this simple observation was what motivated the development of RSs. Therefore, the goal of RSs is to help users to find the right items for them through recommendations adapted to their preferences.
Here you can see an example of personalized movie recommendations provided by the popular movie rental service Netflix
Nowadays, the success of many popular sites in a large variety of domains strongly depends on RSs. Amazon, eBay, Netflix, Spotify, Yahoo News and LinkedIn are some popular examples.
They use RSs to add value to their information services, improving the user experience and, as a consequence, their business.
Recommender systems are composed of three main components: the knowledge base, which stores information about the items to recommend and historical user data, i.e., previous user-item interactions that show what users liked or disliked in the past; the recommendation engine, where one or several recommendation models exploiting the knowledge base are used to make recommendations; and the user interface, which is responsible for presenting the recommendations in a proper way and for collecting new feedback about the recommended items.
My thesis has focused on improving the accuracy of existing recommendation models.
The recommendation task is commonly formulated as a rating prediction problem, that is, the problem of estimating how much a target user will like or dislike a certain candidate item.
Depending on the type of information exploited, recommendation models are commonly classified into three main families: CF approaches, which make predictions for a user based on the ratings of others and therefore only require rating data; CB approaches, whose predictions are based on the metadata of the items the target user rated in the past and of the candidate ones; and context-aware approaches, which, in addition to the ratings, also incorporate contextual information into their processes.
A common limitation shared by the three recommendation approaches is that they perform poorly (in terms of accuracy) in data-sparsity scenarios; although this is a well-known limitation in the research community, it is still an open and relevant issue.
A reason for the low accuracy of CB and CA approaches in data-sparsity scenarios is that their models lack semantic intelligence.
Therefore, several works have addressed this limitation by exploiting the explicit semantic relationships about item content and contextual information available in domain ontologies.
In CB approaches, these explicit similarities between item attributes are commonly used to infer new user interests. For example, a CB recommendation model exploiting this item ontology could infer that users who like castles are also interested in monasteries and vice versa, because these two concepts are hierarchically related.
In CA approaches, the hierarchical relationships between contextual conditions are commonly used to generalize the context when it is too fine-grained to make meaningful contextual recommendations.
However, using ontologies as a knowledge source has its limitations. On the one hand, the process of building and maintaining expressive ontologies is expensive. This limits their use in many domains, and the number of publicly available domain-specific ontologies is limited; most of them consist of general taxonomies that are limited in terms of expressiveness and richness.
In addition, another major limitation of ontologies is that they are predefined specifications of a domain based on the criteria of human experts. For this reason, the ontology may not fit the data actually used for making recommendations, and therefore exploiting this knowledge is not really useful for improving prediction accuracy.
In order to overcome these limitations of ontology-based semantics, in this thesis I have investigated the use of distributional semantics derived from rating data to improve recommendations.
Differently from similarities derived from ontologies, distributional semantic similarities are automatically derived from the data itself, and consequently this semantics source doesn't suffer from the previously mentioned limitations of ontologies.
On the one hand, user data is cheaper and easier to obtain than ontologies.
They are not bounded to static knowledge representations.
Finally, distributional semantic similarity measures can capture finer-grained similarities which might only be detected from the data.
My research then has focused on investigating how distributional semantics derived from rating data can be exploited in existing CB and CA recommendation models in order to improve their accuracy.
These are the two research questions of this thesis:
To answer these questions I have implemented and empirically evaluated two recommendation models: (1) a novel content-based approach enhanced with distributional semantics of item’s attributes, and (2) a novel context-aware approach enhanced by using the distributional semantics of contextual
This is the outline I will follow during the presentation.
First, before presenting each of the two proposed approaches, I'm going to introduce the concept of distributional semantics and its mathematical foundations, which come from Computational Linguistics.
In the second part, I’ll present the novel content-based approach enhanced with distributional semantics and its evaluation results,
State of the art will be presented in each of the sections
And in the last part, I’ll talk about the novel contextual pre-filtering approach and also its performance results
In Distributional Semantics, the meaning of a concept is captured by the usage or distributional properties of the concept, which are automatically derived from the corpus of data where the concept is used.
The fundamental idea behind this way of extracting semantic similarities between domain concepts is the so-called distributional hypothesis, which claims that concepts repeatedly co-occurring in the same context or usage tend to be related.
Distributional semantics have been mainly studied in Linguistics, where usages or contexts are defined by specific regions of text that can have different granularities: for instance, the whole document, a paragraph or a sentence.
A common method to measure distributional similarities between words consists of employing a vector-space representation of concept meaning, and then measuring similarity in terms of proximity in such a vector space.
This matrix shows an example of such a representation, where rows represent the semantic vectors of these words, and each of the elements (the columns) indicates whether the concept was used in the linguistic context (which in this example is assumed to be a text sentence). Commonly these values are calculated by means of a weighting scheme over the occurrence frequency of the concept in the specific region of text.
In this example, we can see that the concept wine has a better overlap with glass than with spoon because they co-occur more frequently.
Once the semantic vectors, or co-occurrence matrix, are computed, we are ready to calculate semantic similarities between words. To do so we need to employ a specific similarity measure.
In the literature there are several types of similarity measures that can be used for this purpose, such as set-theoretic measures and probabilistic measures. However, for computing similarities in the vector space, the cosine similarity is one of the most commonly used because of its proved reliability, especially when dealing with high-dimensional vector spaces.
Here I'm showing in a 2D space the main idea of the cosine similarity, which is calculated as the cosine of the angle between the vectors: the smaller the angle, the more similar the semantic vectors. Therefore, in this case, the cosine similarity between glass and wine is larger than the one between wine and spoon.
Additionally, it has the advantage that it can be used in combination with dimensionality reduction techniques like SVD. These techniques are useful when the dimensionality of the semantic vectors is too high and sparse, because they can produce a more compact and informative semantic representation. This usually improves the accuracy of the similarity assessments.
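The glass/wine/spoon idea can be sketched in a few lines; note that the term-context counts below are invented for illustration, not taken from any real corpus.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors; 0 if either is all-zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Toy term-context matrix (rows: concepts, columns: regions of text);
# the counts are illustrative only.
vectors = {
    "glass": np.array([2.0, 1.0, 0.0, 3.0, 0.0]),
    "wine":  np.array([1.0, 2.0, 0.0, 2.0, 1.0]),
    "spoon": np.array([0.0, 0.0, 3.0, 0.0, 2.0]),
}

# Wine co-occurs with glass more often than with spoon, so its
# semantic vector is closer to glass's.
print(cosine(vectors["wine"], vectors["glass"]))  # ~0.85
print(cosine(vectors["wine"], vectors["spoon"]))  # ~0.18
```

For high-dimensional, sparse matrices one would first apply SVD (e.g., keep the top-k singular vectors) and compute the cosine in the reduced space.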
The main assumption of CB recommendation approaches is that users tend to like items with similar attributes to those he or she already liked in the past.
As illustrated in this graphic, CB models first build a model of user’s interests in the same attribute space as items, and then use this user profile to recommend new items whose attributes match the user’s interests.
In domains where explicit ratings are available, it is common to use linear CB models, in which user profiles are represented as weighted vectors, each value quantifying the degree of interest in a certain item attribute based on the ratings given to the items containing that attribute; predictions are then computed by directly comparing the user and item vector representations.
Commonly, the item-to-user profile matching is computed by means of the dot product or the cosine similarity, methods that rely only on the "syntactic" evidence of attribute relatedness. That is, syntactically different attributes do not contribute to the similarity value.
Present example (a 0 means that the attribute does not appear in the profile). For example, using the dot product to compute the matching score between this user profile and item profile, only the weights of attribute 2 would be aggregated.
Therefore, they lack semantic intelligence in this sense, which limits the accuracy of the prediction, especially if the user profiles are based on few ratings and there is consequently little knowledge about the user's interests.
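The traditional matching from the slide-19 example can be reproduced directly (profile values taken from the slide):

```python
import numpy as np

# Attribute weights a1..a5 from the slide-19 example.
item_profile = np.array([0.2, 1.0, 0.5, 0.0, 1.0])
user_profile = np.array([0.0, 0.7, 0.0, 1.0, 0.0])

# Traditional matching: a plain dot product, so only attributes present
# (non-zero) in BOTH profiles contribute -- here only a2 (1 x 0.7).
score = float(np.dot(item_profile, user_profile))
print(score)  # 0.7
```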
My hypothesis was that traditional profile matching could be enhanced by exploiting the distributional similarities of syntactically different item attributes in addition to the exact coincidences.
In particular, I proposed two profile matching strategies based on pairwise comparison that exploit the distributional semantic similarities between item attributes: a best-pairs and an all-pairs strategy.
The best-pairs strategy aggregates, in addition to the exact attribute matches, the best-matching attribute pairs, so each non-zero attribute in the item profile is compared with only one non-zero attribute in the user profile. In this example…
The all-pairs strategy, as its name indicates, aggregates all the possible attribute-pair combinations appearing in both profiles. So, in the same user and item profile comparison, the number of aggregated values is doubled.
In both strategies the aggregated attribute pairs are weighted according to their semantic similarity value, so that weaker similarities contribute less to the predicted score.
I experimented with these two strategies because my hypothesis was that they might perform differently depending on the recommendation task. In particular, the all-pairs strategy is supposed to perform better in ranking prediction, where what matters most is the order of the recommended items and not how similar the predicted and true ratings are. In contrast, given that the best-pairs strategy is more selective, it should be more adequate for rating prediction, where the exact predicted score is relevant.
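A minimal sketch of the two pairwise strategies, assuming a precomputed attribute-similarity matrix `sim`; the profiles reuse the slide-19 values, and the exact weighting in the thesis may differ:

```python
import numpy as np

def best_pairs_score(item, user, sim):
    """In addition to exact matches, each non-zero item attribute is paired
    with its single most similar non-zero user attribute."""
    score = 0.0
    for i in np.nonzero(item)[0]:
        pairs = [(sim[i, j], j) for j in np.nonzero(user)[0]]
        if pairs:
            s, j = max(pairs)
            score += s * item[i] * user[j]
    return score

def all_pairs_score(item, user, sim):
    """Every non-zero (item attribute, user attribute) combination
    contributes, weighted by its semantic similarity."""
    return sum(sim[i, j] * item[i] * user[j]
               for i in np.nonzero(item)[0]
               for j in np.nonzero(user)[0])

# With an identity similarity matrix (no cross-attribute similarity),
# both strategies reduce to the traditional dot product.
item = np.array([0.2, 1.0, 0.5, 0.0, 1.0])
user = np.array([0.0, 0.7, 0.0, 1.0, 0.0])
identity = np.eye(5)
print(all_pairs_score(item, user, identity))   # 0.7
print(best_pairs_score(item, user, identity))  # 0.7
```

With a non-trivial `sim`, the all-pairs score aggregates more (weaker) pairs than the best-pairs score, matching the intuition that it is less selective.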
So far I have explained the methods for exploiting semantic similarities during the item-to-user profile matching. Now I'm going to talk about how we calculated these similarities based on the distributional semantics of item attributes derived from rating data.
The main assumption of the proposed method for computing such distributional similarities is that two attributes are semantically related if several users are interested in them in a similar way.
Based on this assumption, to measure user-dependent distributional similarities, we first need to compute the user-dependent semantic vectors, where each element stores a user-interest weight. That is, the attributes' semantic vectors are built with respect to the attribute-based user profiles generated by the CB profile learner.
In this example I show the semantic vectors of three movie attributes with respect to the users of the system. If we analyze the number of co-occurrences between pairs of attributes, it is easy to observe for the <Bruce Willis, action> pair that several users tend to be interested in them similarly; in contrast, there is only one such case between Bruce Willis and comedy.
Finally, based on this semantic representation, we calculate the distributional similarity between two attributes by comparing their semantic vectors. We experimented with several measures but, as expected, the cosine similarity was the one performing better in general.
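Using the user-interest weights from slide 21, the user-based attribute similarity can be computed as the cosine between rows of the attribute-user matrix:

```python
import numpy as np

# User-interest weights from slide 21 (rows: attributes,
# columns: User1..User7; -1 = strong dislike, 1 = strong like).
attributes = {
    "action":       np.array([1.0, -0.7, 0.0, 0.9, 0.1, -1.0, 0.0]),
    "Bruce Willis": np.array([0.7, -0.8, 0.5, 0.8, 0.4, -0.2, 0.0]),
    "comedy":       np.array([-0.5, 0.7, 0.2, -1.0, 0.9, 0.8, 0.5]),
}

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Users interested in action movies tend to be interested in Bruce Willis
# too, so those two attributes come out as semantically similar,
# while action and comedy do not.
print(cosine(attributes["action"], attributes["Bruce Willis"]) >
      cosine(attributes["action"], attributes["comedy"]))  # True
```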
For the evaluation of SCB we used an extension of the popular movie rating data set collected by the MovieLens recommender, which contains over 10 million ratings from 2K users on 10K movies. We used this data set because it included a large variety of movie attributes such as genres, directors, actors, countries of origin, filming locations and user tags; some of them extracted from IMDb.
In order to avoid introducing non-informative movie metadata into the CB models, which could degrade predictions, we discarded some of it, especially the least popular actors and user tags. We also removed all the movies with fewer than five tags, as well as the ratings associated with them.
Here I illustrate the % improvement achieved when using the proposed pairwise strategies exploiting the user-based distributional similarities compared to the traditional profile matching strategy.
Optional. The baseline and the enhanced CB approaches employed the same user profile learning method. In these experiments we employed a sophisticated user-profile learning method based on the rating average.
MAE and RMSE are well-known metrics for measuring how accurately the models predict unknown ratings, and Recall and NDCG are metrics that measure the accuracy of the models at making personalized rankings. "All" means that the results are averaged over all the users, and "New" over the set of new users. In our experiments we considered as new users the 10% of users with the lowest number of ratings.
We can see that the new-user scenario is where the two variants perform significantly differently and are particularly effective. On the one hand, the best-pairs strategy is better than the all-pairs one for rating prediction; in contrast, the all-pairs strategy clearly outperforms the best-pairs one in terms of ranking precision.
These results prove the hypothesis that the all-pairs strategy is more effective for ranking, given that for this task what matters most is the order of the items and not the closeness between the predicted and the true rating, and that the best-pairs strategy, being more selective, is better for rating prediction.
Here I compare the ranking accuracy of the all-pairs strategy when exploiting different sources of semantic similarities: the blue bars correspond to the user-based distributional semantics; the yellow bars to the distributional semantics derived from item-based co-occurrences (that is, in the item-based representation rating data is not considered, only the item metadata); and the red bars to similarities derived from an ontology. The ontology-based semantics were derived from the hierarchical relationships defined in the Amazon.com movie taxonomy.
As can be observed, using distributional semantics the overall accuracy is better than when using ontology-based semantics, with the user-based being slightly better than the item-based. In the new-user scenario the results are quite different: the item-based similarities are clearly the least effective, and the user-based and ontology-based ones have similar accuracy.
Considering the accuracy on both sets of users, the results validate the hypothesis that user-based semantics, derived from rating data, can be more effective at improving prediction accuracy than the other types.
In this other slide I show the improvement achieved by the proposed novel CB method (the orange bar) compared to two state-of-the-art CF approaches based on Matrix Factorization, a popular CF method.
The yellow bar corresponds to SVD++, an MF model which was part of the winning solution in the Netflix Prize and is therefore especially effective for rating prediction, and the red bars correspond to BPR-MF, another MF model which is designed for recommending rankings and is therefore not able to make rating predictions.
Possibly you have noted that the gain achieved for rating prediction is much smaller than the gain achieved for ranking. This is because in rating prediction the space for improvement is more limited, as was demonstrated during the Netflix challenge, where a $1M prize was offered for reducing the RMSE of their approach by 10% and 3 years of research were needed to achieve it.
If we look at the overall results (the "All" columns), we can see that the CF approaches are clearly better: SVD++ is the best model for rating prediction and BPR-MF for ranking.
However, for new users the new CB method outperforms the best CF approach for each recommendation task. Differences are especially significant in terms of MAE and NDCG. This proves that our method is effective for improving CB recommendation in general, and for improving on state-of-the-art CF methods in data-sparsity scenarios such as the new-user one.
We can see that based on all the users, the CF approaches are the most accurate
The main assumption of CARS is that items can be experienced differently by the users depending on the current contextual situation, and as a result, user evaluations or ratings can also be different.
A clear example where context matters is in the tourism domain, where the same recommendations to the same users can be considered as good or bad depending on the weather conditions.
For this reason, context-aware recommendation approaches incorporate contextual information into their processes. Typically, CARS extend existing CB and CF techniques with context-awareness and, depending on how they incorporate context into the recommendation process, three main families of context-aware approaches can be identified: pre-filtering, post-filtering and contextual modeling.
Pre-filtering approaches exploit contextual information to discard the user’s ratings that are not relevant in the context in which the user is asking for a recommendation. Then, a context-free CB or CF approach is used to make recommendations based on the subset of relevant ratings.
On the contrary, post-filtering approaches use contextual information once recommendations are made by a context-free model to adjust them. For instance, by applying some kind of rescoring.
Finally, contextual modeling approaches incorporate context into the recommendation model, representing user’s interests and other model parameters as a function of context.
Because context-aware approaches require a large number of ratings from users for items in several contexts, they are more affected by the data-sparsity problem than the context-free ones. Contextual pre-filtering is the approach that typically suffers most from this limitation, and for this reason my research has focused on this paradigm.
Traditional contextual pre-filtering is known as the reduction-based approach because, for each target contextual situation, it builds a strict local model where only the ratings acquired in exactly the same situation as the target one are used for recommending.
The main limitation of this approach is its lack of flexibility: it always uses the maximum level of contextualization, and it therefore fails when the target situations are too specific and not relevant, or when there are not enough ratings in that situation to generate a robust local prediction model.
With this example I'm showing how traditional contextual pre-filtering works. Each of these circles represents the set of training ratings tagged with three syntactically different situations, s1, s2 and s3. Assuming that the target context is s3, the method would discard all the ratings acquired in s1 and s2. Finally, it builds a local prediction model based on the selected ratings.
My hypothesis is that it is possible to overcome this lack of flexibility by exploiting the semantic similarities between contexts during the rating pre-filtering process.
To validate this hypothesis, we proposed a novel pre-filtering approach that, in addition to the ratings acquired in exactly the same context, also reuses ratings acquired in contexts semantically similar to the target one.
Following the same example, let's assume now that the system knows that the target context sunny is semantically related to when users travel in family, but not to when users are sad. In this case the semantic pre-filtering would also reuse the ratings acquired in the family context to build the local prediction model used to make predictions in the target context sunny.
Our approach employs a global similarity threshold to select those situations that are similar enough to be considered reusable: the larger the threshold, the sharper the contextualization, that is, the more similar the local models are to the strict models generated by the traditional approach.
So far I have assumed the existence of semantic similarities between contexts, and now I’m going to explain how we compute these similarities with respect to the rating data.
In particular our method computes distributional semantic similarities between contextual situations based on the assumption that two situations are similar if their composing conditions influence users’ ratings in a similar way.
For this reason, in this case the semantic vectors of contextual conditions contains estimates of its influence on the given ratings. We estimated this influence as the average deviation between the observed ratings when the condition holds, and a context-free rating estimated by using a baseline predictor. In this case, the average deviation be calculated either from the item perspective or the user perspective, and the depending on the rating data is more appropiate than the other.
Here I’m showing an example using the user-based perspective. In this case, the -1 indicates that the family condition negatively influences the ratings of user 6.
Once the semantic vectors of the conditions are computed, we calculate their similarities by comparing the vectors using the cosine similarity: the more similar the influence, the more similar the conditions. In this example, we can see that family and sunny are more similar than sunny and sad, given that there are more users for whom their influence is similar, whereas sunny and sad agree for only one user.
When a situation is defined by several conditions, we first compute the semantic vector of the situation by averaging the vectors of its composing conditions, and then we compute the cosine similarity.
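Both steps, averaging condition vectors into a situation vector and comparing vectors with the cosine, can be sketched as follows (illustrative values; the influence vectors map users to average rating deviations):

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts: user -> influence).
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def situation_vector(condition_vectors):
    # Vector of a multi-condition situation: componentwise average
    # of the vectors of its composing conditions.
    users = {u for v in condition_vectors for u in v}
    n = len(condition_vectors)
    return {u: sum(v.get(u, 0.0) for v in condition_vectors) / n for u in users}

# Toy influence vectors: family and sunny influence users alike, sad does not.
sunny  = {1: 0.5, 2: -1.0, 3: 0.8}
family = {1: 0.4, 2: -0.9, 3: 0.7}
sad    = {1: -0.6, 2: 1.0, 3: -0.5}

sim_fs = cosine(sunny, family)  # close to 1: similar influence
sim_ss = cosine(sunny, sad)     # negative: opposite influence
sim_multi = cosine(situation_vector([sunny, family]), sad)
```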
For the evaluation we considered six data sets of contextually tagged ratings from diverse domains and with different characteristics. Here I’m showing some of them: conditions refers to the total number of conditions captured by the system, and context granularity is calculated as the average number of conditions per contextual situation; the larger this number, the more specific the contexts.
The Music data set contains ratings for music tracks collected by an in-car music recommender. The Tourism data set contains ratings for points of interest (POIs) in the region of South Tyrol. Adom, Comoda and Movie are all movie-rating data sets, and Library is about book ratings.
As you can see, Library is the biggest data set, with more than 600k ratings, and Comoda is the one with the most fine-grained contextual situations.
Here I’m showing the MAE reduction with respect to a context-free MF model when using the proposed semantic pre-filtering (the orange bars) and the traditional one (the yellow bars). The larger the percentage, the better the rating prediction accuracy.
As you can see, in all the data sets the semantic pre-filtering is clearly superior to the traditional one, proving the effectiveness of exploiting distributional semantic similarities between contextual situations during pre-filtering to improve accuracy.
The traditional pre-filtering is even worse than the context-free model in some data sets. This poor performance in Tourism, Music and Comoda is due to the lack of flexibility of this approach, which always builds a strict local model; in some cases the contexts are so specific that there is not enough training data to build robust local models, and therefore their accuracy is worse than that of the global context-free model.
Here I’m showing the results of the proposed semantic pre-filtering (the orange bars) compared to two state-of-the-art context-aware approaches.
The blue bars correspond to another pre-filtering approach which, unlike the reduction-based approach that builds a local model for each target context, modifies the original rating matrix by splitting the rating vectors associated with users and items into virtual vectors, based on the contextual condition that most influences the ratings, and then builds a global model on the new rating matrix.
The red bars correspond to CAMF, a contextual-modeling approach that extends the standard MF model with additional parameters modeling the influence of context with respect to the items or the users. In this case, the context is modeled as part of the MF model itself.
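A minimal sketch of a CAMF-style prediction rule, assuming the item-level variant with already-learned parameters (training and regularization omitted; all names here are hypothetical):

```python
def camf_predict(p_u, q_i, b_u, b_i, b_ic, conditions):
    # Standard MF dot product plus user and item biases, plus one
    # learned bias per (item, condition) pair that models how each
    # contextual condition shifts the ratings of this item.
    dot = sum(pu * qi for pu, qi in zip(p_u, q_i))
    return dot + b_u + b_i + sum(b_ic.get(c, 0.0) for c in conditions)

# Hypothetical learned parameters for one user-item pair
p_u, q_i = [0.2, -0.1], [0.5, 0.3]
score = camf_predict(p_u, q_i, b_u=0.1, b_i=3.5,
                     b_ic={"sunny": 0.4}, conditions={"sunny"})
# 0.07 + 0.1 + 3.5 + 0.4 = 4.07
```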
An advantage of pre-filtering approaches is that they can use any context-free recommendation technique to build the local models. However, to properly compare the performance of the pre-filtering approaches with CAMF, we used them in combination with the standard context-free MF: our method uses MF to build the local models, and the splitting approach uses MF to build its global model.
As can be observed, the three context-aware prediction models significantly outperform MF in all the data sets, confirming that contextual information is relevant for improving rating predictions.
On the other hand, the new method is the most effective at exploiting the context, since it outperforms the other approaches in all the data sets; the differences are especially large in the Tourism, Adom, Comoda and Movie data sets.
Building block: distributional semantics
Key idea: Content-based and context-aware recommendation can be enhanced by exploiting distributional semantics derived from rating data
User-based distributional semantics of attributes
Based on how users are interested in them
More effective than item and ontology-based
50% gain in ranking accuracy
7% gain in rating prediction
Based on how conditions influence the users’ ratings
Question 1: Is it possible to enhance CB recommendation by exploiting distributional semantic similarities between item attributes?
Semantic similarities between attributes are useful to enhance the profile matching
Question 2: Is it possible to enhance contextual recommendation by exploiting distributional semantic similarities between contextual conditions?
Many results reported in this thesis have already been presented at several international conferences, some of them of significant impact in the field of Recommender Systems, such as the UMAP and RecSys conferences.
Additionally, the main results of this thesis have also been published in a highly ranked journal in the field.