Collaborative filtering

By-
Neha Kulkarni (5202)
ME Computer
Pune Institute of Computer Technology

 Recommender systems
 Types of recommender systems
 Content based filtering
 Collaborative filtering
 Hybrid systems
 Content boosted collaborative filtering
 Evaluation of the CBCF
 Advantages
 Conclusion

 A recommender system predicts the “rating” or “preference”
that a user would give to an item.
 Recommendations are made in two ways:
1. Content based filtering
2. Collaborative filtering

 Content-based filtering selects items based on the
correlation between the content of the items and the user’s
preferences.
 Keywords are used to describe both the items and the user
profile.
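
As a minimal sketch of keyword matching between items and a user profile, the snippet below scores each item by the overlap of its keywords with the profile. The Jaccard measure, the function name `keyword_score`, and the sample data are assumptions chosen for illustration; the slides do not name a specific measure.

```python
def keyword_score(item_keywords, profile_keywords):
    """Jaccard overlap between an item's keywords and the user's profile keywords."""
    item, profile = set(item_keywords), set(profile_keywords)
    if not item or not profile:
        return 0.0
    return len(item & profile) / len(item | profile)

# Hypothetical user profile and item descriptions, for illustration only.
profile = {"space", "robots", "adventure"}
items = {
    "Movie A": {"space", "adventure", "aliens"},
    "Movie B": {"romance", "comedy"},
}

# Rank items from most to least similar to the profile.
ranked = sorted(items, key=lambda name: keyword_score(items[name], profile), reverse=True)
```

Items at the front of `ranked` share the most keywords with the profile and would be recommended first.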

 Collaborative filtering is based on collecting and
analyzing a large amount of information on users’
behavior, activities, or preferences, and predicting what
users will like based on their similarity to other users.
 Several methods are used to measure similarity:
1. K-nearest neighbor
2. Pearson correlation

 Collaborative filtering recommends items that are
relevant to the user.
 Content-based recommendation relies on the content of
the user profile.
 Because of this, collaborative filtering is the more widely
used of the two.

1. Cold start: there must be enough data in the system
to find a match.
2. Sparsity: most users do not rate most items,
so the user-item rating matrix is “sparse”;
the probability of finding a set of users with
significantly similar ratings is therefore usually low.
3. First rater: an item that has not been rated before
cannot be recommended.
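
The sparsity and first-rater problems can be made concrete with a small sketch. The rating matrix below is hypothetical, `None` marks an unrated item, and the helper names are assumptions for illustration.

```python
# Hypothetical 3-user x 4-item rating matrix; None marks an unrated item.
ratings = [
    [5, None, None, 1],
    [None, 3, None, None],
    [None, None, None, 4],
]

def sparsity(matrix):
    """Fraction of user-item cells that hold no rating."""
    cells = [value for row in matrix for value in row]
    return sum(value is None for value in cells) / len(cells)

def co_rated(u, v):
    """Indices of items rated by both users (the basis for any similarity)."""
    return [i for i, (a, b) in enumerate(zip(u, v)) if a is not None and b is not None]

def never_rated(matrix):
    """Indices of items nobody has rated (the first-rater problem)."""
    return [i for i in range(len(matrix[0])) if all(row[i] is None for row in matrix)]
```

Here users 0 and 1 have no items in common, so no similarity can be computed between them, and item 2 cannot be recommended by pure collaborative filtering because nobody has rated it.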

 The hybrid approach uses content-based prediction to
convert a sparse user rating matrix into a full user rating
matrix and then uses collaborative filtering to provide
recommendations.
 Ex: the hybrid approach has been applied in the domain of movie
recommendation.

 In neighborhood-based algorithms, a subset of users is
chosen based on their similarity to the active user, and a
weighted combination of their ratings is used to
produce predictions for the active user.
 Steps:
 Weight all users with respect to similarity with the
active user.

 Select n users that have the highest similarity with the
active user.
 Compute a prediction from a weighted combination of
the selected neighbors’ ratings.
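
The three steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: similarity is the Pearson correlation over co-rated items, the prediction is the commonly used mean-offset weighted average, and all function names and the sample data are hypothetical.

```python
import math

def mean(values):
    """Mean of a user's known ratings (None = unrated)."""
    rated = [v for v in values if v is not None]
    return sum(rated) / len(rated)

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common) *
                    sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item, n=2):
    # Step 1: weight every user who rated the item by similarity to the active user.
    weighted = [(pearson(active, u), u) for u in others if u[item] is not None]
    # Step 2: keep the n users with the highest similarity.
    neighbors = sorted(weighted, key=lambda wu: wu[0], reverse=True)[:n]
    # Step 3: weighted combination of the neighbors' deviations from their own means.
    num = sum(w * (u[item] - mean(u)) for w, u in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    base = mean(active)
    return base + num / den if den else base

# Hypothetical ratings; the active user has not rated item 3.
active = [5, 3, 4, None]
u1 = [5, 3, 4, 5]
u2 = [1, 5, 2, 1]
prediction = predict(active, [u1, u2], item=3, n=1)
```

With `n=1`, only `u1` (who agrees with the active user on every co-rated item) contributes to the prediction.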

 1. Implementing collaborative and content-based
methods separately and combining their predictions
 2. Incorporating some content-based characteristics
into a collaborative approach
 3. Incorporating some collaborative characteristics
into a content-based approach
 4. Constructing a general unifying model that
incorporates both content-based and collaborative
characteristics.

 Netflix is a good example of a hybrid system using content-
boosted collaborative filtering.
 Recommendations are made by comparing the watching and
searching habits of similar users (CF) and also by offering
movies that share characteristics with films that the user has
rated highly (content-based).

 Another good example of a hybrid recommendation system.
 It stores the clickstream and usage patterns of the
user and of other users with similar preferences (CF), and also
offers products that share characteristics with products
that the user has rated highly (content-based).

Use a content-based predictor to enhance existing user data,
and then provide personalized predictions using
collaborative filtering.
[Figure: block diagram with the labels Input, Content-based recommender, Input, CF-based recommender, Combiner, Recommendations]

 Create a pseudo-user-ratings vector v_u for each user ‘u’ in the
database.
 r_{u,i}: actual rating given by user ‘u’ to item ‘i’
 C_{u,i}: rating predicted by the pure content-based system
Put together, the two give the dense pseudo-ratings matrix V:
v_{u,i} = r_{u,i} if user ‘u’ rated item ‘i’, and C_{u,i} otherwise.
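
A minimal sketch of building the dense pseudo-ratings matrix: actual ratings are kept where they exist and content-based predictions fill the gaps. The function name and sample values are hypothetical, and the content-based predictions are assumed to be already available.

```python
def pseudo_ratings(actual, content_pred):
    """Return V where V[u][i] is the actual rating if present, else the content-based prediction."""
    return [
        [r if r is not None else c for r, c in zip(actual_row, content_row)]
        for actual_row, content_row in zip(actual, content_pred)
    ]

# Hypothetical data: None marks an item the user has not rated.
actual = [
    [5, None, 3],
    [None, 2, None],
]
content_pred = [
    [4.5, 3.2, 2.9],
    [3.8, 2.4, 4.1],
]
V = pseudo_ratings(actual, content_pred)
```

Every cell of `V` is filled, so the collaborative filtering step no longer faces a sparse matrix.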

 Similarity between the active user ‘a’ and another
user ‘u’ is computed using the Pearson
correlation coefficient.
 Instead of the original user votes, we
substitute the values from the pseudo-user-
ratings vectors v_a and v_u.
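
For reference, the Pearson correlation over the pseudo-user-ratings vectors can be written as below. The symbol names are notational assumptions: corr_{a,u} denotes the correlation, m the total number of items, and \bar{v}_a and \bar{v}_u the mean pseudo-ratings of the two users.

```latex
\mathrm{corr}_{a,u} =
\frac{\sum_{i=1}^{m} (v_{a,i} - \bar{v}_a)(v_{u,i} - \bar{v}_u)}
     {\sqrt{\sum_{i=1}^{m} (v_{a,i} - \bar{v}_a)^2 \; \sum_{i=1}^{m} (v_{u,i} - \bar{v}_u)^2}}
```

Because the pseudo-ratings vectors are dense, the sums run over all items rather than only the co-rated ones.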

 Inaccuracies in the pseudo-user-ratings vectors often
yield misleadingly high correlations between the
active user and other users.
 Hence, to incorporate confidence (or the lack thereof)
in the correlations, we weight them using the
Harmonic Mean weighting factor (HM weighting).
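
The HM weighting factor can be reconstructed as follows, based on the formulation in the Melville, Mooney and Nagarajan paper listed in the references. It is offered as a reconstruction to be checked against that paper; m_i is an intermediate quantity, and 50 is the rating-count threshold mentioned on the next slide.

```latex
hm_{i,j} = \frac{2\, m_i\, m_j}{m_i + m_j},
\qquad
m_i =
\begin{cases}
  n_i / 50 & \text{if } n_i < 50 \\
  1        & \text{otherwise}
\end{cases}
```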

where:
n_i: number of items rated by user i
The harmonic mean tends to bias the weight towards the lower of
the two values.
The threshold of 50 ratings was chosen based on
10-fold cross-validation.

 To the harmonic mean weight, we add the significance
weighting factor to obtain the hybrid correlation weight.
 If two users have co-rated fewer than 50 items, the significance
weighting factor is n/50, where n is the number of co-rated items;
otherwise it is 1.
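
A minimal sketch of the hybrid correlation weight, assuming the harmonic mean weight and the significance weight both use the threshold of 50 described above. Function and variable names are hypothetical.

```python
THRESHOLD = 50  # rating-count threshold stated in the text

def capped(n):
    """n / 50 below the threshold, 1 at or above it."""
    return n / THRESHOLD if n < THRESHOLD else 1.0

def hm_weight(n_i, n_j):
    """Harmonic mean of the two users' capped rating counts."""
    m_i, m_j = capped(n_i), capped(n_j)
    return 2 * m_i * m_j / (m_i + m_j) if (m_i + m_j) else 0.0

def significance_weight(n_common):
    """Devalues correlations that rest on few co-rated items."""
    return capped(n_common)

def hybrid_weight(n_i, n_j, n_common):
    """Hybrid correlation weight: harmonic mean weight plus significance weight."""
    return hm_weight(n_i, n_j) + significance_weight(n_common)
```

For example, `hm_weight(10, 50)` is about 0.33, much closer to the lower capped value (0.2) than to the higher one (1.0), which illustrates the bias towards the lower of the two values.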

 To give the pseudo-active user more importance than the
neighbours (that is, to increase confidence in the pure content-based
predictions for the active user), a self-weighting factor is
incorporated in the final prediction.
max: overall confidence in the content-based predictor
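
The self-weighting factor and the final prediction can be reconstructed as follows, based on the formulation in the Melville, Mooney and Nagarajan paper listed in the references. This is a reconstruction to be checked against that paper. The notation is partly assumed: sw_a is the self-weighting factor, n_a the number of items rated by the active user, hw_{a,u} the hybrid correlation weight, corr_{a,u} the Pearson correlation between the pseudo-ratings vectors, and \bar{v}_a and \bar{v}_u the users' mean pseudo-ratings.

```latex
sw_a =
\begin{cases}
  \dfrac{n_a}{50} \times \mathit{max} & \text{if } n_a < 50 \\
  \mathit{max}                        & \text{otherwise}
\end{cases}

P_{a,i} = \bar{v}_a +
\frac{ sw_a\,(C_{a,i} - \bar{v}_a)
       + \sum_{u=1,\,u \neq a}^{n} hw_{a,u}\,\mathrm{corr}_{a,u}\,(v_{u,i} - \bar{v}_u) }
     { sw_a + \sum_{u=1,\,u \neq a}^{n} hw_{a,u}\,\mathrm{corr}_{a,u} }
```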

Where:
P_{a,i}: final CBCF prediction for user a and item i
C_{a,i}: pure content-based prediction for user a and item i
n: size of the neighbourhood
The denominator is a normalizing factor that ensures all
weights sum to 1.
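
The final prediction can be sketched in code as a self-weighted content-based term plus the neighbours' weighted deviations, normalized by the sum of the weights. This is an illustration under assumptions: the weights and correlations are taken as precomputed inputs, and the function name, argument names, and sample values are hypothetical.

```python
def cbcf_predict(c_ai, mean_a, sw_a, neighbors):
    """Combine the content-based prediction with neighbour ratings.

    c_ai      -- pure content-based prediction for the active user and item
    mean_a    -- mean pseudo-rating of the active user
    sw_a      -- self-weighting factor of the active user
    neighbors -- list of (hw, corr, v_ui, mean_u) tuples, one per selected neighbour
    """
    numerator = sw_a * (c_ai - mean_a)
    denominator = sw_a  # normalizing factor: the sum of all weights
    for hw, corr, v_ui, mean_u in neighbors:
        numerator += hw * corr * (v_ui - mean_u)
        denominator += hw * corr
    if not denominator:
        return mean_a
    return mean_a + numerator / denominator

# Hypothetical values: one neighbour who rated the item well above their own mean.
prediction = cbcf_predict(c_ai=4.0, mean_a=3.0, sw_a=2.0,
                          neighbors=[(1.0, 0.5, 5.0, 3.0)])
```

With no neighbours the result reduces to the content-based prediction itself, which shows how the self-weighting factor anchors the final estimate.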

 Mean Absolute Error (statistical accuracy) : average absolute
difference between predicted ratings and actual ratings
 ROC curve (decision support) :
sensitivity : probability that a good item is accepted by the
filter
specificity : probability that a bad item is rejected by the
filter
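
Both kinds of metric can be sketched as follows. Treating ratings of 4 or above as "good" is an assumption made for this illustration, as are the function names and the sample data.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def sensitivity_specificity(predicted, actual, threshold=4):
    """Sensitivity and specificity when ratings >= threshold count as 'good' (assumed cutoff)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p >= threshold and a >= threshold for p, a in pairs)  # good item accepted
    fn = sum(p < threshold and a >= threshold for p, a in pairs)   # good item rejected
    tn = sum(p < threshold and a < threshold for p, a in pairs)    # bad item rejected
    fp = sum(p >= threshold and a < threshold for p, a in pairs)   # bad item accepted
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical predictions and true ratings for four items.
predicted = [4.5, 2.0, 3.5, 5.0]
actual = [5, 1, 4, 2]
mae = mean_absolute_error(predicted, actual)
sens, spec = sensitivity_specificity(predicted, actual)
```

Sweeping `threshold` on the predicted ratings and plotting sensitivity against 1 - specificity would trace out the ROC curve.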

 Overcomes the first-rater problem
 Tackles sparsity
 Finds better neighbours
 Overcomes the cold-start problem

 CBCF elegantly exploits content within a collaborative
framework.
 Overcomes problems faced by pure content-based or
collaborative systems.
 Incorporating content information into a collaborative
framework can improve recommender systems.

 Data Mining: Concepts and Techniques, 3rd edition
 Mining the Web, Chakrabarti
 Web Data Mining, Springer
 “Content-Boosted Collaborative Filtering for
Improved Recommendations”, Prem Melville,
Raymond J. Mooney, Ramadass Nagarajan, AAAI-02
Proceedings, 2002
