Collaborative filtering with CCAM
COLLABORATIVE FILTERING WITH CCAM
Presenter: Meng-Lun Wu
Authors: Meng-Lun Wu, Chia-Hui Chang and Rei-Zhe Liu
Date: 2011/12/21
Outline
• Introduction
• Related Work
• Preliminary
• Collaborative Filtering with CCAM
• Experiment
• Conclusion
Introduction (1/2)
• In any recommender system, the number of ratings already
obtained is usually very small compared to the number of
ratings that need to be predicted.
• A possible solution is dimensionality reduction, which can alleviate data sparsity.
• Clustering is the simplest such technique; it can be applied to recommender systems to obtain a compact model and avoid the sparsity problem.
Introduction (2/2)
• In recent years, co-clustering based on information theory has attracted increasing attention.
• We have extended an information-theoretic co-clustering algorithm to augmented data matrices; the resulting method is called Co-Clustering with Augmented data Matrix (CCAM).
• In this paper, we consider how to alleviate the sparsity problem and achieve precise predictions with Collaborative Filtering with CCAM.
Related Work
• Information theoretical co-clustering
• Dhillon et al. (2003) derived a co-clustering algorithm from information theory, optimizing an objective based on the loss in mutual information between the clustered random variables.
• Matrix factorization co-clustering
• Chen et al. (2008) linearly combined user-based CF, item-based CF, and matrix factorization results that rely on ONMTF in order to predict ratings.
• Li et al. (2009) presented a novel cross-domain collaborative filtering method that co-clusters movie information via ONMTF and reconstructs the knowledge for recommending books and movies.
Preliminary (1/2)
• Suppose we are given a clicking information matrix R composed of a user set U = {u_1, u_2, …, u_{n_u}} and an ad set A = {a_1, a_2, …, a_{n_a}}.
  • n_u and n_a denote the number of users and ads, respectively.
• For memory-based CF methods, sparsity issues in the required data are unavoidable before similar neighbors can be found.
  • Dhillon et al. (2003) considered a co-clustering algorithm that monotonically decreases the information loss of tabular data to form a compact model.
Preliminary (2/2)
• Assume U and A are random variables with a joint probability distribution p(U, A) and marginal distributions p(U) and p(A). The mutual information I(U; A) is defined as

  I(U; A) = \sum_{u \in U} \sum_{a \in A} p(u, a) \log \frac{p(u, a)}{p(u)\, p(a)}

• Suppose there are G1 user clusters CU = {cu^(1), cu^(2), …, cu^(G1)} and G2 ad clusters CA = {ca^(1), ca^(2), …, ca^(G2)}. To judge the quality of a co-clustering, we define the loss in mutual information as

  \Delta I = I(U; A) - I(CU; CA)

• PROPOSITION 1. Further properties are declared and proven; in particular, the loss can be expressed as a KL-divergence, \Delta I = D_{KL}(p(U, A) \| q(U, A)), where q(u, a) = p(cu, ca)\, p(u | cu)\, p(a | ca) (see the sketch below).
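A minimal NumPy sketch of these quantities, assuming a toy joint distribution and hypothetical cluster assignments (none of the values come from the paper):

```python
import numpy as np

def mutual_information(p):
    """I(U; A) for a joint distribution given as a 2-D array summing to 1."""
    pu = p.sum(axis=1, keepdims=True)          # marginal p(U)
    pa = p.sum(axis=0, keepdims=True)          # marginal p(A)
    mask = p > 0                               # skip zero-probability cells
    return float((p[mask] * np.log(p[mask] / (pu @ pa)[mask])).sum())

def information_loss(p, user_labels, ad_labels, g1, g2):
    """Delta I = I(U; A) - I(CU; CA) under given cluster assignments."""
    p_clusters = np.zeros((g1, g2))
    for i, cu in enumerate(user_labels):
        for j, ca in enumerate(ad_labels):
            p_clusters[cu, ca] += p[i, j]      # aggregate into co-clusters
    return mutual_information(p) - mutual_information(p_clusters)

# Toy example: 6 users, 4 ads, 3 user clusters, 2 ad clusters.
p = np.random.rand(6, 4)
p /= p.sum()                                   # normalize to a joint p(U, A)
print(information_loss(p, [0, 0, 1, 1, 2, 2], [0, 0, 1, 1], g1=3, g2=2))
```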
Co-Clustering with Augmented data Matrix, CCAM (1/4)
• The optimization of the loss in mutual information, as first proposed by Dhillon et al. (2003), was designed for and applied to a single tabular data set.
  • However, in many cases there exist related tables besides the major data set that may provide useful information.
• In our co-clustering approach, Co-Clustering with Augmented data Matrix (CCAM), we simultaneously update the co-clusters over multiple augmented data matrices to reduce the information loss.
• The other two component sets, the feature set F = {f_1, f_2, …, f_{n_f}} and the profile set P = {p_1, p_2, …, p_{n_p}}, carry side information for ads and users and form the augmented matrices p(A, F) and p(U, P),
  • where n_f and n_p denote the number of features and profiles, respectively.
Co-Clustering with Augmented data Matrix, CCAM (2/4)
• PROPOSITION 2. Analogous properties hold when the augmented matrices p(A, F) and p(U, P) are considered,
  • which were also declared and proven.
• DEFINITION 1. The optimal co-clustering (CU, CA) we desire to obtain would minimize the combined loss

  [I(U; A) - I(CU; CA)] + \lambda [I(A; F) - I(CA; F)] + \varphi [I(U; P) - I(CU; P)]

  where λ and φ weight the information losses of the two augmented matrices (a sketch follows).
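A sketch of this combined objective, reusing information_loss() from the earlier snippet; the exact weighting is reconstructed from the λ and φ parameters tuned in the experiments, so treat it as an assumption:

```python
def ccam_objective(p_ua, p_af, p_up, cu, ca, cf, cp,
                   g1, g2, gf, gp, lam=0.2, phi=0.1):
    """Weighted sum of the three information losses minimized by CCAM."""
    return (information_loss(p_ua, cu, ca, g1, g2)          # user-ad loss
            + lam * information_loss(p_af, ca, cf, g2, gf)  # ad-feature loss
            + phi * information_loss(p_up, cu, cp, g1, gp)) # user-profile loss
```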
Collaborative filtering with CCAM (2/5)
• DEFINITION 3. Since CCAM is designed on the basis of KL-divergence, the distance metrics take a similar form.
  • Here we define the distance between each user and each user cluster, and between each ad and each ad cluster (see the sketch below).
• Note that the ad-cluster prototype and user-cluster prototype of CCAM are regarded as the corresponding cluster-conditional distributions.
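A hedged sketch of such a KL-style distance, assuming the user is represented by its conditional distribution over ads and each prototype by a cluster-conditional distribution (the precise prototype equations are on the original slide):

```python
import numpy as np

def kl_distance(p_user, q_prototype, eps=1e-12):
    """D_KL(p || q) between a user's p(A|u) and a user-cluster prototype."""
    p = np.asarray(p_user, dtype=float) + eps   # smooth to avoid log(0)
    q = np.asarray(q_prototype, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()             # renormalize after smoothing
    return float((p * np.log(p / q)).sum())

def nearest_cluster(p_user, prototypes):
    """Assign a user to the cluster whose prototype minimizes the distance."""
    return min(range(len(prototypes)),
               key=lambda k: kl_distance(p_user, prototypes[k]))
```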
Data set
• The data set used in the experiments was obtained from a financial social website, Ad$Mart, and covers the period from 2009/09/01 to 2010/03/31.
• For each test user, 15 observed clicking rates (Given15) are provided
to find nearest neighbors and the remaining clicking rates are used for
evaluation.
• To ensure that each test user clicked at least 15 ads, only users with more than 20 clicked ads and ads with more than 10 clicked user-ad pairs are retained.
  • User-Ad: The preprocessed clicking data covers 1786 users and 520 ads. We normalize it into a joint probability distribution over users and ads, and also rescale it into a clicking-rate matrix with values from 1 to 5 (see the sketch after this list).
• Ad-Feature: An advertisement feature data set compiling 37 statistics of 530 ads.
• User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
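A minimal sketch of the User-Ad preprocessing, assuming `clicks` is a hypothetical user-by-ad count matrix and a simple linear 1-5 rescaling (the slides do not give the exact scaling formula):

```python
import numpy as np

def preprocess(clicks):
    """Turn raw click counts into a joint p(U, A) and a 1-5 rate matrix."""
    joint = clicks / clicks.sum()               # joint distribution p(U, A)
    c = clicks.astype(float)
    rates = np.zeros_like(c)
    observed = c > 0                            # scale only observed clicks
    lo, hi = c[observed].min(), c[observed].max()
    rates[observed] = 1 + 4 * (c[observed] - lo) / max(hi - lo, 1e-12)
    return joint, rates                         # rates lie in [1, 5]
```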
G1 and G2 tuning based on K-Means
• We also have to determine which value of G1 results in a good MAE.
  • We simply fix G2 = 10 and K1 = K2 = 5 as a strategy to avoid tuning too many parameters.
• For this, we examine the response of k-Means with different G1 values (7, 15, 30, 60) and retain the best one to apply to the other algorithms, as in the sketch below.
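A sketch of this G1 sweep, assuming scikit-learn's KMeans and a hypothetical `evaluate_mae` stand-in for the Given15 evaluation protocol:

```python
import numpy as np
from sklearn.cluster import KMeans

def tune_g1(user_ad, evaluate_mae, candidates=(7, 15, 30, 60)):
    """Run k-Means for each candidate G1 and keep the lowest-MAE value."""
    best_g1, best_mae = None, np.inf
    for g1 in candidates:
        labels = KMeans(n_clusters=g1, n_init=10).fit_predict(user_ad)
        mae = evaluate_mae(user_ad, labels)     # Given15-style evaluation
        if mae < best_mae:
            best_g1, best_mae = g1, mae
    return best_g1, best_mae
```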
Parameter tuning with CCAM (1/2)
• To evaluate the co-clustering result, we apply a classification algorithm (Weka J48) to the user data and measure the F-measure under 10-fold cross-validation; the ad side is evaluated in the same way.
  • We use the clustering result of the user data (user-ad matrix and user-profile matrix) as the target labels for evaluating user clustering, and analogously for the ad data (ad-user matrix and ad-feature matrix).
  • To examine the effectiveness of co-clustering, we reduce the columns of the user-ad matrix to a smaller user-ad-cluster matrix. The reduced data is then appended to the user data for classification, and likewise for the ad data (see the sketch below).
[Figure: the user-ad-cluster matrix built from the clustering result of the user-ad and user-profile matrices is appended to the user data.]
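A sketch of that column reduction; whether merged columns are summed or averaged is an assumption, and any classifier can stand in for Weka's J48 here:

```python
import numpy as np

def reduce_columns(user_ad, ad_labels, g2):
    """Collapse the user-ad matrix into a user-by-ad-cluster matrix."""
    reduced = np.zeros((user_ad.shape[0], g2))
    for j, ca in enumerate(ad_labels):
        reduced[:, ca] += user_ad[:, j]         # merge columns per ad cluster
    return reduced

# Append the reduced matrix to the user data before classification, e.g.:
# augmented = np.hstack([user_profile, reduce_columns(user_ad, ad_labels, 10)])
```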
Parameter tuning with CCAM (2/2)
• We find that with G1 = 60, the best setting is λ = 0.2 and φ = 0.1 (a grid search of this kind is sketched below).
• We then apply these optimal CCAM parameters in the next section to compare against the other algorithms.
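A hypothetical grid search over the CCAM weights; `run_ccam` and `f_measure` are assumed stand-ins for training CCAM and for the Weka J48 10-fold F-measure evaluation:

```python
import itertools

def tune_weights(run_ccam, f_measure,
                 lams=(0.1, 0.2, 0.5), phis=(0.1, 0.2, 0.5)):
    """Keep the (lambda, phi) pair with the highest classification F-measure."""
    best = (None, None, -1.0)
    for lam, phi in itertools.product(lams, phis):
        clusters = run_ccam(lam=lam, phi=phi)   # co-clustering result
        score = f_measure(clusters)             # 10-fold c.v. F-measure
        if score > best[2]:
            best = (lam, phi, score)
    return best
```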
Results
• Table 3 compares the model-based approaches.
• Table 4 compares the hybrid-model approaches under the previous parameter settings.
Conclusion
• In this paper, we applied Chen's rating framework to evaluate the performance of hybrid CF with various model constructions.
• To give a fair comparison, we started by tuning each individual approach for its best performance.
• As a result, we compared four algorithms: CCAM, ITCC, k-Means and k-NN. In terms of MAE, CCAM outperformed the other three algorithms.
• In the future, for more thorough discussion, we will investigate our algorithm on other real-world data sets,
  • such as the MovieLens, EachMovie and Book-Crossing data sets, which contain movie and book rating data from users.