Collaborative filtering with CCAM
COLLABORATIVE FILTERING WITH CCAM
Presenter: Meng-Lun Wu
Authors: Meng-Lun Wu, Chia-Hui Chang and Rei-Zhe Liu
Date: 2011/12/21
Outline
• Introduction
• Related Work
• Preliminary
• Collaborative Filtering with CCAM
• Experiment
• Conclusion
Introduction (1/2)
• In any recommender system, the number of ratings already
obtained is usually very small compared to the number of
ratings that need to be predicted.
• A possible solution is dimensionality reduction, which can alleviate data sparsity.
• Clustering is the simplest such technique; it can be applied to recommender systems to obtain a compact model and avoid the sparsity problem.
Introduction (2/2)
• In recent years, co-clustering based on information theory has attracted increasing attention.
• We have extended an information-theoretic co-clustering algorithm to augmented data matrices; the resulting method is called Co-Clustering with Augmented data Matrix (CCAM).
• In this paper, we consider how to alleviate the sparsity problem and achieve precise predictions with Collaborative Filtering with CCAM.
Related Work
• Information theoretical co-clustering
• Dhillon et al. (2003) derived a co-clustering algorithm from information theory, optimizing an objective based on the loss in mutual information between the clustered random variables.
• Matrix factorization co-clustering
• Chen et al. (2008) linearly combined user-based CF, item-based CF, and matrix factorization results that rely on ONMTF in order to predict ratings.
• Li et al. (2009) presented a novel cross-domain collaborative filtering method that co-clusters movie information via ONMTF and reconstructs the knowledge for recommending books and movies.
Preliminary (1/2)
• Suppose we are given a clicking information matrix R composed of a user set U = {u_1, u_2, …, u_{n_u}} and an ad set A = {a_1, a_2, …, a_{n_a}}.
  • n_u and n_a denote the number of users and ads, respectively.
• For memory-based CF methods, sparsity issues in the required data are unavoidable before similar neighbors can be found.
  • Dhillon et al. (2003) considered a co-clustering algorithm that monotonically decreases the information loss of tabular data to form a compact model.
Preliminary (2/2)
• Assume U and A are random variables with a joint probability distribution p(U, A) and marginal distributions p(U) and p(A). The mutual information I(U; A) is defined as

  I(U; A) = \sum_{u \in U} \sum_{a \in A} p(u, a) \log \frac{p(u, a)}{p(u)\, p(a)}

• Suppose there are G1 user clusters CU = {cu^(1), cu^(2), …, cu^(G1)} and G2 ad clusters CA = {ca^(1), ca^(2), …, ca^(G2)}. To judge the quality of a co-clustering, we define the loss in mutual information as

  \Delta I = I(U; A) - I(CU; CA)

• PROPOSITION 1. Further properties are declared and proven; in particular, the loss can be expressed as a KL-divergence, \Delta I = D_{KL}(p(U, A) \| q(U, A)), where q(u, a) = p(cu, ca)\, p(u | cu)\, p(a | ca) (see the sketch below).
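A minimal NumPy sketch of these quantities, assuming a toy joint distribution and hypothetical cluster assignments (none of the values come from the paper):

```python
import numpy as np

def mutual_information(p):
    """I(U; A) for a joint distribution given as a 2-D array summing to 1."""
    pu = p.sum(axis=1, keepdims=True)          # marginal p(U)
    pa = p.sum(axis=0, keepdims=True)          # marginal p(A)
    mask = p > 0                               # skip zero-probability cells
    return float((p[mask] * np.log(p[mask] / (pu @ pa)[mask])).sum())

def information_loss(p, user_labels, ad_labels, g1, g2):
    """Delta I = I(U; A) - I(CU; CA) under given cluster assignments."""
    p_clusters = np.zeros((g1, g2))
    for i, cu in enumerate(user_labels):
        for j, ca in enumerate(ad_labels):
            p_clusters[cu, ca] += p[i, j]      # aggregate into co-clusters
    return mutual_information(p) - mutual_information(p_clusters)

# Toy example: 6 users, 4 ads, 3 user clusters, 2 ad clusters.
p = np.random.rand(6, 4)
p /= p.sum()                                   # normalize to a joint p(U, A)
print(information_loss(p, [0, 0, 1, 1, 2, 2], [0, 0, 1, 1], g1=3, g2=2))
```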
Co-Clustering with Augmented data Matrix, CCAM (1/4)
• The optimization of the loss in mutual information, as first proposed by Dhillon et al. (2003), was designed for and applied to a single tabular data set.
  • However, in many cases there exist related tables besides the major data set that may provide useful information.
• In our co-clustering approach, Co-Clustering with Augmented data Matrix (CCAM), we simultaneously update the co-clusters over multiple augmented data matrices to reduce the information loss.
• The other two component sets, the feature set F = {f_1, f_2, …, f_{n_f}} and the profile set P = {p_1, p_2, …, p_{n_p}}, carry side information for ads and users and form the augmented matrices p(A, F) and p(U, P),
  • where n_f and n_p denote the number of features and profiles, respectively.
Co-Clustering with Augmented data Matrix, CCAM (2/4)
• PROPOSITION 2. Analogous properties hold when the augmented matrices p(A, F) and p(U, P) are considered,
  • which were also declared and proven.
• DEFINITION 1. The optimal co-clustering (CU, CA) we desire to obtain would minimize the combined loss

  [I(U; A) - I(CU; CA)] + \lambda [I(A; F) - I(CA; F)] + \varphi [I(U; P) - I(CU; P)]

  where λ and φ weight the information losses of the two augmented matrices (a sketch follows).
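A sketch of this combined objective, reusing information_loss() from the earlier snippet; the exact weighting is reconstructed from the λ and φ parameters tuned in the experiments, so treat it as an assumption:

```python
def ccam_objective(p_ua, p_af, p_up, cu, ca, cf, cp,
                   g1, g2, gf, gp, lam=0.2, phi=0.1):
    """Weighted sum of the three information losses minimized by CCAM."""
    return (information_loss(p_ua, cu, ca, g1, g2)          # user-ad loss
            + lam * information_loss(p_af, ca, cf, g2, gf)  # ad-feature loss
            + phi * information_loss(p_up, cu, cp, g1, gp)) # user-profile loss
```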
Collaborative filtering with CCAM (2/5)
• DEFINITION 3. Since CCAM is designed on the basis of KL-divergence, the distance metrics take a similar form.
  • Here we define the distance between each user and each user cluster, and between each ad and each ad cluster (see the sketch below).
• Note that the ad-cluster prototype and user-cluster prototype of CCAM are regarded as the corresponding cluster-conditional distributions.
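A hedged sketch of such a KL-style distance, assuming the user is represented by its conditional distribution over ads and each prototype by a cluster-conditional distribution (the precise prototype equations are on the original slide):

```python
import numpy as np

def kl_distance(p_user, q_prototype, eps=1e-12):
    """D_KL(p || q) between a user's p(A|u) and a user-cluster prototype."""
    p = np.asarray(p_user, dtype=float) + eps   # smooth to avoid log(0)
    q = np.asarray(q_prototype, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()             # renormalize after smoothing
    return float((p * np.log(p / q)).sum())

def nearest_cluster(p_user, prototypes):
    """Assign a user to the cluster whose prototype minimizes the distance."""
    return min(range(len(prototypes)),
               key=lambda k: kl_distance(p_user, prototypes[k]))
```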
Data set
• The data set used in the experiments was obtained from a financial social website, Ad$Mart, and covers the period from 2009/09/01 to 2010/03/31.
• For each test user, 15 observed clicking rates (Given15) are provided
to find nearest neighbors and the remaining clicking rates are used for
evaluation.
• To ensure that each test user clicked at least 15 ads, only users with more than 20 clicked ads and ads with more than 10 clicked user-ad pairs are retained.
  • User-Ad: The preprocessed clicking data covers 1786 users and 520 ads. We normalize it into a joint probability distribution over users and ads, and also rescale it into a clicking-rate matrix with values from 1 to 5 (see the sketch after this list).
• Ad-Feature: An advertisement feature data set compiling 37 statistics of 530 ads.
• User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
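A minimal sketch of the User-Ad preprocessing, assuming `clicks` is a hypothetical user-by-ad count matrix and a simple linear 1-5 rescaling (the slides do not give the exact scaling formula):

```python
import numpy as np

def preprocess(clicks):
    """Turn raw click counts into a joint p(U, A) and a 1-5 rate matrix."""
    joint = clicks / clicks.sum()               # joint distribution p(U, A)
    c = clicks.astype(float)
    rates = np.zeros_like(c)
    observed = c > 0                            # scale only observed clicks
    lo, hi = c[observed].min(), c[observed].max()
    rates[observed] = 1 + 4 * (c[observed] - lo) / max(hi - lo, 1e-12)
    return joint, rates                         # rates lie in [1, 5]
```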
G1 and G2 tuning based on K-Means
• We also have to determine which value of G1 results in a good MAE.
  • We simply fix G2 = 10 and K1 = K2 = 5 as a strategy to avoid tuning too many parameters.
• For this, we examine the response of k-Means with different G1 values (7, 15, 30, 60) and retain the best one to apply to the other algorithms, as in the sketch below.
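A sketch of this G1 sweep, assuming scikit-learn's KMeans and a hypothetical `evaluate_mae` stand-in for the Given15 evaluation protocol:

```python
import numpy as np
from sklearn.cluster import KMeans

def tune_g1(user_ad, evaluate_mae, candidates=(7, 15, 30, 60)):
    """Run k-Means for each candidate G1 and keep the lowest-MAE value."""
    best_g1, best_mae = None, np.inf
    for g1 in candidates:
        labels = KMeans(n_clusters=g1, n_init=10).fit_predict(user_ad)
        mae = evaluate_mae(user_ad, labels)     # Given15-style evaluation
        if mae < best_mae:
            best_g1, best_mae = g1, mae
    return best_g1, best_mae
```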
Parameter tuning with CCAM (1/2)
• To evaluate the co-clustering result, we apply a classification algorithm (Weka J48) to the user data and measure the F-measure under 10-fold cross-validation; the ad side is evaluated in the same way.
  • We use the clustering result of the user data (user-ad matrix and user-profile matrix) as the target labels for evaluating user clustering, and analogously for the ad data (ad-user matrix and ad-feature matrix).
  • To examine the effectiveness of co-clustering, we reduce the columns of the user-ad matrix to a smaller user-ad-cluster matrix. The reduced data is then appended to the user data for classification, and likewise for the ad data (see the sketch below).
[Figure: the user-ad-cluster matrix built from the clustering result of the user-ad and user-profile matrices is appended to the user data.]
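A sketch of that column reduction; whether merged columns are summed or averaged is an assumption, and any classifier can stand in for Weka's J48 here:

```python
import numpy as np

def reduce_columns(user_ad, ad_labels, g2):
    """Collapse the user-ad matrix into a user-by-ad-cluster matrix."""
    reduced = np.zeros((user_ad.shape[0], g2))
    for j, ca in enumerate(ad_labels):
        reduced[:, ca] += user_ad[:, j]         # merge columns per ad cluster
    return reduced

# Append the reduced matrix to the user data before classification, e.g.:
# augmented = np.hstack([user_profile, reduce_columns(user_ad, ad_labels, 10)])
```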
Parameter tuning with CCAM (2/2)
• We find that with G1 = 60, the best setting is λ = 0.2 and φ = 0.1 (a grid search of this kind is sketched below).
• We then apply these optimal CCAM parameters in the next section to compare against the other algorithms.
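A hypothetical grid search over the CCAM weights; `run_ccam` and `f_measure` are assumed stand-ins for training CCAM and for the Weka J48 10-fold F-measure evaluation:

```python
import itertools

def tune_weights(run_ccam, f_measure,
                 lams=(0.1, 0.2, 0.5), phis=(0.1, 0.2, 0.5)):
    """Keep the (lambda, phi) pair with the highest classification F-measure."""
    best = (None, None, -1.0)
    for lam, phi in itertools.product(lams, phis):
        clusters = run_ccam(lam=lam, phi=phi)   # co-clustering result
        score = f_measure(clusters)             # 10-fold c.v. F-measure
        if score > best[2]:
            best = (lam, phi, score)
    return best
```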
Results
• Table 3 compares the model-based approaches.
• Table 4 compares the hybrid-model approaches under the previous parameter settings.
Conclusion
• In this paper, we applied Chen's rating framework to evaluate the performance of hybrid CF with various model constructions.
• To give a fair comparison, we started by tuning each individual approach for its best performance.
• As a result, we compared four algorithms: CCAM, ITCC, k-Means and k-NN. In terms of MAE, CCAM outperformed the other three algorithms.
• In the future, for more thorough discussion, we will investigate our algorithm on other real-world data sets,
  • such as the MovieLens, EachMovie and Book-Crossing data sets, which contain movie and book rating data from users.