1) The document presents a new co-clustering framework called Block Value Decomposition (BVD) for dyadic data. BVD factorizes a data matrix into three components: a row coefficient matrix, a block value matrix, and a column coefficient matrix.
2) An algorithm for non-negative BVD (NBVD) is derived based on minimizing the reconstruction error between the original and reconstructed matrices. The algorithm iteratively updates the three matrices using equations derived from Kuhn-Tucker conditions.
3) Empirical evaluations on text clustering datasets show NBVD achieves high clustering accuracy that is competitive with or better than other co-clustering algorithms.
1. Co-clustering by Block Value Decomposition. Author / Bo Long, Zhongfei Zhang and Philip S. Yu. Source / ACM KDD’05, August 21-24, 2005, pp. 635-640. Presenter / Allen Wu.
2. Outline: Introduction; Block value decomposition; Derivation of the algorithm; Empirical evaluation; Conclusion.
3. Introduction Dyadic data refer to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from each set. Co-clustering can effectively deal with high-dimensional, sparse data by clustering rows and columns simultaneously. This paper proposes a new co-clustering framework, Block Value Decomposition (BVD).
4. Introduction (cont.) This paper develops a specific novel co-clustering algorithm for a special yet very popular case: non-negative dyadic data. The algorithm performs an implicitly adaptive dimensionality reduction, which works well for typical sparse data. The dyadic data matrix is factorized into three components: the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C.
5. The definition of dyadic data The notion dyadic refers to a domain with two sets of objects, $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_m\}$. The data can be organized as an $n \times m$ two-dimensional matrix $Z$, in which each $w(x, y)$ corresponds to one element of $Z$.
7. [Figure: block value decomposition of a data matrix Z with rows x1 to x4 and columns y1 to y4. The product R × B × C gives the reconstruction RBC, shown alongside Z.]
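8. Block value decomposition definition The non-negative block value decomposition of a non-negative data matrix $Z \in \mathbb{R}^{n \times m}$ (i.e., $\forall ij: Z_{ij} \ge 0$) is given by the minimization of
$$f(R, B, C) = \|Z - RBC\|^2$$
subject to the constraints $\forall ij: R_{ij} \ge 0$, $B_{ij} \ge 0$ and $C_{ij} \ge 0$, where $R \in \mathbb{R}^{n \times k}$, $B \in \mathbb{R}^{k \times l}$, $C \in \mathbb{R}^{l \times m}$, $k \ll n$, and $l \ll m$. If $R = C^T$, the symmetric non-negative block value decomposition of a symmetric non-negative data matrix $Z \in \mathbb{R}^{n \times n}$ (i.e., $\forall ij: Z_{ij} \ge 0$) is given by the minimization of
$$f(S, B) = \|Z - SBS^T\|^2$$
subject to the constraints $\forall ij: S_{ij} \ge 0$ and $B_{ij} \ge 0$, where $S \in \mathbb{R}^{n \times k}$, $B \in \mathbb{R}^{k \times k}$ and $k \ll n$.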
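As a rough illustration of the objective above, the following NumPy sketch evaluates $f(R, B, C)$ as a squared Frobenius norm. The function name `nbvd_objective` and the toy sizes are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import numpy as np

def nbvd_objective(Z, R, B, C):
    """Squared Frobenius reconstruction error ||Z - R B C||^2."""
    residual = Z - R @ B @ C
    return float(np.sum(residual ** 2))

# Hypothetical toy sizes: n=6 rows, m=5 columns, k=2 row blocks, l=3 column blocks.
rng = np.random.default_rng(0)
n, m, k, l = 6, 5, 2, 3
R = rng.random((n, k))   # row-coefficient matrix, n x k, non-negative
B = rng.random((k, l))   # block value matrix, k x l, non-negative
C = rng.random((l, m))   # column-coefficient matrix, l x m, non-negative

Z = R @ B @ C            # a matrix that the three factors reproduce
print(nbvd_objective(Z, R, B, C))                 # error of the matching factors
print(nbvd_objective(Z, R, B, np.zeros((l, m))))  # error when C is all zeros
```

With an all-zero C the reconstruction vanishes, so the second value is simply the sum of squared entries of Z.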
9. Derivation of the algorithm The objective function is convex in each of R, B and C separately. However, it is not convex in all of them simultaneously. Thus, it is unrealistic to expect an algorithm to find the global minimum. Theorem 1. If R, B and C are a local minimizer of the objective function, then the equations
$$(ZC^{T}B^{T}) \odot R - (RBCC^{T}B^{T}) \odot R = 0$$
$$(R^{T}ZC^{T}) \odot B - (R^{T}RBCC^{T}) \odot B = 0$$
$$(B^{T}R^{T}Z) \odot C - (B^{T}R^{T}RBC) \odot C = 0$$
are satisfied, where $\odot$ denotes the Hadamard product of two matrices.
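10. Derivation of the algorithm (cont.) Let $\lambda_1$, $\lambda_2$ and $\lambda_3$ be the Lagrange multipliers for the constraints $R \ge 0$, $B \ge 0$ and $C \ge 0$, respectively, where $\lambda_1 \in \mathbb{R}^{n \times k}$, $\lambda_2 \in \mathbb{R}^{k \times l}$ and $\lambda_3 \in \mathbb{R}^{l \times m}$, so that each multiplier has the same shape as the matrix it constrains. The Lagrange function $L(R, B, C, \lambda_1, \lambda_2, \lambda_3)$ becomes:
$$L = f(R, B, C) - \mathrm{tr}(\lambda_1 R^{T}) - \mathrm{tr}(\lambda_2 B^{T}) - \mathrm{tr}(\lambda_3 C^{T})$$
The Kuhn-Tucker conditions are:
$$\frac{\partial L}{\partial R} = \frac{\partial L}{\partial B} = \frac{\partial L}{\partial C} = 0$$
$$\lambda_1 \odot R = \lambda_2 \odot B = \lambda_3 \odot C = 0$$
Taking the derivatives, we obtain the following three equations, respectively:
$$2ZC^{T}B^{T} - 2RBCC^{T}B^{T} + \lambda_1 = 0$$
$$2R^{T}ZC^{T} - 2R^{T}RBCC^{T} + \lambda_2 = 0$$
$$2B^{T}R^{T}Z - 2B^{T}R^{T}RBC + \lambda_3 = 0$$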
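The step from these conditions to Theorem 1 can be sketched as follows for R; B and C are analogous. Multiply the first derivative equation elementwise by R and apply the complementary condition $\lambda_1 \odot R = 0$. This is a reconstruction of the intermediate step, not text from the slides.

```latex
\begin{aligned}
\bigl(2ZC^{T}B^{T} - 2RBCC^{T}B^{T} + \lambda_1\bigr) \odot R &= 0 \\
2\,(ZC^{T}B^{T}) \odot R - 2\,(RBCC^{T}B^{T}) \odot R
  + \underbrace{\lambda_1 \odot R}_{=\,0} &= 0 \\
(ZC^{T}B^{T}) \odot R - (RBCC^{T}B^{T}) \odot R &= 0
\end{aligned}
```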
11. Derivation of the algorithm (cont.) Based on Theorem 1, we propose iterative updating rules for R, B and C. If R = C^T, we derive the corresponding updating rules for the symmetric case; symmetric NBVD provides only one clustering result.
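The slide text does not reproduce the update equations themselves. One common way to turn stationarity conditions like those in Theorem 1 into an algorithm is a multiplicative update, in which each entry is scaled by the ratio of the positive term to the negative term. The NumPy sketch below makes that assumption. The function names, the iteration count, the random initialization and the small constant `eps` (which guards against division by zero) are all hypothetical choices for this example.

```python
import numpy as np

def reconstruction_error(Z, R, B, C):
    """Squared Frobenius norm ||Z - R B C||^2."""
    return float(np.linalg.norm(Z - R @ B @ C) ** 2)

def nbvd(Z, k, l, n_iter=200, eps=1e-9, seed=0):
    """Assumed multiplicative updates for non-negative Z ~ R B C.

    A fixed point of each update satisfies the matching equation of
    Theorem 1, e.g. (Z C^T B^T) o R = (R B C C^T B^T) o R for R.
    """
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    # Strictly positive random start (an assumption, not from the source).
    R = rng.random((n, k)) + eps
    B = rng.random((k, l)) + eps
    C = rng.random((l, m)) + eps
    for _ in range(n_iter):
        BC = B @ C
        # R <- R o (Z C^T B^T) / (R B C C^T B^T)
        R *= (Z @ BC.T) / (R @ BC @ BC.T + eps)
        # B <- B o (R^T Z C^T) / (R^T R B C C^T)
        B *= (R.T @ Z @ C.T) / (R.T @ R @ B @ C @ C.T + eps)
        RB = R @ B
        # C <- C o (B^T R^T Z) / (B^T R^T R B C)
        C *= (RB.T @ Z) / (RB.T @ RB @ C + eps)
    return R, B, C

# Toy block-structured matrix: 2 row groups x 2 column groups.
Z = np.kron(np.array([[5.0, 1.0], [1.0, 5.0]]), np.ones((3, 4)))
R, B, C = nbvd(Z, k=2, l=2)
print(reconstruction_error(Z, R, B, C))
```

Because every update multiplies by a ratio of non-negative quantities, the factors stay non-negative throughout, which is why no explicit projection step appears in the loop.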
12. Empirical evaluation The experimental datasets are drawn from the 20-Newsgroup data and the CLASSIC3 dataset. We measure clustering performance using the accuracy given by the confusion matrix of the obtained clusters and the "real" classes.
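The slides do not state how clusters are matched to classes when reading accuracy off the confusion matrix. The sketch below assumes a one-to-one matching that maximizes the number of correctly placed items, found by brute force over permutations. That is workable only for a handful of classes, such as the three in CLASSIC3. The function names and the toy labels are hypothetical.

```python
from itertools import permutations

import numpy as np

def confusion_matrix(true_labels, cluster_labels, n_classes):
    """M[c, t] counts items placed in cluster c whose real class is t."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, c in zip(true_labels, cluster_labels):
        M[c, t] += 1
    return M

def clustering_accuracy(true_labels, cluster_labels, n_classes):
    """Accuracy under the best one-to-one cluster-to-class matching (assumed)."""
    M = confusion_matrix(true_labels, cluster_labels, n_classes)
    best = max(
        sum(M[c, perm[c]] for c in range(n_classes))
        for perm in permutations(range(n_classes))
    )
    return float(best) / len(true_labels)

# Toy labels: cluster ids are arbitrary, so cluster 1 corresponds to class 0 here.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]
print(confusion_matrix(true_labels, cluster_labels, 3))
print(clustering_accuracy(true_labels, cluster_labels, 3))  # 8 of 9 items matched
```

In the toy data one item of class 1 lands in the cluster that matches class 2, so 8 of the 9 items are counted as correct.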
15. Conclusion In this paper, we have proposed a new co-clustering framework for dyadic data called Block Value Decomposition. Under this framework, we focus on a special but also very popular case, Non-negative Block Value Decomposition. We have shown the correctness of the NBVD algorithm theoretically. The empirical evaluations demonstrate the effectiveness and the great potential of the BVD framework.