
Recommendation System — Theory and Practice


Survey on recommendation systems presented at the IMI Colloquium, Kyushu University, Feb 18, 2015. The slides are in English, but the talk itself was given in Japanese.

Published in: Technology

Recommendation System — Theory and Practice

  1. Recommendation System — Theory and Practice
     IMI Colloquium @ Kyushu Univ.
     February 18, 2015
     Kimikazu Kato, Silver Egg Technology
  2. About myself
     Kimikazu Kato
     • Ph.D. in computer science, master's degree in mathematics
     • Experience in numerical computation, especially geometric computation and computer graphics, partial differential equations, parallel computation, and GPGPU
     • Now specializes in machine learning, especially recommendation systems
  3. About our Company
     Silver Egg Technology
     • Established: 1998
     • CEO: Tom Foley
     • Main services: recommendation systems, online advertisement
     • Major clients: QVC, Senshukai (Belle Maison), Tsutaya
     We provide recommendation systems to Japan's leading websites.
  4. Today's Story
     • Introduction to recommendation systems
     • Rating prediction
     • Shopping behavior prediction
     • Practical viewpoint
     • Conclusion
  5. Recommendation System
     "Recommender systems or recommendation systems (sometimes replacing 'system' with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item." — Wikipedia
     In this talk, we focus on collaborative filtering methods, which utilize only users' behavior, activity, and preferences. Other methods include:
     • Content-based methods
     • Methods using demographic data
     • Hybrid methods
  6. Our Service and Mechanism
     An ASP service named "Aigent Recommender," which works as an add-on to an existing website.
  7. Netflix Prize
     "The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest." — Wikipedia
     In short: an open competition for preference prediction. It closed in 2009.
  8. Description of the Problem

     user\movie   W   X   Y   Z
     A            5   4   1   4
     B                4
     C            2       3
     D            1       4   ?

     Given rating information for some user/movie pairs, is it possible to predict the rating for an unknown user/movie pair?
  9. Notations
     • Number of users: $n$
     • Set of users: $U = \{1, 2, \ldots, n\}$
     • Number of items (movies): $m$
     • Set of items (movies): $I = \{1, 2, \ldots, m\}$
     • Input matrix: $A$ (an $n \times m$ matrix)
  10. Matrix Factorization
      • Based on the assumption that each item is described by a small number of latent factors
      • Each rating is expressed as a linear combination of the latent factors
      • Achieved good performance in the Netflix Prize
      Find matrices $X \in \mathrm{Mat}(f, n)$ and $Y \in \mathrm{Mat}(f, m)$, where $f \ll n, m$, such that $A \approx X^T Y$.
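The factorization itself is just two thin matrices whose product approximates $A$. A minimal numpy sketch of the shapes involved (the toy matrix and the rank are invented for illustration, and the factors here are random rather than fitted):

```python
import numpy as np

# Toy rating matrix (0 = unknown): n = 4 users (A-D), m = 4 movies (W-Z).
A = np.array([[5, 4, 1, 4],
              [0, 4, 0, 0],
              [2, 0, 3, 0],
              [1, 0, 4, 0]], dtype=float)

f = 2                          # number of latent factors, f << n, m
n, m = A.shape
rng = np.random.default_rng(0)
X = rng.normal(size=(f, n))    # user factors: X in Mat(f, n)
Y = rng.normal(size=(f, m))    # item factors: Y in Mat(f, m)

# The model predicts A by X^T Y; entry (u, i) is the dot product of
# user u's factor vector with item i's factor vector.
A_hat = X.T @ Y
print(A_hat.shape)             # (4, 4): a score for every user/item pair
```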
  11. Find $X$ and $Y$ maximizing $p(X, Y \mid A, \sigma)$, where
      $p(A \mid X, Y, \sigma) = \prod_{A_{ui} \neq 0} \mathcal{N}(A_{ui} \mid X_u^T Y_i, \sigma)$
      $p(X \mid \sigma_X) = \prod_u \mathcal{N}(X_u \mid 0, \sigma_X I)$
      $p(Y \mid \sigma_Y) = \prod_i \mathcal{N}(Y_i \mid 0, \sigma_Y I)$
  12. According to Bayes' theorem,
      $p(X, Y \mid A, \sigma) = p(A \mid X, Y, \sigma)\, p(X \mid \sigma_X)\, p(Y \mid \sigma_Y) \times \text{const.}$
      Thus,
      $\log p(X, Y \mid A, \sigma, \sigma_X, \sigma_Y) = -\sum_{A_{ui} \neq 0} (A_{ui} - X_u^T Y_i)^2 - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 + \text{const.},$
      where $\|\cdot\|_{\mathrm{Fro}}$ denotes the Frobenius norm.
      How can this be computed? Use MCMC; see [Salakhutdinov et al., 2008]. Once $X$ and $Y$ are determined, set $\tilde{A} := X^T Y$; the prediction for $A_{ui}$ is estimated by $\tilde{A}_{ui}$.
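The slide relies on MCMC for the full Bayesian treatment; as a lighter-weight illustration of the same MAP objective (squared error on observed entries plus Frobenius regularization), here is a toy stochastic-gradient sketch. The data, rank, and hyperparameters are all invented:

```python
import numpy as np

def fit_map(A, f=2, lam=0.1, alpha=0.01, epochs=500, seed=0):
    """Minimize sum_{A_ui != 0} (A_ui - X_u^T Y_i)^2
    + lam * (||X||_Fro^2 + ||Y||_Fro^2) by SGD over observed entries."""
    n, m = A.shape
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.normal(size=(f, n))
    Y = 0.1 * rng.normal(size=(f, m))
    observed = list(zip(*np.nonzero(A)))   # pairs (u, i) with A_ui != 0
    for _ in range(epochs):
        for u, i in observed:
            err = A[u, i] - X[:, u] @ Y[:, i]
            xu = X[:, u].copy()            # keep old value for Y's update
            X[:, u] += alpha * (err * Y[:, i] - lam * xu)
            Y[:, i] += alpha * (err * xu - lam * Y[:, i])
    return X, Y

A = np.array([[5, 4, 1, 4],
              [0, 4, 0, 0],
              [2, 0, 3, 0],
              [1, 0, 4, 0]], dtype=float)
X, Y = fit_map(A)
A_tilde = X.T @ Y      # predictions for all pairs, including unknowns
print(A_tilde[3, 3])   # estimate for the unknown "?" cell
```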
  13. Difference Between Rating and Shopping

      Rating
      user\movie   W   X   Y   Z
      A            5   4   1   4
      B                4
      C            2       3
      D            1       4   ?
      • Includes negative feedback ("1" means "boring")
      • Zero means "unknown"

      Shopping (Browsing)
      user\item    W   X   Y   Z
      A            1   1   1   1
      B                1
      C            1
      D            1       1   ?
      • Includes no negative feedback
      • Zero means "unknown" or "negative"
      • More degrees of freedom

      Consequently, an algorithm that is effective for the rating matrix is not necessarily effective for the shopping matrix.
  14. Evaluation Metrics for Recommendation Systems
      Rating prediction:
      • Root Mean Squared Error (RMSE): the square root of the mean of the squared errors
      Shopping prediction:
      • Precision: (# of recommended and purchased) / (# of recommended)
      • Recall: (# of recommended and purchased) / (# of purchased)
      The criteria are different. This is another reason different algorithms should be applied.
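A minimal sketch of both metric families (the data is invented; in practice precision and recall are usually computed per user at a fixed recommendation-list length and then averaged):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the *mean* squared error."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def precision_recall(recommended, purchased):
    """Precision and recall for shopping prediction, with set semantics."""
    recommended, purchased = set(recommended), set(purchased)
    hits = recommended & purchased
    return len(hits) / len(recommended), len(hits) / len(purchased)

print(rmse([4.1, 2.8], [4.0, 3.0]))                    # rating metric
print(precision_recall(["W", "X"], ["X", "Y", "Z"]))   # (0.5, 0.333...)
```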
  15. Solutions
      • Adding a constraint to the optimization problem
      • Changing the objective function itself
  16. Adding a Constraint
      • The problem is too many degrees of freedom.
      • A desirable characteristic is that many elements of the predicted matrix should be zero.
      • Assume that a certain ratio of the zero elements of the input matrix remains zero after the optimization [Sindhwani et al., 2010].
      • This experimentally outperforms the "zero-as-negative" method.
  17. [Sindhwani et al., 2010] introduced variables $p_{ui}$ to relax the problem. Minimize
      $\sum_{A_{ui} \neq 0} (A_{ui} - X_u^T Y_i)^2 + \lambda_X \|X\|_{\mathrm{Fro}}^2 + \lambda_Y \|Y\|_{\mathrm{Fro}}^2$
      $\quad + \sum_{A_{ui} = 0} \left[ p_{ui} (0 - X_u^T Y_i)^2 + (1 - p_{ui}) (1 - X_u^T Y_i)^2 \right]$
      $\quad + T \sum_{A_{ui} = 0} \left[ -p_{ui} \log p_{ui} - (1 - p_{ui}) \log(1 - p_{ui}) \right]$
      subject to
      $\frac{1}{|\{A_{ui} \mid A_{ui} = 0\}|} \sum_{A_{ui} = 0} p_{ui} = r.$
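To make the structure of this objective easier to parse, here is a sketch that merely evaluates its three parts (observed-entry fit, weighted fit on the zeros, and the entropy smoothing term) for given factors and weights. The variable names follow the slide; everything else is invented, and the actual optimization in [Sindhwani et al., 2010] is more involved:

```python
import numpy as np

def one_class_objective(A, X, Y, p, lam_x, lam_y, T):
    """Evaluate the relaxed objective of [Sindhwani et al., 2010] as written
    on the slide, for given factors X, Y and weights p on the zero entries."""
    P = X.T @ Y                         # predicted scores X_u^T Y_i
    obs, zero = A != 0, A == 0
    fit = (np.sum((A[obs] - P[obs]) ** 2)
           + lam_x * np.sum(X ** 2) + lam_y * np.sum(Y ** 2))
    # p_ui interpolates between treating a zero as negative (target 0)
    # and as positive (target 1).
    relax = np.sum(p[zero] * (0 - P[zero]) ** 2
                   + (1 - p[zero]) * (1 - P[zero]) ** 2)
    # Entropy term: temperature T discourages p from collapsing to 0 or 1.
    entropy = T * np.sum(-p[zero] * np.log(p[zero])
                         - (1 - p[zero]) * np.log(1 - p[zero]))
    return fit + relax + entropy

A = np.array([[1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
p = np.full_like(A, 0.3)   # the constraint fixes the mean of p at r = 0.3
print(one_class_objective(A, X, Y, p, 0.1, 0.1, 1.0))
```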
  18. Ranking Prediction
      Another strategy for shopping prediction: the "learn from the order" approach. Predict whether X is more likely to be bought than Y, rather than the probability of buying X or Y.
  19. Bayesian Personalized Ranking [Rendle et al., 2009]
      • Consider a matrix factorization model, but update the elements according to observations of "orders."
      • The parameters are the same as in usual matrix factorization, but the objective function is different.
      Consider a total order $>_u$ for each $u \in U$. Suppose $i >_u j$ ($i, j \in I$) means "user $u$ is more likely to buy $i$ than $j$." The objective is to calculate $p(i >_u j)$ for pairs with $A_{ui} = 1$ and $A_{uj} = 0$ (i.e., $i$ is bought and $j$ is not bought by $u$).
  20. Let
      $D_A = \{(u, i, j) \in U \times I \times I \mid A_{ui} = 1, A_{uj} = 0\}$
      and define
      $p(>_u \mid X, Y) := \prod_{(u,i,j) \in D_A} p(i >_u j \mid X, Y),$
      where we assume
      $p(i >_u j \mid X, Y) = \sigma(X_u^T Y_i - X_u^T Y_j), \quad \sigma(x) = \frac{1}{1 + e^{-x}}.$
      According to Bayes' theorem, the function to be optimized becomes
      $\prod_u p(X, Y \mid >_u) = \prod_u p(>_u \mid X, Y) \times p(X)\, p(Y) \times \text{const.}$
  21. Taking the log of this,
      $L := \log \left[ \prod_u p(>_u \mid X, Y) \times p(X)\, p(Y) \right]$
      $\quad = \sum_{(u,i,j) \in D_A} \log p(i >_u j \mid X, Y) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2$
      $\quad = \sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2.$
      Now consider the following problem:
      $\max_{X,Y} \left[ \sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 \right]$
      This means: find a pair of matrices $X, Y$ which preserves the order of the elements of the input matrix for each $u$.
  22. Computation
      The function we want to optimize is
      $\sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2.$
      $U \times I \times I$ is huge, so in practice a stochastic method is necessary. Let the parameters be $\Theta = (X, Y)$. The algorithm is the following. Repeat:
      • Choose $(u, i, j) \in D_A$ randomly.
      • Update $\Theta$ with
        $\Theta \leftarrow \Theta + \alpha \frac{\partial}{\partial \Theta} \left( \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 \right).$
      This method is called Stochastic Gradient Descent (SGD).
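A compact sketch of this SGD loop (strictly speaking, gradient ascent, since the objective is maximized). The toy matrix, rank, step size, and regularization weights are invented:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_sgd(A, f=2, lam=0.01, alpha=0.05, steps=20000, seed=0):
    """Maximize sum log sigma(X_u^T Y_i - X_u^T Y_j) - lam*(||X||^2 + ||Y||^2)
    by stochastic gradient ascent over triples (u, i, j) in D_A."""
    n, m = A.shape
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.normal(size=(f, n))
    Y = 0.1 * rng.normal(size=(f, m))
    # D_A: user u bought i (A_ui = 1) but not j (A_uj = 0).
    D = [(u, i, j) for u in range(n) for i in range(m) for j in range(m)
         if A[u, i] == 1 and A[u, j] == 0]
    for _ in range(steps):
        u, i, j = D[rng.integers(len(D))]
        s = X[:, u] @ (Y[:, i] - Y[:, j])
        g = sigmoid(-s)              # derivative of log sigma(s) w.r.t. s
        xu = X[:, u].copy()
        X[:, u] += alpha * (g * (Y[:, i] - Y[:, j]) - lam * xu)
        Y[:, i] += alpha * (g * xu - lam * Y[:, i])
        Y[:, j] += alpha * (-g * xu - lam * Y[:, j])
    return X, Y

A = np.array([[1, 1, 1, 1],
              [0, 1, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 1, 0]], dtype=float)
X, Y = bpr_sgd(A)
print((X.T @ Y)[3])   # ranking scores of user D over all items
```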
  23. Practical Aspects of the Recommendation Problem
      • Computational time
      • Memory consumption
      • How many services can be integrated in a single server rack?
      Super-high accuracy achieved with a supercomputer is useless for real business.
  24. Sparsification
      • As a representation of a big matrix, a sparse matrix saves computational time and memory consumption at the same time.
      • It is advantageous to employ a model whose parameters become sparse.
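For instance, with scipy's compressed sparse row format (the sizes below are invented for illustration):

```python
import numpy as np
from scipy import sparse

# A shopping matrix for 100,000 users and 10,000 items with about
# one million purchases in total.
rng = np.random.default_rng(0)
rows = rng.integers(0, 100_000, size=1_000_000)
cols = rng.integers(0, 10_000, size=1_000_000)
A = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(100_000, 10_000))

# Roughly 1e6 stored values instead of 1e9 dense entries; operations such
# as matrix-vector products touch only the nonzeros, saving time as well.
print(A.nnz, A.data.nbytes)
```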
  25. Example of a Sparse Model: Elastic Net
      In the regression model, adding an L1 term makes the solution sparse [Zou and Hastie, 2005]:
      $\min_w \left[ \frac{1}{2n} \|Xw - y\|_2^2 + \frac{\lambda(1 - \rho)}{2} \|w\|_2^2 + \lambda \rho |w|_1 \right]$
      (here $X$ and $y$ denote the design matrix and target of the regression, not the factors above). A similar idea is applied to the recommendation problem by SLIM [Ning et al., 2011]: minimize
      $\|A - AW\|_{\mathrm{Fro}}^2 + \frac{\lambda(1 - \rho)}{2} \|W\|_{\mathrm{Fro}}^2 + \lambda \rho |W|_1$
      subject to $\operatorname{diag} W = 0.$
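A sketch of the column-by-column elastic-net fit behind this idea, using scikit-learn's ElasticNet. Fitting one column of W at a time with the item's own column masked out approximates the diag W = 0 constraint; this is a simplification of the full SLIM method of [Ning et al., 2011]:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_like(A, alpha=0.1, l1_ratio=0.5):
    """Fit W column by column: for each item i, regress its column of A on
    the other columns with an elastic-net penalty, forcing W_ii = 0."""
    n, m = A.shape
    W = np.zeros((m, m))
    for i in range(m):
        Ai = A.copy()
        Ai[:, i] = 0                    # mask the item's own column
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           fit_intercept=False)
        model.fit(Ai, A[:, i])
        W[:, i] = model.coef_
        W[i, i] = 0.0                   # enforce diag W = 0 explicitly
    return W

A = np.array([[1, 1, 1, 1],
              [0, 1, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 1, 0]], dtype=float)
W = slim_like(A)
scores = A @ W        # recommendation scores; many entries of W are zero
print(np.count_nonzero(W))
```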
  26. Conclusion: What is Important for Good Prediction?
      • Theory: machine learning, mathematical optimization
      • Implementation: algorithms, computer architecture, mathematics
      • Human factors! Hand tuning of parameters, domain-specific knowledge
  27. References
      • Salakhutdinov, Ruslan, and Andriy Mnih. "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
      • Sindhwani, Vikas, et al. "One-class matrix completion with low-density factorizations." 2010 IEEE 10th International Conference on Data Mining (ICDM). IEEE, 2010.
      • Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.
      • Zou, Hui, and Trevor Hastie. "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005): 301-320.
      • Ning, Xia, and George Karypis. "SLIM: Sparse linear methods for top-N recommender systems." 2011 IEEE 11th International Conference on Data Mining (ICDM). IEEE, 2011.
