Big, Practical Recommendations
with Alternating Least Squares

Sean Owen • Apache Mahout / Myrrix.com

WHERE’S BIG LEARNING?
• Next: Application Layer
  - Analytics
  - Machine Learning
• Like Apache Mahout
  - Common Big Data app today
  - Clustering, recommenders, classifiers on Hadoop
  - Free, open source; not mature
• Where’s commercialized Big Learning?

[Figure: Big Data stack, top to bottom: Applications, Processing, Database, Storage]

A RECOMMENDER SHOULD …
• Answer in Real-time
  - Ingest new data, now
  - Modify recommendations based on newest data
  - No “cold start” for new data
• Accept Diverse Input
  - Not just people and products
  - Not just explicit ratings
    - Clicks, views, buys
  - Side information
• Scale Horizontally
  - For queries per second
  - For size of data set
• Be “Pretty Accurate”

NEED: 2-TIER ARCHITECTURE
• Real-time Serving Layer
  - Quick results based on precomputed model
  - Incremental update
  - Partitionable for scale
• Batch Computation Layer
  - Builds model
  - Scales out (on Hadoop?)
  - Asynchronous, occasional, long-lived runs
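
To make the split concrete, here is a minimal Python sketch of one serving-layer partition, assuming the batch layer delivers a finished model as plain user and item factor vectors. The `Model` and `ServingLayer` names, the `publish`/`recommend` methods, and CRC32-based partitioning are hypothetical illustrations, not the API of Myrrix or Mahout. Queries read an in-memory snapshot, and each new batch result is swapped in whole.

```python
import threading
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """Factor vectors handed over by a finished batch run (hypothetical format)."""
    user_vectors: dict = field(default_factory=dict)  # user id -> k floats
    item_vectors: dict = field(default_factory=dict)  # item id -> k floats


class ServingLayer:
    """Answers queries from the latest published model; one instance per partition."""

    def __init__(self, partition: int, num_partitions: int):
        self.partition = partition
        self.num_partitions = num_partitions
        self._model = Model()
        self._lock = threading.Lock()

    def owns(self, user: str) -> bool:
        # Stable hash, so every server agrees which partition serves a user.
        return zlib.crc32(user.encode("utf-8")) % self.num_partitions == self.partition

    def publish(self, model: Model) -> None:
        # Called after each batch run; the swap is a single reference assignment,
        # so in-flight queries finish on the snapshot they already hold.
        with self._lock:
            self._model = model

    def recommend(self, user: str, how_many: int = 3) -> list:
        model = self._model  # one consistent snapshot for the whole query
        xu = model.user_vectors[user]
        scores = {item: sum(a * b for a, b in zip(xu, yi))
                  for item, yi in model.item_vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)[:how_many]


if __name__ == "__main__":
    layer = ServingLayer(partition=0, num_partitions=1)
    layer.publish(Model({"u1": [1.0, 0.0]},
                        {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}))
    print(layer.recommend("u1", how_many=2))  # ['a', 'c']
```

Swapping in whole models keeps the query path read-only and lets batch runs stay asynchronous; incremental updates between runs would need their own write path, such as the fold-in described later in the deck.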
A PRACTICAL ALGORITHM

MATRIX FACTORIZATION
• Factor user-item matrix to user-feature + feature-item matrix
• Well understood in ML, as:
  - Principal Component Analysis
  - Latent Semantic Indexing
• Several algorithms, like:
  - Singular Value Decomposition
  - Alternating Least Squares

BENEFITS
• Models intuition
• Factorization is batch parallelizable
• Reconstruction (recs) in low-dimension is fast
• Allows projection of new data
  - Cold start solution
  - Approximate update solution

A PRACTICAL IMPLEMENTATION

ALTERNATING LEAST SQUARES
• Simple factorization $P \approx X Y^T$
• Approximate: X, Y are “skinny” (low-rank)
• Faster than the SVD
  - Trivially parallel, iterative
• Dumber than the SVD
  - No singular values, no orthonormal basis

BENEFITS
• Parallelizable by row: very Hadoop-friendly
• Iterative: OK answer fast, refine as long as desired
• Lends itself to a “binary” input model
  - Ratings as regularization instead
  - Sparseness / 0s no longer a problem
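
ALS ALGORITHM 1
• Input: (user, item, strength) tuples
  - Anything you can quantify is input
  - Strength is positive
• Many tuples per user-item pair
• R is the sparse user-item interaction matrix
• $r_{ui}$ = total strength of interaction between user $u$ and item $i$

$$
R = \begin{bmatrix}
1 & 4 & 3 & \cdot & \cdot \\
\cdot & \cdot & 3 & \cdot & \cdot \\
\cdot & 4 & \cdot & 3 & 2 \\
5 & \cdot & 2 & \cdot & 3 \\
\cdot & \cdot & \cdot & 5 & \cdot \\
2 & 4 & \cdot & \cdot & \cdot
\end{bmatrix}
\qquad (\cdot = \text{no interaction})
$$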
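
A minimal Python sketch of this input step, assuming events arrive as `(user, item, strength)` triples. `build_r`, the dict-of-dicts layout, and the view/purchase strengths in the example are hypothetical. Repeated tuples for the same pair are summed, which is what makes $r_{ui}$ a total strength.

```python
from collections import defaultdict


def build_r(tuples):
    """Sum (user, item, strength) tuples into a sparse R: {user: {item: r_ui}}.

    Pairs that never appear are implicitly 0, so memory grows with the
    number of distinct interactions, not with users x items.
    """
    r = defaultdict(lambda: defaultdict(float))
    for user, item, strength in tuples:
        if strength <= 0:
            raise ValueError(f"strength must be positive, got {strength}")
        r[user][item] += strength  # many tuples per pair: accumulate
    return {user: dict(items) for user, items in r.items()}


if __name__ == "__main__":
    events = [("alice", "book", 1.0),   # e.g. a view (hypothetical weight)
              ("alice", "book", 3.0),   # e.g. a purchase of the same item
              ("bob", "film", 5.0)]
    print(build_r(events))  # {'alice': {'book': 4.0}, 'bob': {'film': 5.0}}
```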

ALS ALGORITHM 2
• Follow “Collaborative Filtering for Implicit Feedback Datasets”
  www2.research.att.com/~yifanhu/PUB/cf.pdf
• Construct “binary” matrix P
  - 1 where R > 0
  - 0 where R = 0
• Factor P, not R
  - R returns in regularization
• Still sparse; implicit 0s fine

$$
P = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0
\end{bmatrix}
$$
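
Constructing P needs nothing beyond the nonzero pattern of R. A minimal sketch, assuming R is held as a `{user: {item: strength}}` dict; `build_p` and `to_dense` are hypothetical names, and the dense expansion exists only to print the small example.

```python
def build_p(r):
    """Sparse binary P: the set of (user, item) pairs with r_ui > 0.

    Every pair not in the set is an implicit 0, so P stays as sparse as R.
    """
    return {(u, i) for u, row in r.items() for i, s in row.items() if s > 0}


def to_dense(p, num_users, num_items):
    """Expand P into a 0/1 list of lists; only sensible for tiny examples."""
    return [[1 if (u, i) in p else 0 for i in range(num_items)]
            for u in range(num_users)]


if __name__ == "__main__":
    # R from ALS ALGORITHM 1, with users and items numbered from 0.
    r = {0: {0: 1, 1: 4, 2: 3}, 1: {2: 3}, 2: {1: 4, 3: 3, 4: 2},
         3: {0: 5, 2: 2, 4: 3}, 4: {3: 5}, 5: {0: 2, 1: 4}}
    for row in to_dense(build_p(r), 6, 5):
        print(row)
```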
ALS ALGORITHM 3
• P is $m \times n$
• Choose $k \ll m, n$
• Factor P as $Q = X Y^T$, $Q \approx P$
  - X is $m \times k$; $Y^T$ is $k \times n$
• Find best approximation Q
  - Minimize the squared L2 (Frobenius) norm of the difference: $\|P - Q\|_F^2$
  - Minimal squared error: “Least Squares”
• Recommendations are the largest values in Q

[Figure: Q as the product of the factors X and $Y^T$]
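
Reading recommendations off Q is a top-N selection over one row. A minimal Python sketch; the helper name `recommend` is hypothetical, and skipping items the user already has is an assumption (common in practice, but not stated on the slide).

```python
import heapq


def recommend(q_row, known_items, how_many):
    """Item indices with the largest values in one row of Q.

    Skipping items the user already interacted with is an assumption of this
    sketch; the slide only says recommendations are the largest values in Q.
    """
    candidates = (i for i in range(len(q_row)) if i not in known_items)
    return heapq.nlargest(how_many, candidates, key=lambda i: q_row[i])


if __name__ == "__main__":
    # First row of Q from the EXAMPLE FACTORIZATION slide; that user's P row
    # is 1 1 1 0 0, so items 0, 1, 2 are already known.
    q_row = [0.96, 0.99, 0.99, 0.38, 0.93]
    print(recommend(q_row, known_items={0, 1, 2}, how_many=2))  # [4, 3]
```

`heapq.nlargest` avoids sorting the whole row, which matters once a row has millions of items.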

ALS ALGORITHM 4
• Optimizing X, Y simultaneously is non-convex, hard
• If X or Y is fixed, the problem becomes a system of linear equations: convex, easy
• Initialize Y with random values
• Solve for X
• Fix X, solve for Y
• Repeat (“Alternating”)
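
A minimal pure-Python sketch of the alternating loop, assuming plain (unweighted) squared error plus an L2 penalty $\lambda$; the weighted objective on the next two slides changes only the per-row solve. `solve`, `ridge_rows`, `loss`, and `als` are hypothetical helpers; a real implementation would use a linear-algebra library and solve rows in parallel. Because each half-step is the exact minimizer with the other factor fixed, the objective should not increase from one iteration to the next (up to rounding).

```python
import random


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_rows(targets, fixed, lam):
    """Exact minimizer for each target row t: v = (F^T F + lam I)^-1 F^T t,
    where row j of F = fixed pairs with entry t[j]."""
    k = len(fixed[0])
    ftf = [[sum(f[s] * f[t] for f in fixed) + (lam if s == t else 0.0)
            for t in range(k)] for s in range(k)]
    return [solve(ftf, [sum(f[s] * row[j] for j, f in enumerate(fixed))
                        for s in range(k)])
            for row in targets]


def loss(p, x, y, lam):
    """||P - X Y^T||^2 + lam (||X||^2 + ||Y||^2), the objective ALS decreases."""
    err = sum((p[u][i] - sum(a * b for a, b in zip(x[u], y[i]))) ** 2
              for u in range(len(p)) for i in range(len(p[0])))
    return err + lam * sum(v * v for row in x + y for v in row)


def als(p, k, lam, iterations, seed=0):
    rng = random.Random(seed)
    y = [[rng.random() for _ in range(k)] for _ in p[0]]  # initialize Y randomly
    x = ridge_rows(p, y, lam)                              # solve for X
    p_t = [list(col) for col in zip(*p)]                   # one row per item
    for _ in range(iterations):
        y = ridge_rows(p_t, x, lam)                        # fix X, solve for Y
        x = ridge_rows(p, y, lam)                          # fix Y, solve for X
    return x, y


if __name__ == "__main__":
    p = [[1, 1, 1, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 1, 1],
         [1, 0, 1, 0, 1], [0, 0, 0, 1, 0], [1, 1, 0, 0, 0]]
    for n in (0, 1, 5, 10):
        x, y = als(p, k=3, lam=0.1, iterations=n)
        print(n, round(loss(p, x, y, 0.1), 4))  # should not increase with n
```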

ALS ALGORITHM 5
• Define regularization weights $c_{ui} = 1 + \alpha r_{ui}$
• Minimize:

$$
\sum_{u,i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
$$

• Simple least-squares regression objective, plus
  - Squared-error terms weighted by strength: the penalty for not reconstructing 1 at a “strong” association is higher
  - Standard L2 regularization term
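
A direct transcription of the objective into Python, as a sketch for checking implementations on small data; `weighted_loss` is a hypothetical name, and R is passed dense only for brevity. The sum runs over every (u, i) pair, including the implicit zeros with $c_{ui} = 1$, so a naive evaluation costs $O(mnk)$.

```python
def weighted_loss(r, x, y, alpha, lam):
    """sum_{u,i} c_ui (p_ui - x_u^T y_i)^2 + lam (sum_u ||x_u||^2 + sum_i ||y_i||^2),
    with p_ui = 1 if r_ui > 0 else 0 and c_ui = 1 + alpha * r_ui.

    r is a dense list of lists (0 = no interaction) to keep the sketch short.
    """
    total = 0.0
    for u, r_row in enumerate(r):
        for i, r_ui in enumerate(r_row):
            p_ui = 1.0 if r_ui > 0 else 0.0
            c_ui = 1.0 + alpha * r_ui
            prediction = sum(a * b for a, b in zip(x[u], y[i]))
            total += c_ui * (p_ui - prediction) ** 2
    penalty = sum(v * v for row in x for v in row) + sum(v * v for row in y for v in row)
    return total + lam * penalty


if __name__ == "__main__":
    r = [[1, 4, 3, 0, 0], [0, 0, 3, 0, 0], [0, 4, 0, 3, 2],
         [5, 0, 2, 0, 3], [0, 0, 0, 5, 0], [2, 4, 0, 0, 0]]
    zeros_x = [[0.0] * 3 for _ in r]
    zeros_y = [[0.0] * 3 for _ in r[0]]
    # With all-zero factors every prediction is 0, so the loss is the sum of
    # c_ui over the 13 positive entries: 13 + 40 * 41 = 1653.
    print(weighted_loss(r, zeros_x, zeros_y, alpha=40, lam=2))  # 1653.0
```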
ALS ALGORITHM 6
• With Y fixed, compute the optimal X
• Each row $x_u$ is independent
• Define $C_u$ as the diagonal matrix of $c_u$ (user strength weights)
• $x_u = (Y^T C_u Y + \lambda I)^{-1} Y^T C_u p_u$
• Compare to the simple least-squares regression solution $(Y^T Y)^{-1} Y^T p_u$
  - Adds Tikhonov / ridge regression regularization term $\lambda I$
  - Attaches $c_u$ weights to $Y^T$
• See paper for how $Y^T C_u Y$ is computed efficiently; skipping the engineering!
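
The engineering this slide skips can be sketched in a few lines. This pure-Python version (helper names `solve`, `gram`, `user_vector` are hypothetical) uses the identity $Y^T C_u Y = Y^T Y + Y^T (C_u - I) Y$ from the paper cited on ALS ALGORITHM 2: $Y^T Y$ is computed once and shared, and each user only pays for the items they actually interacted with, because $C_u - I$ is zero everywhere else.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gram(y):
    """Y^T Y: shared by every user, so compute it once per half-iteration."""
    k = len(y[0])
    return [[sum(row[s] * row[t] for row in y) for t in range(k)] for s in range(k)]


def user_vector(y, yty, r_u, alpha, lam):
    """x_u = (Y^T C_u Y + lam I)^-1 Y^T C_u p_u for a single user.

    r_u holds only the user's nonzero strengths, {item index: r_ui}.
    Y^T C_u Y = Y^T Y + sum over those items of (c_ui - 1) y_i y_i^T, and
    Y^T C_u p_u = sum over those items of c_ui y_i (p_ui = 1 only there).
    """
    k = len(yty)
    a = [list(row) for row in yty]
    b = [0.0] * k
    for i, r_ui in r_u.items():
        c_ui = 1.0 + alpha * r_ui
        yi = y[i]
        for s in range(k):
            b[s] += c_ui * yi[s]
            for t in range(k):
                a[s][t] += (c_ui - 1.0) * yi[s] * yi[t]
    for s in range(k):
        a[s][s] += lam  # ridge term lambda I
    return solve(a, b)


if __name__ == "__main__":
    y = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.2, 0.5, 0.1],
         [0.3, 0.3, 0.3], [0.0, 0.1, 0.6]]   # arbitrary fixed Y (n=5, k=3)
    r_u = {0: 1.0, 1: 4.0, 2: 3.0}           # first user of the example R
    print(user_vector(y, gram(y), r_u, alpha=40.0, lam=2.0))
```

Per user this costs roughly $O(k^2 n_u + k^3)$, where $n_u$ is the number of items the user touched, instead of $O(k^2 n)$ for forming $Y^T C_u Y$ directly.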
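
EXAMPLE FACTORIZATION
• k = 3, λ = 2, α = 40, 10 iterations

$$
P = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0
\end{bmatrix}
\approx
\begin{bmatrix}
0.96 & 0.99 & 0.99 & 0.38 & 0.93 \\
0.44 & 0.39 & 0.98 & -0.11 & 0.39 \\
0.70 & 0.99 & 0.42 & 0.98 & 0.98 \\
1.00 & 1.04 & 0.99 & 0.44 & 0.98 \\
0.11 & 0.51 & -0.13 & 1.00 & 0.57 \\
0.97 & 1.00 & 0.68 & 0.47 & 0.91
\end{bmatrix}
= Q = X Y^T
$$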
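
An end-to-end sketch of weighted ALS, run with this slide's settings (k = 3, λ = 2, α = 40, 10 iterations) on the example R. It is pure Python with hypothetical helper names (`solve`, `gram`, `update`, `weighted_loss`, `implicit_als`); initializing Y from a Gaussian with standard deviation 0.1 is an assumption, since the slide does not say how Y was initialized. Users and items share the same `update`: solving for Y is the same problem with the roles of X and Y swapped.

```python
import random


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gram(m):
    k = len(m[0])
    return [[sum(row[s] * row[t] for row in m) for t in range(k)] for s in range(k)]


def update(fixed, strengths, alpha, lam):
    """Solve each row against the fixed factor F:
    v = (F^T C F + lam I)^-1 F^T C p, using F^T C F = F^T F + F^T (C - I) F."""
    ftf = gram(fixed)
    k = len(ftf)
    rows = []
    for nonzero in strengths:  # {index into fixed: strength > 0}
        a = [list(row) for row in ftf]
        b = [0.0] * k
        for j, r in nonzero.items():
            c, f = 1.0 + alpha * r, fixed[j]
            for s in range(k):
                b[s] += c * f[s]
                for t in range(k):
                    a[s][t] += (c - 1.0) * f[s] * f[t]
        for s in range(k):
            a[s][s] += lam
        rows.append(solve(a, b))
    return rows


def weighted_loss(r, x, y, alpha, lam):
    total = 0.0
    for u, r_row in enumerate(r):
        for i, r_ui in enumerate(r_row):
            pred = sum(a * b for a, b in zip(x[u], y[i]))
            total += (1.0 + alpha * r_ui) * ((1.0 if r_ui > 0 else 0.0) - pred) ** 2
    return total + lam * sum(v * v for row in x + y for v in row)


def implicit_als(r, k=3, lam=2.0, alpha=40.0, iterations=10, seed=0):
    """Weighted ALS on a dense strength matrix r (0 = no interaction)."""
    by_user = [{i: v for i, v in enumerate(row) if v > 0} for row in r]
    by_item = [{u: row[i] for u, row in enumerate(r) if row[i] > 0}
               for i in range(len(r[0]))]
    rng = random.Random(seed)
    y = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in by_item]  # random Y
    x = update(y, by_user, alpha, lam)
    for _ in range(iterations):
        y = update(x, by_item, alpha, lam)  # fix X, solve for Y
        x = update(y, by_user, alpha, lam)  # fix Y, solve for X
    return x, y


if __name__ == "__main__":
    r = [[1, 4, 3, 0, 0], [0, 0, 3, 0, 0], [0, 4, 0, 3, 2],
         [5, 0, 2, 0, 3], [0, 0, 0, 5, 0], [2, 4, 0, 0, 0]]
    x, y = implicit_als(r)  # k = 3, lambda = 2, alpha = 40, 10 iterations
    for xu in x:
        print(" ".join(f"{sum(a * b for a, b in zip(xu, yi)):5.2f}" for yi in y))
```

Because the random start differs, the printed Q should be compared with P rather than with the slide's exact numbers.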

FOLD-IN
• Need immediate, if approximate, updates for new data
• New user u needs a new row $Q_u = X_u Y^T$
• We have $P_u \approx Q_u$
• Compute $X_u$ via a right inverse: $X Y^T (Y^T)^{-1} = Q (Y^T)^{-1}$, so $X = Q (Y^T)^{-1}$
• What is $(Y^T)^{-1}$, given that $Y^T$ ($k \times n$) is not square?

• Note $(Y^T Y)(Y^T Y)^{-1} = I$
• Gives $Y^T$’s right inverse: $Y^T \left( Y (Y^T Y)^{-1} \right) = I$
• $X_u = Q_u Y (Y^T Y)^{-1}$
• $X_u \approx P_u Y (Y^T Y)^{-1}$
• Recommend as usual: $Q_u = X_u Y^T$
• For an existing user, instead add to the existing row $X_u$
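
A minimal pure-Python sketch of the fold-in formula, assuming Y has full column rank k so that $Y^T Y$ is invertible; `solve`, `fold_in`, and `scores` are hypothetical helpers. Rather than materializing $(Y^T Y)^{-1}$, it solves the equivalent $k \times k$ system $(Y^T Y)\, x_u^T = Y^T p_u^T$, which is cheaper and numerically safer.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fold_in(p_u, y):
    """Approximate factor row for a new user: X_u ≈ P_u Y (Y^T Y)^-1.

    p_u: the user's 0/1 row over all n items; y: the n x k item factors.
    Solves (Y^T Y) x = Y^T p_u instead of forming the inverse.
    """
    k = len(y[0])
    yty = [[sum(row[s] * row[t] for row in y) for t in range(k)] for s in range(k)]
    ytp = [sum(y[i][s] * p_u[i] for i in range(len(y))) for s in range(k)]
    return solve(yty, ytp)


def scores(x_u, y):
    """Recommend as usual: Q_u = X_u Y^T."""
    return [sum(a * b for a, b in zip(x_u, y_i)) for y_i in y]


if __name__ == "__main__":
    y = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0],
         [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]   # hypothetical item factors
    x_u = fold_in([1, 1, 0, 0, 0], y)          # new user touched items 0 and 1
    print([round(q, 2) for q in scores(x_u, y)])
```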

THIS IS MYRRIX
• Soft-launched
• Serving Layer available as open source download
• Computation Layer available as beta
• Ready on Amazon EC2 / EMR
• Full launch Q4 2012

srowen@myrrix.com
myrrix.com

APPENDIX

EXAMPLES

STACKOVERFLOW TAGS
• Recommend tags to questions
• Tag questions automatically, improve tag coverage
• 3.5M questions × 30K tags
• 4.3 hours × 5 machines on Amazon EMR
• $3.03 ≈ $0.08 per 100,000 recs

WIKIPEDIA LINKS
• Recommend new linked articles from existing links
• Propose missing, related links
• 2.5M articles × 1.8M articles
• 28 hours × 2 PCs on Apache Hadoop 1.0.3
