SVD Applied to
Collaborative Filtering
      ~ URUG 7-12-07 ~
Recommendation System
Answers the question:
What do I want next?!?

 Very consumer driven.

 Must provide good results or a user may not
 trust the system in the future.
Collaborative Filtering
Base user recommendations off of:

  User’s past history.

  History of like-minded users.

View data as product X user matrix.

Find a “neighborhood” of similar users
for that user.

Return the top-N recommendations.
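The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the user names, and the helper names `cosine` and `recommend` are all hypothetical, and scoring unseen products by similarity-weighted ratings is one common choice rather than the method the talk prescribes.

```python
import math

# Hypothetical toy data: ratings[user][product] = rating (absent = unrated).
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
    "dan": {"B": 4, "D": 5, "E": 1},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[p] * v[p] for p in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, k=2, n=2):
    """Top-n products the user has not rated, scored by the k nearest users."""
    me = ratings[user]
    # Neighborhood: the k users most similar to this user.
    neighbors = sorted(
        ((cosine(me, other), name) for name, other in ratings.items() if name != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, name in neighbors:
        for product, rating in ratings[name].items():
            if product not in me:
                scores[product] = scores.get(product, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]

top = recommend("ann", ratings)
print(top)
```

Here "bob" and "dan" form the neighborhood for "ann", so the products only they have rated become the candidates.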
Early Approaches

Goldberg et al. (1992), Using
collaborative filtering to weave an
information tapestry
Konstan, J., et al. (1997), Applying
Collaborative Filtering to Usenet news.

Use Pearson Correlation or cosine similarity
as a measure of similarity to form
neighborhoods.
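As a rough sketch of the Pearson measure, the function below correlates two users over the products both have rated. Restricting the means to co-rated products is one common variant and an assumption here; the users and their ratings are made up for illustration.

```python
import math

def pearson(u, v):
    """Pearson correlation over the products both users rated."""
    shared = sorted(set(u) & set(v))
    if len(shared) < 2:
        return 0.0  # too little overlap to correlate
    mean_u = sum(u[p] for p in shared) / len(shared)
    mean_v = sum(v[p] for p in shared) / len(shared)
    cov = sum((u[p] - mean_u) * (v[p] - mean_v) for p in shared)
    var_u = sum((u[p] - mean_u) ** 2 for p in shared)
    var_v = sum((v[p] - mean_v) ** 2 for p in shared)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / math.sqrt(var_u * var_v)

# Hypothetical users.
alice = {"A": 5, "B": 3, "C": 1}
bob = {"A": 5, "B": 4, "C": 3, "D": 1}   # same ordering, narrower scale
carol = {"A": 1, "B": 3, "C": 5}         # opposite taste

same = pearson(alice, bob)
opposite = pearson(alice, carol)
no_overlap = pearson(alice, {"Z": 3})
```

Unlike raw cosine similarity, Pearson subtracts each user's mean first, so two users who agree on the ordering but rate on different scales still correlate strongly.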
Early CF Challenges
Sparsity - Users share too few rated
products, so no correlation between them
can be found. Reduced coverage occurs.

Scalability - The computation time of
nearest-neighbor algorithms grows with
the number of products and users.

Synonymy
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.

   Reduces dimensionality of a dataset
   and captures the latent relationships.

 Easily maps to CF!
Framing LSI for CF
Products X Users matrix instead of Terms X
Documents.

        Netflix Dataset
480,189 users, 17,770 movies, only ~100 million ratings.

17,770 X 480,189 matrix that is 99% sparse!

  About 8.5 billion potential ratings.
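The figures on this slide can be checked with a little arithmetic. The rating count is the slide's rounded "~100 million", so the resulting sparsity is an approximation.

```python
users = 480_189
movies = 17_770
ratings = 100_000_000   # "~100 million" on the slide; a rounded figure

cells = users * movies          # every (movie, user) pair that could hold a rating
density = ratings / cells
sparsity = 1 - density

print(f"{cells:,} potential ratings")      # 8,532,958,530 potential ratings
print(f"{sparsity:.1%} of cells are empty")  # 98.8% of cells are empty
```

So "about 8.5 billion" and "99% sparse" are both rounded versions of these numbers.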
SVD - The math behind LSI
   Singular Value Decomposition

      Any M x N matrix A of rank r can be
      decomposed as:

      $A = U \Sigma V^T$

 U is an M x M orthogonal matrix.
 V is an N x N orthogonal matrix.
 Σ is an M x N diagonal matrix whose first r diagonal
 entries are the nonzero singular values of A:
 $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > \sigma_{r+1} = \dots = \sigma_n = 0$
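A small worked instance may make the decomposition concrete. The 2 x 2 matrix below is a hypothetical example chosen because its factors come out in closed form; multiplying the three factors back together recovers A.

```latex
A =
\begin{pmatrix} 4 & 0 \\ 3 & -5 \end{pmatrix}
=
\underbrace{\frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}}_{U}
\underbrace{\begin{pmatrix} \sqrt{40} & 0 \\ 0 & \sqrt{10} \end{pmatrix}}_{\Sigma}
\underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}}_{V^T}
```

Here U and V are orthogonal, the rank is r = 2, and the singular values are ordered: $\sqrt{40} \ge \sqrt{10} > 0$.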
Related to eigenvalue
  decomposition (PCA)
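The columns of U are orthonormal eigenvectors of
AA^T. Its first r columns span the “column space” and
are known as the left singular vectors.
The columns of V are orthonormal eigenvectors of
A^TA. Its first r columns span the “row space”. Right singular vectors.
The nonzero singular values are the square roots of
the nonzero eigenvalues of A^TA (equivalently AA^T).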
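One way to see the eigenvalue connection is power iteration on A^TA, which converges to the top right singular vector. The sketch below is illustrative only: the matrix and the helper names are hypothetical, and a real system would call a linear algebra library instead.

```python
import math

A = [[4.0, 0.0],
     [3.0, -5.0]]   # hypothetical example matrix

def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def top_singular_triplet(A, iters=100):
    """Power iteration on A^T A. Returns (sigma_1, u_1, v_1, lambda_1)."""
    At = transpose(A)
    v = [1.0] + [0.0] * (len(A[0]) - 1)        # arbitrary start vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))           # w = (A^T A) v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))  # sigma_1 = ||A v_1||
    u = [x / sigma for x in Av]                # u_1 = A v_1 / sigma_1
    eigenvalue = sum(x * y for x, y in zip(v, matvec(At, Av)))  # v^T (A^T A) v
    return sigma, u, v, eigenvalue

sigma1, u1, v1, lambda1 = top_singular_triplet(A)
```

For this matrix A^TA has eigenvalues 40 and 10, so the largest singular value is the square root of 40.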
Reducing Dimensionality


                      $A_k = U_k \Sigma_k V_k^T$

 A_k keeps only the k largest singular values and is the
 closest rank-k approximation to A.

 Among all rank-k matrices, A_k minimizes the
 Frobenius norm of the error: $\|A - A_k\|_F$
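A minimal numeric check of this claim, assuming a hypothetical 2 x 2 matrix whose singular values and vectors were worked out by hand. The rank-1 truncation leaves an error equal to the dropped singular value, and an arbitrary alternative rank-1 matrix does worse.

```python
import math

def frobenius(M):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in M for x in row))

def subtract(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def outer(scale, u, v):
    """scale * u v^T as a list of lists."""
    return [[scale * ui * vj for vj in v] for ui in u]

A = [[4.0, 0.0],
     [3.0, -5.0]]                       # hypothetical example matrix

# Hand-derived top singular triplet of A (sigma_2 is sqrt(10)).
sigma1 = math.sqrt(40)
u1 = [1 / math.sqrt(5), 2 / math.sqrt(5)]
v1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

A1 = outer(sigma1, u1, v1)              # rank-1 truncation A_1
err_svd = frobenius(subtract(A, A1))    # equals sigma_2

# Another rank-1 candidate: keep the first column, zero the second.
B = [[4.0, 0.0],
     [3.0, 0.0]]
err_other = frobenius(subtract(A, B))
```

In general the Frobenius error of A_k is the square root of the sum of the squared singular values that were dropped.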
Making Recommendations
 Cosine similarity - a common way to find a neighborhood.
 $\cos(i, j) = \dfrac{i \cdot j}{\|i\|_2 \, \|j\|_2}$
Somehow base recommendations off of that
neighborhood and its users.

Can also make predictions of products with a simple
dot product if the singular values are combined with
the singular vectors.
     $CP_{prod} = C_{avg} + U_k S_k^{1/2}(c) \cdot S_k^{1/2} V_k^T(p)$
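The dot-product prediction can be sketched as follows, reading c as a customer index and p as a product index (an interpretation, since the slide does not define them). The rank-1 factors and the average are made-up numbers, and `predict` is a hypothetical helper.

```python
import math

# Hypothetical rank-1 factors of a mean-centered customer x product matrix.
U_k = [[0.6], [0.8]]      # one row per customer
S_k = [2.5]               # the k largest singular values
V_k = [[0.8], [-0.6]]     # one row per product
C_avg = 3.0               # average rating added back as the baseline

def predict(c, p):
    """C_avg + (row c of U_k S_k^{1/2}) . (column p of S_k^{1/2} V_k^T)."""
    left = [U_k[c][f] * math.sqrt(S_k[f]) for f in range(len(S_k))]
    right = [math.sqrt(S_k[f]) * V_k[p][f] for f in range(len(S_k))]
    return C_avg + sum(l * r for l, r in zip(left, right))

liked = predict(1, 0)      # customer 1, product 0
disliked = predict(0, 1)   # customer 0, product 1
```

Splitting the singular values as square roots on both sides lets the customer and product vectors be precomputed once, so each online prediction is a single k-term dot product.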
Challenges with SVD
Scalability - Once again, compute
time grows with the number of users
and products. O(m^3)
  Offline stage.
  Online stage.
Even doing the SVD computation offline
is not possible for large datasets.
Other methods are needed.
Incremental SVD
 $u_k = u^T V_k \Sigma_k^{-1}$
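The projection above ("folding in") places a rating vector into the reduced space without recomputing the SVD. The sketch below assumes the rating matrix has one row per user, so that V_k spans the product space; that orientation, the example matrix, and the name `fold_in` are assumptions. Folding in an existing user's row recovers that user's row of U_k, which is a quick sanity check.

```python
import math

# Hand-derived factors of the hypothetical matrix A = [[4, 0], [3, -5]],
# read here as users x products (an assumption).
sigma = [math.sqrt(40), math.sqrt(10)]
r2 = 1 / math.sqrt(2)
V_k = [[r2, r2],          # rows = products, columns = latent features
       [-r2, r2]]

def fold_in(u, V_k, sigma):
    """u_k = u^T V_k Sigma_k^{-1} for a rating vector u over products."""
    return [
        sum(u[p] * V_k[p][f] for p in range(len(u))) / sigma[f]
        for f in range(len(sigma))
    ]

coords = fold_in([4.0, 0.0], V_k, sigma)   # ratings of existing user 0
```

A genuinely new user is projected the same way, and the result is only an approximation because the singular vectors are not updated.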
Incremental SVD Results
GHA for SVD
  Gorrell (2006), GHA for Incremental SVD in
  NLP

      Based on Sanger’s (1989) GHA for eigen
      decomposition.

  $\Delta c_i^a = c_i^b \cdot b \left( a - \sum_{j<i} (a \cdot c_j^a)\, c_j^a \right)$

  $\Delta c_i^b = c_i^a \cdot a \left( b - \sum_{j<i} (b \cdot c_j^b)\, c_j^b \right)$
GHA extended by Funk

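 /* Declarations assumed for completeness; the slide shows only train(). */
 typedef float real;
 extern real lrate;                        /* learning rate */
 extern real userValue[], movieValue[];    /* one feature value per user / movie */
 real predictRating(int movie, int user);

 /* One gradient step for a single (user, movie, rating) observation. */
 void train(int user, int movie, real rating)
 {
     real err = lrate * (rating - predictRating(movie, user));
     real uv = userValue[user];            /* keep the pre-update value */

     userValue[user] += err * movieValue[movie];
     movieValue[movie] += err * uv;        /* use the old user value, not the new one */
 }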
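To see the update rule in motion, here is a minimal single-feature sketch in Python. The toy ratings, the initial values of 1.0, the learning rate, and the epoch count are all assumptions for illustration; the update keeps the pre-update user value so both factors are adjusted from the same state.

```python
import math

def predict(user_value, movie_value, user, movie):
    """Single-feature prediction: product of the two feature values."""
    return user_value[user] * movie_value[movie]

def train_epoch(ratings, user_value, movie_value, lrate=0.05):
    """One pass of per-rating gradient updates over the known ratings."""
    for user, movie, rating in ratings:
        err = lrate * (rating - predict(user_value, movie_value, user, movie))
        uv = user_value[user]                  # value before the update
        user_value[user] += err * movie_value[movie]
        movie_value[movie] += err * uv

def rmse(ratings, user_value, movie_value):
    se = sum((r - predict(user_value, movie_value, u, m)) ** 2 for u, m, r in ratings)
    return math.sqrt(se / len(ratings))

# Hypothetical (user, movie, rating) triples; only known ratings are visited.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
user_value = [1.0, 1.0, 1.0]
movie_value = [1.0, 1.0, 1.0]

before = rmse(ratings, user_value, movie_value)
for _ in range(200):
    train_epoch(ratings, user_value, movie_value)
after = rmse(ratings, user_value, movie_value)
```

Because only observed ratings drive the updates, the sparse matrix never has to be filled in or held in memory as a dense array.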
Netflix Results
Best RMSEs

  0.9283

  0.9212

Blended to get 0.9189, 3.42% better than
Netflix.
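The percentage can be reproduced with one line of arithmetic. The baseline RMSE of 0.9514 for Netflix's own system is not stated on the slide; it is assumed here and is consistent with the 3.42% quoted.

```python
netflix_rmse = 0.9514   # assumed baseline, not given on the slide
blended_rmse = 0.9189   # blended result from the slide

improvement = (netflix_rmse - blended_rmse) / netflix_rmse * 100
print(f"{improvement:.2f}% better")
```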
Summary
SVD provides an elegant and automatic
recommendation system that has the
potential to scale.

There are many different algorithms to
calculate or at least approximate SVD which
can be used in offline stages for websites
that need to have CF.

Every dataset is different and requires
experimentation to get the best results.

SVD and the Netflix Dataset

  • 1. SVD Applied to Collaborative Filtering ~ URUG 7-12-07 ~
  • 4. Recommendation System Answers the question: What do I want next?!? Very consumer driven. Must provide good results or a user may not trust the system in the future.
  • 5. Collaborative Filtering Base user recommendations off of: User’s past history. History of like-minded users. View data as product X user matrix. Find a “neighborhood” of similar users for that user. Return the top-N recommendations.
  • 6. Early Approaches Goldberg et al. (1992), Using collaborative filtering to weave an information tapestry. Konstan et al. (1997), Applying Collaborative Filtering to Usenet news. Use Pearson correlation or cosine similarity as a measure of similarity to form neighborhoods.
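As a rough illustration of the neighborhood step, the sketch below computes Pearson correlation over the products two users have both rated and then picks the most similar users. The toy `ratings` data, the user names, and the helpers `pearson` and `neighbors` are invented for this example and do not come from the talk.

```python
from math import sqrt

# Hypothetical toy data: user -> {product: rating}. Not from the talk.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4, "D": 4},
    "bob": {"A": 3, "B": 1, "C": 2, "D": 3, "E": 3},
    "cat": {"A": 4, "B": 3, "C": 4, "D": 3, "E": 5},
    "dan": {"A": 1, "B": 5, "C": 5, "D": 2, "E": 1},
}

def pearson(u, v):
    """Pearson correlation of two users over the products both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap: the sparsity problem in miniature
    xs = [ratings[u][p] for p in common]
    ys = [ratings[v][p] for p in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def neighbors(user, n=2):
    """Return the n users most correlated with `user`, best first."""
    others = [(pearson(user, v), v) for v in ratings if v != user]
    return [v for _, v in sorted(others, reverse=True)[:n]]

print(neighbors("ann"))
```

With only a handful of co-rated products per pair, the correlation is computed from very few points, which is why sparse data makes these neighborhoods unreliable.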
  • 10. Early CF Challenges Sparsity - No correlation between users can be found. Reduced coverage occurs. Scalability - Nearest neighbor algorithms computation time grows with the number of products and users. Synonymy
  • 17. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships. Easily maps to CF!
  • 18. Framing LSI for CF Products X Users matrix instead of Terms X Documents. Netflix Dataset: 480,189 users, 17,770 movies, only ~100 million ratings. A 17,770 X 480,189 matrix that is about 99% sparse! About 8.5 billion potential ratings.
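The sparsity figure follows directly from the counts on the slide. The short calculation below reproduces it, using the rounded "~100 million" ratings count as an approximation.

```python
users = 480_189
movies = 17_770
ratings = 100_000_000  # "~100 million", rounded as on the slide

cells = movies * users        # size of the products x users matrix
density = ratings / cells     # fraction of cells that hold a rating
sparsity = 1.0 - density

print(f"{cells:,} potential ratings")
print(f"density  = {density:.2%}")
print(f"sparsity = {sparsity:.2%}")
```

This gives 8,532,958,530 potential ratings, of which roughly 1.17% are filled, so the matrix is about 98.83% empty.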
  • 19. SVD - The math behind LSI Singular Value Decomposition: any M x N matrix A of rank r can be decomposed as $A = U \Sigma V^T$. U is an M x M orthogonal matrix. V is an N x N orthogonal matrix. Σ is an M x N diagonal matrix whose first r diagonal entries are the nonzero singular values of A: $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > \sigma_{r+1} = \dots = \sigma_n = 0$
  • 20. Related to eigenvalue decomposition (PCA). The columns of U are orthonormal eigenvectors of AA^T; the first r span the “column space” and are known as the left singular vectors. The columns of V are orthonormal eigenvectors of A^TA; the first r span the “row space” and are the right singular vectors. The singular values are the square roots of the eigenvalues of A^TA.
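The properties on the two SVD slides can be checked numerically. The sketch below uses NumPy on an arbitrary random matrix; the matrix size and the variable names are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary M x N example with M=5, N=3

# full_matrices=True returns U as M x M and V^T as N x N, as on the slide.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the M x N "diagonal" Sigma from the vector of singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

reconstruction_ok = np.allclose(A, U @ Sigma @ Vt)
U_orthogonal = np.allclose(U.T @ U, np.eye(5))
V_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))

# Eigenvalues of A^T A, sorted descending, equal the squared singular values.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_match = np.allclose(eigvals, s ** 2)

print(reconstruction_ok, U_orthogonal, V_orthogonal, eig_match)
```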
  • 21. Reducing Dimensionality $A_k = U_k \Sigma_k V_k^T$. A_k is the closest rank-k approximation to A: among all rank-k matrices, it minimizes the Frobenius norm of the difference, $\|A - A_k\|_F$.
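A minimal sketch of the truncation, assuming NumPy and a small random matrix chosen only for illustration. It also checks the standard consequence of the optimality claim: the Frobenius error of A_k equals the square root of the sum of the discarded squared singular values. The helper name `truncated_svd` is hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U_k, s_k, Vt_k for the k largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
k = 2

U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k

# Error of the rank-k approximation versus the discarded singular values.
s_all = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, "fro")
expected = np.sqrt(np.sum(s_all[k:] ** 2))

# Any other rank-k matrix should do no better; try one random candidate.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
other_err = np.linalg.norm(A - B, "fro")

print(err, expected, other_err)
```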
  • 22. Making Recommendations Cosine similarity is a common way to find the neighborhood: $\cos(i, j) = \frac{i \cdot j}{\|i\|_2 \, \|j\|_2}$. Recommendations are then based on that neighborhood and its users. Can also make predictions of products with a simple dot product if the singular values are combined with the singular vectors: $CP_{prod} = C_{avg} + U_k S_k^{1/2}(c) \cdot S_k^{1/2} V_k^T(p)$, where $S_k$ is the diagonal matrix of the k largest singular values.
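The prediction formula can be sketched as follows. Several things here are assumptions rather than details from the slides: rows are customers and columns are products (so that U_k is indexed by customer c, as in the formula), C_avg is taken to be each customer's mean rating, and the toy matrix is dense, whereas real data would first need its missing entries filled. The names `fit`, `predict`, and `cosine` are hypothetical.

```python
import numpy as np

# Hypothetical dense toy ratings: rows = customers, columns = products.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def fit(R, k):
    """Center each customer's row, then keep k singular triplets."""
    c_avg = R.mean(axis=1)
    U, s, Vt = np.linalg.svd(R - c_avg[:, None], full_matrices=False)
    root = np.sqrt(s[:k])
    cust = U[:, :k] * root          # row c is U_k S_k^{1/2}(c)
    prod = root[:, None] * Vt[:k]   # column p is S_k^{1/2} V_k^T(p)
    return c_avg, cust, prod

def predict(c_avg, cust, prod, c, p):
    """CP_prod = C_avg + U_k S_k^{1/2}(c) . S_k^{1/2} V_k^T(p)."""
    return c_avg[c] + cust[c] @ prod[:, p]

def cosine(i, j):
    """cos(i, j) = i . j / (||i||_2 * ||j||_2)."""
    return float(i @ j / (np.linalg.norm(i) * np.linalg.norm(j)))

c_avg, cust, prod = fit(R, k=2)
print(predict(c_avg, cust, prod, 0, 0), predict(c_avg, cust, prod, 0, 2))
print(cosine(cust[0], cust[1]), cosine(cust[0], cust[2]))
```

In the reduced space, customers 0 and 1 (who share tastes) end up closer by cosine similarity than customers 0 and 2, which is the property a neighborhood search relies on.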
  • 23. Challenges with SVD Scalability - Once again, compute time grows with the number of users and products: O(m^3). Offline stage. Online stage. Even doing the SVD computation offline is not possible for large datasets. Other methods are needed.
  • 24. Incremental SVD $u_k = u^T V_k \Sigma_k^{-1}$
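The slide gives only the formula, so the following is a sketch under an assumption: u is treated as a column vector of length N, the same length as a row of A, and the formula is read as projecting u^T onto V_k and scaling by the inverse singular values. The helper name `fold_in` is hypothetical. As a sanity check, folding in a row that is already part of A should return that row's coordinates in U_k.

```python
import numpy as np

def fold_in(u, Vt_k, s_k):
    """Project a length-N vector into the k-dim space: u^T V_k Sigma_k^{-1}."""
    return (u @ Vt_k.T) / s_k

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Folding in an existing row of A recovers the matching row of U_k.
u_k = fold_in(A[0], Vt_k, s_k)
print(u_k, U_k[0])
```

The appeal of this step is that a new vector is placed in the existing reduced space without recomputing the decomposition.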
  • 26. GHA for SVD Gorrell (2006), GHA for Incremental SVD in NLP. Based off of Sanger’s (1989) GHA for eigen decomposition. $\Delta c_i^a = c_i^b \cdot b \, (a - \sum_{j<i} (a \cdot c_j^a) \, c_j^a)$ and $\Delta c_i^b = c_i^a \cdot a \, (b - \sum_{j<i} (b \cdot c_j^b) \, c_j^b)$
  • 27. GHA extended by Funk
      void train(int user, int movie, real rating) {
          real err = lrate * (rating - predictRating(movie, user));
          userValue[user] += err * movieValue[movie];
          movieValue[movie] += err * userValue[user];
      }
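A Python sketch of the same update, trained on toy data. Several details are assumptions made for this example: only a single feature is trained, `predictRating` is taken to be the product of the user and movie feature values, and the learning rate, epoch count, and initial value are arbitrary. Unlike the snippet on the slide, the sketch saves the old user value in `uv` so that both updates are computed from the same state.

```python
def train_feature(ratings, n_users, n_movies, lrate=0.02, epochs=500, init=0.1):
    """Fit one feature by per-rating gradient steps.

    ratings is a list of (user, movie, rating) triples.
    """
    user_value = [init] * n_users
    movie_value = [init] * n_movies
    for _ in range(epochs):
        for user, movie, rating in ratings:
            err = lrate * (rating - user_value[user] * movie_value[movie])
            uv = user_value[user]  # old value, used for the movie update
            user_value[user] += err * movie_value[movie]
            movie_value[movie] += err * uv
    return user_value, movie_value

def rmse(ratings, user_value, movie_value):
    """Root mean squared error of the one-feature predictions."""
    se = sum((r - user_value[u] * movie_value[m]) ** 2 for u, m, r in ratings)
    return (se / len(ratings)) ** 0.5

# Hypothetical rank-1 toy data: rating = user_strength * movie_strength.
data = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0),
        (1, 1, 2.0), (2, 0, 3.0), (2, 1, 1.5)]

user_value, movie_value = train_feature(data, n_users=3, n_movies=2)
print(rmse(data, user_value, movie_value))
```

Only the observed ratings are visited, so the cost of one pass depends on the number of ratings rather than on the full size of the matrix.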
  • 28. Netflix Results Best RMSEs: 0.9283 and 0.9212. Blended to get 0.9189, 3.42% better than Netflix.
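The 3.42% figure is a relative improvement in RMSE. The calculation below reproduces it under one assumption that is not stated on the slide: that the baseline is the quiz-set RMSE of 0.9514 published for Netflix's own Cinematch system.

```python
baseline_rmse = 0.9514  # assumed Cinematch quiz-set RMSE; not on the slide
blended_rmse = 0.9189   # from the slide

improvement = (baseline_rmse - blended_rmse) / baseline_rmse
print(f"{improvement:.2%}")
```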
  • 29. Summary SVD provides an elegant and automatic recommendation system that has the potential to scale. There are many different algorithms to calculate, or at least approximate, the SVD, and these can be used in the offline stages of websites that need CF. Every dataset is different and requires experimentation to get the best results.