People who liked this talk also liked …
Building Recommendation Systems
             Using Ruby

            Ryan Weald, @rweald
             LA RubyConf 2013




Who is this guy?

 What does he know
about recommendation
       systems?

Data Scientist @Sharethrough




 Native advertising
     platform
Outline
1) What is a recommendation system?
2) Collaborative filtering based
   recommendations
3) Content based recommendations
4) Hybrid systems - the best of both worlds
5) Evaluating your recommendation system
6) Resources & existing libraries


What this Talk is Not
• Everything there is to know about
  recommendation systems.
• Bleeding edge machine learning
• How to use a specific library




What is a
recommendation system?



A program that predicts
a user’s preferences using information
 about the user, other users, and the
         items in your system.




LinkedIn




Netflix




Spotify




Amazon




How do I build
recommendations?



Two Main Categories of Algorithm



1. Collaborative Filtering (CF)

2. Content Based - Classification




Collaborative Filtering


Fill in missing user preferences using
         similar users or items




Two Types of CF
1. Memory Based - Uses similarity
between users or items. Dataset
usually kept in memory

2. Model Based - Model generated
to “explain” observed ratings


User Based CF


 (User x Item) Matrix + Similarity
Function = Top-K most similar users




Collaborative Filtering
         Video 1    Video 2    Video 3    Video 4    Video 5
User 1      0          1          0          5          0
User 2      1          2          1          0          5
User 3      2          5          0          0          2
User 4      5          4          4          1          1
User 5      2          4          ?          ?          2
                   * 0 denotes not rated

Similarity Functions

• Pearson Correlation Coefficient
• Cosine Similarity
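
As a rough illustration of the second option, cosine similarity treats each user's row of ratings as a vector and measures the angle between two rows. The sketch below is not from the talk. The method name `cosine_similarity` is hypothetical, and it assumes the 0-means-not-rated convention of the ratings matrix, so unrated items add nothing to the dot product.

```ruby
# Cosine similarity between two equal-length rating vectors.
# Assumption: 0 means "not rated", so such items add nothing to the dot product.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  return 0.0 if norm_a.zero? || norm_b.zero? # a user with no ratings matches nobody

  dot / (norm_a * norm_b)
end

user3 = [2, 5, 0, 0, 2]
user4 = [5, 4, 4, 1, 1]
puts cosine_similarity(user3, user4).round(3) # prints 0.725
```

Because missing ratings are counted as zeros, users with few ratings in common tend to look dissimilar. That is one reason to prefer Pearson correlation over co-rated items.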




Pearson Correlation Coefficient
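
For reference, the standard definition of the Pearson correlation between two users u and v is given below. The notation is an assumption made here, not taken from the slides: I_uv is the set of items both users have rated, r_{u,i} is user u's rating of item i, and \bar{r}_u is u's mean rating over I_uv.

```latex
r_{uv} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
              {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\,
               \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}
```

The value ranges from -1 (opposite tastes) to 1 (identical tastes).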




Calculating PCC
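
A minimal Ruby sketch of the calculation, under stated assumptions: the method name `pearson` is hypothetical, 0 means "not rated", and only items rated by both users are compared. It returns 0.0 when the correlation is undefined (fewer than two co-rated items, or no variance).

```ruby
# Pearson correlation between two users over the items both have rated.
# Assumption: 0 means "not rated".
def pearson(a, b)
  pairs = a.zip(b).reject { |x, y| x.zero? || y.zero? }
  return 0.0 if pairs.size < 2

  xs, ys = pairs.transpose
  mean_x = xs.sum.to_f / xs.size
  mean_y = ys.sum.to_f / ys.size

  cov   = pairs.sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  return 0.0 if var_x.zero? || var_y.zero?

  cov / Math.sqrt(var_x * var_y)
end

user3 = [2, 5, 0, 0, 2]
user5 = [2, 4, 0, 0, 2] # the two unknown ratings are stored as 0 here
puts pearson(user5, user3).round(3)
```

On the ratings from the matrix, User 5 and User 3 agree on the ordering of the three videos they both rated, so their correlation comes out at 1 up to floating-point rounding.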




Using similarity to
recommend items



Collaborative Filtering
         Video 1    Video 2    Video 3    Video 4    Video 5
User 1      0          1          0          5          0
User 2      1          2          1          0          5
User 3      2          5          0          0          2
User 4      5          4          4          1          1
User 5      2          4          ?          ?          2
                   * 0 denotes not rated
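
One way to fill in User 5's missing ratings is sketched below: find the K most similar users by Pearson correlation, then take a similarity-weighted average of their ratings for the item. This is an illustrative sketch rather than the method shown in the talk. The names `RATINGS`, `top_k_neighbours` and `predict` are hypothetical, neighbours with non-positive similarity are ignored, and User 5's unknowns are stored as 0.

```ruby
# Ratings from the matrix above; 0 = not rated. User 5's unknown Video 3 and
# Video 4 entries are stored as 0 (an assumption of this sketch).
RATINGS = {
  "User 1" => [0, 1, 0, 5, 0],
  "User 2" => [1, 2, 1, 0, 5],
  "User 3" => [2, 5, 0, 0, 2],
  "User 4" => [5, 4, 4, 1, 1],
  "User 5" => [2, 4, 0, 0, 2]
}.freeze

# Pearson correlation over co-rated items only.
def pearson(a, b)
  pairs = a.zip(b).reject { |x, y| x.zero? || y.zero? }
  return 0.0 if pairs.size < 2

  xs, ys = pairs.transpose
  mx = xs.sum.to_f / xs.size
  my = ys.sum.to_f / ys.size
  cov = pairs.sum { |x, y| (x - mx) * (y - my) }
  sx  = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy  = Math.sqrt(ys.sum { |y| (y - my)**2 })
  (sx * sy).zero? ? 0.0 : cov / (sx * sy)
end

# The k users most similar to `user`, as [name, similarity] pairs.
def top_k_neighbours(ratings, user, k)
  ratings
    .reject { |name, _| name == user }
    .map    { |name, row| [name, pearson(ratings[user], row)] }
    .select { |_, sim| sim > 0 }
    .max_by(k) { |_, sim| sim }
end

# Similarity-weighted average of the neighbours' ratings for one item index.
# Returns nil when no neighbour has rated the item.
def predict(ratings, user, item, k: 2)
  rated = top_k_neighbours(ratings, user, k)
            .select { |name, _| ratings[name][item] > 0 }
  return nil if rated.empty?

  weighted = rated.sum { |name, sim| sim * ratings[name][item] }
  weighted / rated.sum { |_, sim| sim }
end

puts predict(RATINGS, "User 5", 2) # Video 3
puts predict(RATINGS, "User 5", 3) # Video 4
```

With K = 2 the neighbours are User 3 and User 4. Only User 4 has rated the two missing videos, so the predictions follow that user's ratings: about 4 for Video 3 and about 1 for Video 4, which makes Video 3 the one to recommend.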

Problems With CF

• Cold Start
• Data Sparsity
• Resource expensive



Doesn’t the video
content matter for
recommendations?


Content Based Recommendations


  Classify items based on features of
   the item. Pick other items from
      same class to recommend.




Content Based Algorithms
• K-means clustering
• Random Forest
• Support Vector Machines
• ...
• Insert your favorite ML algorithm

Content Based Algorithms
           Type of content   Duration   Maturity Rating
Video 1    comedy               60      G
Video 2    action              120      G
Video 3    comedy               34      PG-13
Video 4    romantic             15      R
Video 5    sports              120      G




K-means Clustering


  Group items into K clusters.
Assign new item to a cluster and
  pick items from that cluster
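
The idea can be sketched in a few lines of Ruby. This is a toy version under loud assumptions: each video is reduced to two numeric features, `[duration, maturity]`, with the maturity rating encoded as G = 0, PG-13 = 1, R = 2. The encoding, the helper names and the naive seeding with the first K points are all choices made for this example. Features are not scaled, so duration dominates the distance.

```ruby
# Euclidean distance between two feature vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Mean point of a group of feature vectors.
def centroid(points)
  points.transpose.map { |dim| dim.sum.to_f / dim.size }
end

# Index of the centroid closest to the given point.
def nearest(point, centroids)
  centroids.each_index.min_by { |i| distance(point, centroids[i]) }
end

# Plain k-means with a fixed number of iterations.
# Seeding with the first k points is naive but keeps the sketch deterministic.
def kmeans(points, k, iterations: 10)
  centroids = points.first(k)
  iterations.times do
    clusters = points.group_by { |pt| nearest(pt, centroids) }
    centroids = centroids.each_index.map do |i|
      clusters[i] ? centroid(clusters[i]) : centroids[i] # keep empty clusters in place
    end
  end
  centroids
end

# [duration, maturity] for Video 1..5 (G = 0, PG-13 = 1, R = 2, assumed encoding)
VIDEOS = [[60, 0], [120, 0], [34, 1], [15, 2], [120, 0]].freeze

centroids = kmeans(VIDEOS, 2)
new_video = [45, 1]
cluster   = nearest(new_video, centroids)
similar   = VIDEOS.each_index.select { |i| nearest(VIDEOS[i], centroids) == cluster }
puts similar.map { |i| "Video #{i + 1}" }.join(", ")
```

With these inputs the short videos (1, 3 and 4) end up in one cluster and the two 120-unit videos in the other, so a new short video is matched with Videos 1, 3 and 4.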




Problems With Content Based
      Recommendations

• Unsupervised Learning is hard
• Training data limited or expensive
• Doesn’t take user into account
• Limited by features of content

Hybrid Recommendations


Combine collaborative filtering with
content based algorithm to achieve
          greater results
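
One simple way to combine the two is a weighted blend of per-item scores. The sketch below assumes each recommender returns a hash of item name to score on a comparable scale. The method name `combine`, the `cf_weight` parameter and the sample scores are hypothetical, not taken from the talk.

```ruby
# Blend per-item scores from a CF recommender and a content based recommender.
# Items missing from one recommender count as 0.0 for that side.
def combine(cf_scores, content_scores, cf_weight: 0.7)
  items = cf_scores.keys | content_scores.keys
  items.map do |item|
    cf      = cf_scores.fetch(item, 0.0)
    content = content_scores.fetch(item, 0.0)
    [item, cf_weight * cf + (1 - cf_weight) * content]
  end.to_h
end

# Made-up scores, for illustration only.
cf_scores      = { "Video 3" => 0.8, "Video 4" => 0.2 }
content_scores = { "Video 3" => 0.5, "Video 1" => 0.9 }

ranked = combine(cf_scores, content_scores, cf_weight: 0.5)
           .sort_by { |_, score| -score }
ranked.each { |item, score| puts "#{item}: #{score.round(2)}" }
```

The weight is a tuning knob: a higher `cf_weight` trusts user behaviour more, and a lower one leans on item features, which helps for new items with few ratings.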




Hybrid Recommendations

Input -> CF Based Recommender      \
                                    -> Combiner -> Reco
Input -> Content Based Recommender /




Hybrid Recommendations



Input -> Content Recommender -> CF Recommender -> Reco




Hybrid Recommendations


Input -> [ CF Recommender + Content Recommender ] -> Reco




Evaluating Recommendation Quality


• Precision vs. Recall
• Clicks
• Click through rate
• Direct user feedback


Precision vs. Recall
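
Precision is the share of recommended items that were actually relevant. Recall is the share of relevant items that were recommended. A small sketch follows, with the hypothetical method name `precision_recall` and made-up sample data.

```ruby
# Precision and recall for one user's recommendation list.
# recommended: items the system suggested
# relevant:    items the user actually liked (ground truth)
def precision_recall(recommended, relevant)
  hits      = (recommended & relevant).size.to_f
  precision = recommended.empty? ? 0.0 : hits / recommended.size
  recall    = relevant.empty?    ? 0.0 : hits / relevant.size
  [precision, recall]
end

recommended = ["Video 1", "Video 3", "Video 4"]
relevant    = ["Video 3", "Video 5"]

precision, recall = precision_recall(recommended, relevant)
puts "precision=#{precision.round(2)} recall=#{recall.round(2)}"
# prints precision=0.33 recall=0.5
```

Recommending more items tends to raise recall and lower precision, which is why the two are reported together.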




Summary of What We’ve Learned


 • Collaborative Filtering using similar users
 • Content clustering using k-means
 • Combining 2 algorithms to boost quality
 • How to evaluate your recommender


Don’t Reinvent the Wheel

• Apache Mahout
• JRuby mahout gem
• SciRuby
• Recommenderlab for R


Resources & Further Reading
• Recommender Systems: An Introduction
• Linden, Greg, Brent Smith, and Jeremy York.
  "Amazon.com recommendations: Item-to-item
  collaborative filtering."
• Resnick, Paul, et al. "GroupLens: an open architecture
  for collaborative filtering of netnews."
• ACM RecSys Conference Proceedings


We’re Hiring
http://bit.ly/str-engineering




Thanks!
        Twitter: @rweald
Email: ryan@sharethrough.com




