SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
Recommendations
  in Production
     Alex MacCaw
Netflix Prize
Amazon.com
Facebook
Last.fm
StumbleUpon
Google Suggest
iTunes
Rotten Tomatoes
Yelp
Google Search
Chicken or Egg
• Google Reader
• IMDB
Acts As
Recommendable
Types of
   recommendations

• Content Based
• User Based
• Item Based
Programming Collective Intelligence
Has Many Through Relationship
User            Has Many Through
                                              Book

  Has Many                              Has Many




             UserBooks
              Can have score (rating)
User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end
Gives you


User#similar_users
User#recommended_books
Book#similar_books
The algorithms
• Manhattan Distance
• Euclidean distance
• Cosine
• Pearson correlation coefficient
• Jaccard
• Levenshtein
How does it work?
Strategy

• Map data into Euclidean Space
• Calculate similarity
• Use similarities to recommend
The Black   John Tucker
          Knight      Must Die
James       4            5

Jonah       3            2

George      5            3

 Alex       4            2
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
item id
{
                        user id
    1 => {
       1 => 1.0,
       2 => 0.0,                  score
       ...
    },
    ...
}
[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
Problem 1

It was far too slow to calculate on the fly
                 (obvious)
SELECT   * FROM quot;usersquot;   WHERE (quot;usersquot;.quot;idquot; = 2)
SELECT   * FROM quot;booksquot;
SELECT   * FROM quot;usersquot;
SELECT   quot;user_booksquot;.*   FROM quot;user_booksquot; WHERE (quot;user_booksquot;.user_id IN (1,2,3,4,5,6,7,8,9,10))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (20,3,19,6))




                                              All books             All user_books
Solution
      Cache the dataset
        Build offline


rake recommendations:build
SELECT   *   FROM   quot;user_booksquot; WHERE (quot;user_booksquot;.user_id = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 5)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 4)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 8)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 7)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 1)
Problem 2

Fetching the dataset took too
 long since it was so massive
Solution


Split up the cache by item
Rails.cache.write(
   quot;aar_books_1quot;, scores
 )
Problem 3

The dataset was so big it
     crashed Ruby!
Solution


Get rid of ActiveRecord

Only deal with integers
items = options[:on_class].connection.select_values(
  quot;SELECT id from #{options[:on_class].table_name}quot;
  ).collect(&:to_i)
Problem 4


It still crashed Ruby!
{
    1 => {
       1 => 1.0,
       2 => 0.0,
       ...
    },
    ...
}
Solution

Remove unnecessary
 cruft from dataset
{
    1 => {
       1 => 1.0,
       ...
    },
    ...
}
Problem 5

It was too slow
Solution


Re-write the slow bits in C
Details

• RubyInline
• Implemented Pearson
• Monkey patched original Ruby methods
• Very fast
InlineC = Module.new do
    inline do |builder|
      builder.c '
      #include <math.h>
      #include quot;ruby.hquot;
      double c_sim_pearson(VALUE items) {



Ruby Object
InlineC = Module.new do
        inline do |builder|
          builder.c '
          #include <math.h>
          #include quot;ruby.hquot;
          double c_sim_pearson(VALUE items) {




No Floats :(
Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}
Conversion

return num / den;
Design Designs

• Not too many relationships
• Not to many ‘items’
• Similarity matrix for items, not users
Changing data
Scaling Even Further


• K Means clustering
• Split cluster by category
Adding ratings
ActiveRecord::Schema.define(:version => 1) do
  create_table quot;booksquot;, :force => true do |t|
    t.string   quot;namequot;
    t.datetime quot;created_atquot;
    t.datetime quot;updated_atquot;
  end

 create_table quot;user_booksquot;, :force => true do |t|
   t.integer quot;user_idquot;,    :null => false
   t.integer quot;book_idquot;,    :null => false
   t.integer quot;ratingquot;,     :default => 0
 end

  create_table   quot;usersquot;, :force => true do |t|
    t.string     quot;namequot;
    t.datetime   quot;created_atquot;
    t.datetime   quot;updated_atquot;
  end
end
class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end
That’s it
Improvements?


• Better API
• Perform calculations over a cluster (EC2)
  using Map/Nanite
class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end
Questions?
              http://eribium.org/blog

           twitter : maccman
       email/jabber: maccman@gmail.com



http://github.com/maccman/acts_as_recommendable
             http://rubyurl.com/kUpk

Contenu connexe

Similaire à Acts As Recommendable

Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDBleinweber
 
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...ActsAsCon
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)MongoSF
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Dave Bouwman
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsSerge Smetana
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsIgnacio Martín
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend TestingNeil Crosby
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with RubyKeith Pitty
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Railsfreelancing_god
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Rubymametter
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanWei-Yuan Chang
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Alexey Grigorev
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctlyDror Helper
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Michael Peacock
 

Similaire à Acts As Recommendable (20)

Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDB
 
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
 
Sphinx on Rails
Sphinx on RailsSphinx on Rails
Sphinx on Rails
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails Applications
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worlds
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend Testing
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with Ruby
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Ruby
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Couch db and_the_web
Couch db and_the_webCouch db and_the_web
Couch db and_the_web
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Cucumber
CucumberCucumber
Cucumber
 
MongoDB
MongoDBMongoDB
MongoDB
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctly
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012
 

Dernier

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Dernier (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Acts As Recommendable