                A Python Framework for Building
                    Recommendation Engines
                  PythonBrasil 2011, São Paulo, SP

Marcel Caraciolo Ricardo Caspirro                Bruno Melo
   @marcelcaraciolo        @ricardocaspirro          @brunomelo
What is Crab ?

 A python framework for building recommendation engines
A Scikit module for collaborative, content and hybrid filtering
       Mahout Alternative for Python Developers :D
             Open-Source under the BSD license

When did it start ?

It began one year ago
Community-driven, 4 members
Since April 2011, incorporated by the open-source labs Muriçoca
Since April 2011, being rewritten as a Scikit

Knowing Scikits
Scikits are Scipy Toolkits - independent projects hosted
                under a common namespace.

                       Scikits Image
                     Scikits MlabWrap
                     Scikits AudioLab
                      Scikit Learn

Knowing Scikits


    Machine Learning Algorithms + scientific Python packages
                (Numpy, Scipy and Matplotlib)


Our goal: Make Crab a Scikit and contribute
           some parts of it to Scikit-learn
Why Recommendations ?
The world is an over-crowded place
Why Recommendations ?
                   We are overloaded
Thousands of news articles and blog posts each day
 Millions of movies, books and music tracks online
          Several Places, Offers and Events

  Even with friends, we are sometimes overloaded !

Why Recommendations ?
We really need and consume only a few of them!

   “A lot of times, people don’t know what
   they want until you show it to them.”
                                         Steve Jobs

  “We are leaving the Information age, and
  entering into the Recommendation age.”
                      Chris Anderson, from the book The Long Tail
Why Recommendations ?
Can Google help ?
  Yes, but only when we really know what we are looking for
           But what does “interesting” mean ?
Can Facebook help ?
  Yes, I tend to find my friends’ stuff interesting
   But what if I have only a few friends, and what they like does not always
                             attract me ?
Can experts help ?
  Yes, but it won’t scale well.
    But it is what they like, not what I like! Everyone gets exactly the same advice!
Why Recommendations ?
         Recommendation Systems
Systems designed to recommend to me something I may like
Why Recommendations ?
     Recommendation Systems

     [Figure: Graph Representation; the node labels did not survive extraction]
The current Crab

Collaborative Filtering algorithms
 User-Based, Item-Based and Matrix Factorization (SVD)

Evaluation of the Recommender Algorithms
 Precision, Recall, F1-Score, RMSE

                           Precision-Recall Charts
Collaborative Filtering

   Items:  Thor, O Vento Levou, Armagedon, Toy Store
   Users:  Marcel, Rafael, Amanda


The current Crab
>>> # load the dataset

>>> from crab.datasets import load_sample_movies
>>> data = load_sample_movies()
>>> data
{'DESCR': 'sample_movies data set was collected by the book called
          \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n-----
          \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
 'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
  2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
  3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
  4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
  5: {2: 4.5, 3: 1.0, 4: 4.0},
  6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
  7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
 'item_ids': {1: 'Lady in the Water',
  2: 'Snakes on a Planet',
  3: 'You, Me and Dupree',
  4: 'Superman Returns',
  5: 'The Night Listener',
  6: 'Just My Luck'},
 'user_ids': {1: 'Jack Matthews',
  2: 'Mick LaSalle',
  3: 'Claudia Puig',
  4: 'Lisa Rose',
  5: 'Toby',
  6: 'Gene Seymour',
  7: 'Michael Phillips'}}
The current Crab

>>> from crab.models import MatrixPreferenceDataModel
>>> m = MatrixPreferenceDataModel(

>>> print m
MatrixPreferenceDataModel (7 by 6)
         1          2          3          4            5        ...
1        3.000000   4.000000   3.500000   5.000000   3.000000
2        3.000000   4.000000   2.000000   3.000000   3.000000
3           ---     3.500000   2.500000   4.000000   4.500000
4        2.500000   3.500000   2.500000   3.500000   3.000000
5           ---     4.500000   1.000000   4.000000       ---
6        3.000000   3.500000   3.500000   5.000000   3.000000
7        2.500000   3.000000       ---    3.500000   4.000000
The current Crab
>>> # import pairwise distance
>>> from crab.metrics.pairwise import
>>> # import similarity
>>> from crab.similarities import UserSimilarity
>>> similarity = UserSimilarity(m,
>>> similarity[1]
[(1, 1.0),
 (6, 0.66666666666666663),
 (4, 0.34054242658316669),
 (3, 0.32037724101704074),
 (7, 0.32037724101704074),
 (2, 0.2857142857142857),
 (5, 0.2674788903885893)]
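
The scores listed for similarity[1] are consistent with a similarity of 1 / (1 + d), where d is the Euclidean distance over the items both users rated. The slide's import line is cut off, so this is an inference from the numbers, not a statement about Crab's API. A minimal pure-Python sketch, using the sample ratings shown earlier and the hypothetical helper names `euclidean_similarity` and `most_similar`:

```python
from math import sqrt

# Ratings from the sample_movies data shown earlier: user id -> {item id: rating}.
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(ratings, a, b):
    """Return 1 / (1 + Euclidean distance) over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return float("nan")  # no co-rated items, similarity undefined
    distance = sqrt(sum((ratings[a][i] - ratings[b][i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def most_similar(ratings, user):
    """Return (other user, similarity) pairs, most similar first."""
    scores = [(other, euclidean_similarity(ratings, user, other)) for other in ratings]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = most_similar(RATINGS, 1)
print(ranked)
```

With this assumed metric the ranking for user 1 comes out in the same order as on the slide (1, 6, 4, 3, 7, 2, 5).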
The current Crab

>>> from crab.recommenders.knn import UserBasedRecommender
>>> recsys = UserBasedRecommender(model=m,
...     similarity=similarity, capper=True, with_preference=True)

>>> recsys.recommend(5)
array([[ 5.        , 3.45712869],
       [ 1.        , 2.78857832],
       [ 6.        , 2.38193068]])

>>> recsys.recommended_because(user_id=5,item_id=1)
array([[ 2. , 3. ],
       [ 1. , 3. ],
       [ 6. , 3. ],
       [ 7. , 2.5],
       [ 4. , 2.5]])
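
A user-based recommender of this kind can be sketched as a similarity-weighted average of the neighbours' ratings. The sketch below illustrates the general technique and is not Crab's implementation: it treats every other user as a neighbour, assumes the 1 / (1 + Euclidean distance) similarity, and uses the hypothetical names `similarity` and `recommend`. The ranking it produces for user 5 agrees with the array above, but the scores need not match, since Crab's neighbourhood selection and capping may differ.

```python
from math import sqrt

# Ratings from the sample_movies data: user id -> {item id: rating}.
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def similarity(a, b):
    """Assumed metric: 1 / (1 + Euclidean distance) over co-rated items."""
    common = set(RATINGS[a]) & set(RATINGS[b])
    if not common:
        return 0.0
    distance = sqrt(sum((RATINGS[a][i] - RATINGS[b][i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def recommend(user, how_many=3):
    """Score each unseen item by the similarity-weighted mean of neighbour ratings."""
    totals, weights = {}, {}
    for other in RATINGS:
        if other == user:
            continue
        sim = similarity(user, other)
        if sim <= 0.0:
            continue
        for item, rating in RATINGS[other].items():
            if item in RATINGS[user]:
                continue  # the user already rated this item
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scores = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:how_many]


recommendations = recommend(5)
print(recommendations)
```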
The current Crab

Using REST APIs to deploy the recommender
          django-piston, django-rest, django-tastypie
Crab is already in production

   News recommendations for Abril Publisher!
                    Collecting over 10 magazines, 20 books and 100+ articles

  Running on Python
      + Scipy +


Easy-to-use interface

  Still in development
Content Based Filtering

   Items:  Duro de Matar, O Vento Levou, Armagedon, Toy Store
   Users:  Marcel

Crab is already in production

        PythonBrasil keynotes Recommender
               Recommending keynotes based on a hybrid approach

  Running on Python
      + Scipy +
Collaborative Filtering

   Schedule your

   Still in development
Crab is already in production

   Hybrid Meta Approach
   [Slide background: excerpt from a paper on a mobile meta recommender]
     Fig. 1. Meta Recommender Architecture
     Fig. 2. User Reviews from Foursquare Social Network
     Fig. 3. Mobile Recommender System Architecture

   Content-based filtering filters the product/service repository
   Collaborative filtering derives the product review recommendations
   Text mining distinguishes positive from negative user reviews
   The final recommendation score integrates the results of both recommenders
Crab is already in production

  Brazilian Social Network called
         Educational network with more than 60,000 students and 120 video classes

     Running on Python
    + Numpy + Scipy and

Backend for Recommendations
MongoDB - mongoengine

   Daily Recommendations
    with Explanations
Evaluating your recommender
 Crab implements the most used recommender metrics.
     Precision, Recall, F1-Score, RMSE

     Using matplotlib
     for a plotter utility

 Implement new metrics

Simulations support maybe (??)
Evaluating your recommender
>>> from crab.metrics.classes import CfEvaluator
>>> evaluator = CfEvaluator()

>>> evaluator.evaluate(recommender=recsys, metric='rmse')
{'rmse': 0.69467177857026907}
>>> evaluator.evaluate_on_split(recommender=recsys, at=2)
({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
  'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
         {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
         {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
 {'final_score': {'avg': {'f1score': 0.495955,
                          'mae': 0.429292,
                          'nmae': 0.373739,
                          'precision': 0.63932929,
                          'recall': 0.729939393,
                          'rmse': 0.3466868},
                  'stdev': {'f1score': 0.09938383,
                            'mae': 0.0593933,
                            'nmae': 0.03393939,
                            'precision': 0.0192929,
                            'recall': 0.031293939,
                            'rmse': 0.234949494}}})
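
For reference, the metrics named above can be written out in a few lines. This is a minimal sketch of the standard definitions, not the code behind CfEvaluator; the function names and the toy inputs are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy inputs: three predicted ratings, and a top-2 list checked against 3 relevant items.
error = rmse([3.5, 2.0, 4.0], [3.0, 2.0, 5.0])
scores = precision_recall_f1(recommended=[1, 5], relevant=[5, 6, 2])
print(error, scores)
```

Here one of the two recommended items is relevant, so precision is 1/2 and recall is 1/3.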
Distributing the recommendation computations

Use Hadoop and Map-Reduce intensively
  Investigating the Yelp mrjob framework

Implement the techniques from the Netflix Prize and other state-of-the-art ones
    Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines

The most commonly used is the Slope One technique.
   Simple algebra: a linear predictor y = a*x + b with the slope fixed at a = 1, i.e. y = x + b
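
A minimal sketch of weighted Slope One: for each item the user has rated, the predictor adds the average rating difference between the target item and that item, then averages these estimates weighted by how many users rated both. The function name and the toy ratings are hypothetical, and this is not Crab's implementation.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `target` for `user`, or None if undefined."""
    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) over users who rated both items.
        diffs = [r[target] - r[item] for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        deviation = sum(diffs) / len(diffs)  # the "b" in y = x + b
        numerator += (rating + deviation) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None


# Toy ratings, for illustration only.
ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
prediction = slope_one_predict(ratings, "lucy", "A")
print(prediction)
```

In this toy data, item A is rated 0.5 higher than B on average (two users) and 3 higher than C (one user), so the prediction for lucy is (2 * 2.5 + 1 * 8) / 3.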
Cache/Parallelism with joblib

     import numpy as np
     from joblib import Memory
     memory = Memory(cachedir='', verbose=0)

     class UserSimilarity(BaseSimilarity):

         def get_similarity(self, source_id, target_id):
             source_preferences = self.model.preferences_from_user(source_id)
             target_preferences = self.model.preferences_from_user(target_id)
             # NaN when either user has no preferences
             return self.distance(source_preferences, target_preferences) \
                 if not source_preferences.shape[1] == 0 \
                     and not target_preferences.shape[1] == 0 else np.array([[np.nan]])

         def get_similarities(self, source_id):
             return [(other_id, self.get_similarity(source_id, other_id))
                     for other_id, v in self.model]

>>> # Without memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 978 ms per loop

>>> # With memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 434 ms per loop
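
The effect of caching pairwise similarities can be sketched without joblib. The example below swaps in functools.lru_cache from the standard library, which caches in memory only, whereas joblib's Memory persists results to disk. The ratings, the `computed` log and the function names are hypothetical.

```python
from functools import lru_cache
from math import sqrt

# Toy ratings, for illustration only.
RATINGS = {
    "marcel": {"a": 3.0, "b": 4.0, "c": 3.5},
    "rafael": {"a": 3.0, "b": 2.0},
    "amanda": {"b": 4.5, "c": 1.0},
}

computed = []  # records every pair whose similarity is actually computed


@lru_cache(maxsize=None)
def get_similarity(source_id, target_id):
    """1 / (1 + Euclidean distance) over co-rated items, cached per id pair."""
    computed.append((source_id, target_id))
    source, target = RATINGS[source_id], RATINGS[target_id]
    common = set(source) & set(target)
    if not common:
        return float("nan")
    distance = sqrt(sum((source[i] - target[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def get_similarities(source_id):
    return [(other_id, get_similarity(source_id, other_id)) for other_id in RATINGS]


first = get_similarities("marcel")   # computes three similarities
second = get_similarities("marcel")  # served entirely from the cache
print(len(computed))
```

The cache is keyed on the ordered pair of ids, so (a, b) and (b, a) are stored separately; sorting the pair before the lookup would halve the work for a symmetric metric.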
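Cache/Parallelism with joblib

 Investigate how to use multiprocessing and parallel packages with similarities

    from joblib import Parallel, delayed

    def get_similarities(self, source_id):
        other_ids = [other_id for other_id, v in self.model]
        # Parallel takes an iterable of delayed calls and returns results in order
        similarities = Parallel(n_jobs=3)(
            delayed(self.get_similarity)(source_id, other_id)
            for other_id in other_ids)
        return list(zip(other_ids, similarities))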
Distributed Computing with mrJob

"""The classic MapReduce job: count the frequency of words."""
from mrjob.job import MRJob
import re

# A word is a run of word characters and apostrophes.
WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every word on the input line.
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        # Sum the partial counts for each word.
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()

It supports Amazon’s Elastic MapReduce (EMR) service, your own Hadoop cluster or
                                 local (for testing)
Distributed Computing with mrJob

Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce
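
The two-phase idea behind that paper can be sketched in a single process: first build an inverted index (term to postings list), then let each postings list emit one partial product per document pair and sum the partial products per pair. The sketch below uses raw term frequencies as weights, which is a simplifying assumption, and the function names and documents are hypothetical. A real job would express each phase as a mapper and reducer, for example with mrjob.

```python
from collections import defaultdict
from itertools import combinations


def index_documents(documents):
    """Phase 1: build postings lists, term -> [(doc_id, term_frequency)]."""
    postings = defaultdict(list)
    for doc_id, text in sorted(documents.items()):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            postings[term].append((doc_id, tf))
    return postings


def pairwise_similarity(postings):
    """Phase 2: sum the per-term partial products for every document pair."""
    scores = defaultdict(int)
    for entries in postings.values():
        # "map": each postings list emits one partial product per document pair
        for (doc_a, tf_a), (doc_b, tf_b) in combinations(entries, 2):
            # "reduce": partial products for the same pair are added up
            scores[(doc_a, doc_b)] += tf_a * tf_b
    return dict(scores)


# Toy documents, for illustration only.
documents = {
    "d1": "crab recommends movies",
    "d2": "crab recommends books and movies",
    "d3": "hadoop runs mapreduce jobs",
}
similarities = pairwise_similarity(index_documents(documents))
print(similarities)
```

Only pairs that share at least one term are ever generated, which is what makes the approach attractive for sparse data; d3 shares no term with the others and never appears in the output.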
Future studies with Sparse Matrices
     Real datasets come with lots of empty values

          scipy.sparse package
          Sharding operations
          Matrix Factorization techniques (SVD)

  Crab implements a Matrix Factorization with Expectation Maximization algorithm
      scikits.crab.svd package

     [Figure: Apontador Reviews Dataset]
Optimizations with Cython

Cython is a Python extension that lets developers annotate functions so they can be compiled to C.
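
The build script in this section compiles a module named spearman_correlation_cython. As a hedged illustration of the kind of numeric loop that benefits from compilation, here is a pure-Python Spearman rank correlation. The function names are hypothetical and this is not Crab's actual source; it only shows the computation such a module would plausibly contain.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def spearman_correlation(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

same_order = spearman_correlation([1, 2, 3, 4], [10, 20, 30, 40])
reverse_order = spearman_correlation([1, 2, 3, 4], [40, 30, 20, 10])
print(same_order, reverse_order)
```

Loops like the ones in average_ranks and pearson are where static type annotations in Cython tend to pay off.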


from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:

setup(
    cmdclass={'build_ext': build_ext},
    # the .pyx source file name is assumed to match the module name
    ext_modules=[Extension("spearman_correlation_cython",
                           ["spearman_correlation_cython.pyx"])],
)


Time elapsed (Recommend 5 items)

                      Old Crab                  New Crab
                      Pure Python w/ dicts      Python w/ Scipy and Numpy
    MovieLens 100k    15.32 s                   9.56 s

    [Bar chart of the same timings; axis from 0 to 16 s]
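
To make the difference between the two styles concrete, the sketch below computes one user-to-user similarity both ways, using the ratings of users 1 and 6 from the sample movies data shown in this talk. The similarity form 1 / (1 + Euclidean distance over co-rated items) and all function names are assumptions chosen for illustration, not Crab's actual code.

```python
import math
import numpy as np

# Ratings of users 1 and 6 in the sample movies data (item id -> rating).
user1 = {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0}
user6 = {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5}

def similarity_dicts(a, b):
    """Dict-based style: loop over co-rated items in the interpreter."""
    common = [item for item in a if item in b]
    if not common:
        return 0.0
    dist = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def to_row(ratings, n_items=6):
    """Turn a ratings dict into a dense row; missing ratings become NaN."""
    row = np.full(n_items, np.nan)
    for item, value in ratings.items():
        row[item - 1] = value
    return row

def similarity_numpy(x, y):
    """Array-based style: the same computation, vectorized over a mask."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    if not mask.any():
        return 0.0
    dist = np.sqrt(np.sum((x[mask] - y[mask]) ** 2))
    return float(1.0 / (1.0 + dist))

sim_dicts = similarity_dicts(user1, user6)
sim_numpy = similarity_numpy(to_row(user1), to_row(user6))
print(sim_dicts, sim_numpy)
```

Both versions return the same value; the array version moves the inner loop into compiled NumPy code, which is where a speedup on larger datasets would come from.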
Why migrate?
Old Crab ran on pure Python only
     Recommendations demand heavy math and a lot of processing

Compatible with the Numpy and Scipy libraries
   Popular, high-quality libraries optimized for scientific computing in Python

Scikits projects are amazing!
    Active communities, scientific conferences and actively maintained projects (e.g. scikit-learn)

Make the Crab framework visible to the community
 Scientific researchers and machine learning developers around the globe who code in
                                 Python are invited to help us with this project

                              Be Fast and Furious
Why migrate?

Numpy optimized with PyPy

     2x to 48x faster
How are we working?
            Sprints, Online Discussions and Issues
            Our Project’s Home Page
Future Releases
       Planned Release 0.1
   Collaborative Filtering algorithms working, sample datasets to load and test

       Planned Release 0.11
                Sparse matrices and database models support

       Planned Release 0.12
                Slope One algorithm, new factorization techniques implemented
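
For readers unfamiliar with the Slope One algorithm named in the 0.12 plan, here is a minimal sketch of the weighted variant in plain Python. The function name, the toy ratings and the dict-of-dicts layout are assumptions for illustration and say nothing about how Crab will implement it.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for item `target`.

    `ratings` maps user -> {item: rating}. Returns None when no other item
    rated by `user` has been co-rated with `target` by anyone.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) over users who rated both items.
        diffs = [r[target] - r[item]
                 for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        deviation = sum(diffs) / len(diffs)
        # Each deviation is weighted by the number of users supporting it.
        numerator += (deviation + user_rating) * len(diffs)
        denominator += len(diffs)
    if denominator == 0:
        return None
    return numerator / denominator

# Hypothetical toy ratings.
toy_ratings = {
    "john": {"a": 5.0, "b": 3.0, "c": 2.0},
    "mark": {"a": 3.0, "b": 4.0},
    "lucy": {"b": 2.0, "c": 5.0},
}

prediction = slope_one_predict(toy_ratings, "lucy", "a")
print(prediction)
```

Here item "a" is rated 0.5 higher than "b" on average (two users) and 3 higher than "c" (one user), so the prediction for lucy is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3.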

Join us!

1. Read our Wiki Page
2. Check out our current sprints and open issues
3. Forks, Pull Requests mandatory
4. Join us at #muricoca or at our discussion list
Recommended Books

Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

   ACM RecSys, KDD, SBSC...
              A Python Framework for Building
                  Recommendation Engines


Marcel Caraciolo Ricardo Caspirro                            Bruno Melo
   @marcelcaraciolo           @ricardocaspirro                 @brunomelo

                      {marcel, ricardo,bruno}

Crab: A Python Framework for Building Recommender Systems

  • 1. Crab A Python Framework for Building Recommendation Engines PythonBrasil 2011, São Paulo, SP Marcel Caraciolo Ricardo Caspirro Bruno Melo @marcelcaraciolo @ricardocaspirro @brunomelo
  • 2. What is Crab ? A python framework for building recommendation engines A Scikit module for collaborative, content and hybrid filtering Mahout Alternative for Python Developers :D Open-Source under the BSD license
  • 3. When started ? It began one year ago Community-driven, 4 members Since April,2011 the open-source labs Muriçoca incorporated it Since April,2011 rewritting it as Scikit
  • 4. Knowing Scikits Scikits are Scipy Toolkits - independent and projects hosted under a common namespace. Scikits Image Scikits MlabWrap Scikits AudioLab Scikit Learn ....
  • 5. Knowing Scikits Scikit-Learn Machine Learning Algorithms + scientific Python packages (Numpy, Scipy and Matplotlib) Our goal: Incorporate the Crab as Scikit and incorporate some parts of them at Scikit-learn
  • 6. Why Recommendations ? The world is an over-crowded place !"#$%&'()$*+$,-$&.#'/0'&%)#)$1(,0#
  • 7. Why Recommendations * +,&-.$/).#&0#/"1.#$%234(".# ? $/)#5(&6 7&.2.#"$4,#)$8 We are overloaded * 93((3&/.#&0#:&'3".;#5&&<.# $/)#:-.34#2%$4<.#&/(3/" Thousands of news articles and blog posts each day * =/#>$/&3;#?#@A#+B#4,$//"(.;# 2,&-.$/).#&0#7%&6%$:.# Millions of movies, books and music tracks online "$4,#)$8 Several Places, Offers and Events * =/#C"1#D&%<;#."'"%$(# Even Friends sometimes we are overloaded ! 2,&-.$/).#&0#$)#:"..$6".# ."/2#2&#-.#7"%#)$8
  • 8. Why Recommendations ? We really need and consume only a few of them! “A lot of times, people don’t know what they want until you show it to them.” Steve Jobs “We are leaving the Information age, and entering into the Recommendation age.” Chris Anderson, from book Long Tail
  • 9. Why Recommendations ? Can Google help ? Yes, but only when we really know what we are looking for But, what’s does it mean by “interesting” ? Can Facebook help ? Yes, I tend to find my friends’ stuffs interesting What if i had only few friends and what they like do not always attract me ? Can experts help ? Yes, but it won’t scale well. But it is what they like, not me! Exactly same advice!
  • 10. Why Recommendations ? Recommendation Systems Systems designed to recommend to me something I may like
  • 11. Why Recommendations ? !"#$%&"'$"'(')*#*+,) Recommendation Systems -+*#)+. -#/') 0#)1# ! 2' 23&4"+')1 5,6 7),*%'"&863 Graph Representation
  • 12. The current Crab Collaborative Filtering algorithms User-Based, Item-Based and Factorization Matrix (SVD) Evaluation of the Recommender Algorithms Precision, Recall, F1-Score, RMSE Precision-Recall Charts
  • 13. The current Crab Precision-Recall Charts
  • 14. Collaborative Filtering O Vento Toy Thor Armagedon Items Levou Store like recommends Marcel Rafael Amanda Users Similar
  • 17. The current Crab >>>#load the dataset >>> from crab.datasets import load_sample_movies
  • 18. The current Crab >>>#load the dataset >>> from crab.datasets import load_sample_movies >>> data = load_sample_movies()
  • 19. The current Crab >>>#load the dataset >>> from crab.datasets import load_sample_movies >>> data = load_sample_movies() >>> data
  • 20. The current Crab >>>#load the dataset >>> from crab.datasets import load_sample_movies >>> data = load_sample_movies() >>> data {'DESCR': 'sample_movies data set was collected by the book called nProgramming the Collective Intelligence by Toby Segaran nnNotesn----- nThis data set consists ofnt* n ratings with (1-5) from n users to n movies.',  'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},   2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},   3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},   4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},   5: {2: 4.5, 3: 1.0, 4: 4.0},   6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},   7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},  'item_ids': {1: 'Lady in the Water',   2: 'Snakes on a Planet',   3: 'You, Me and Dupree',   4: 'Superman Returns',   5: 'The Night Listener',   6: 'Just My Luck'},  'user_ids': {1: 'Jack Matthews',   2: 'Mick LaSalle',   3: 'Claudia Puig',   4: 'Lisa Rose',   5: 'Toby',   6: 'Gene Seymour',   7: 'Michael Phillips'}}
  • 22. The current Crab >>> from crab.models import MatrixPreferenceDataModel
  • 23. The current Crab >>> from crab.models import MatrixPreferenceDataModel >>> m = MatrixPreferenceDataModel(
  • 24. The current Crab >>> from crab.models import MatrixPreferenceDataModel >>> m = MatrixPreferenceDataModel( >>> print m MatrixPreferenceDataModel (7 by 6)          1 2 3 4 5 ... 1 3.000000 4.000000 3.500000 5.000000 3.000000 2 3.000000 4.000000 2.000000 3.000000 3.000000 3 --- 3.500000 2.500000 4.000000 4.500000 4 2.500000 3.500000 2.500000 3.500000 3.000000 5 --- 4.500000 1.000000 4.000000 --- 6 3.000000 3.500000 3.500000 5.000000 3.000000 7 2.500000 3.000000 --- 3.500000 4.000000
  • 26. The current Crab >>> #import pairwise distance
  • 27. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances
  • 28. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity
  • 29. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity >>> from crab.similarities import UserSimilarity
  • 30. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity >>> from crab.similarities import UserSimilarity >>> similarity = UserSimilarity(m, euclidean_distances)
  • 31. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity >>> from crab.similarities import UserSimilarity >>> similarity = UserSimilarity(m, euclidean_distances) >>> similarity[1]
  • 32. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity >>> from crab.similarities import UserSimilarity >>> similarity = UserSimilarity(m, euclidean_distances) >>> similarity[1] [(1, 1.0), (6, 0.66666666666666663), (4, 0.34054242658316669), (3, 0.32037724101704074), (7, 0.32037724101704074), (2, 0.2857142857142857), (5, 0.2674788903885893)]
  • 33. The current Crab >>> #import pairwise distance >>> from crab.metrics.pairwise import euclidean_distances >>> #import similarity >>> from crab.similarities import UserSimilarity >>> similarity = UserSimilarity(m, euclidean_distances) >>> similarity[1] [(1, 1.0), (6, 0.66666666666666663), MatrixPreferenceDataModel (7 by 6)          1 2 3 4 5 (4, 0.34054242658316669), 1 3.000000 4.000000 3.500000 5.000000 3.000000 (3, 0.32037724101704074), 2 3.000000 4.000000 2.000000 3.000000 3.000000 3 --- 3.500000 2.500000 4.000000 4.500000 (7, 0.32037724101704074), 4 2.500000 3.500000 2.500000 3.500000 3.000000 5 --- 4.500000 1.000000 4.000000 --- (2, 0.2857142857142857), 6 3.000000 3.500000 3.500000 5.000000 3.000000 (5, 0.2674788903885893)] 7 2.500000 3.000000 --- 3.500000 4.000000
  • 35. The current Crab >>> from crab.recommenders.knn import UserBasedRecommender
  • 36. The current Crab >>> from crab.recommenders.knn import UserBasedRecommender >>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True,with_preference=True)
  • 37. The current Crab >>> from crab.recommenders.knn import UserBasedRecommender >>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True,with_preference=True) >>> recsys.recommend(5) array([[ 5. , 3.45712869],        [ 1. , 2.78857832],        [ 6. , 2.38193068]])
  • 38. The current Crab >>> from crab.recommenders.knn import UserBasedRecommender >>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True,with_preference=True) >>> recsys.recommend(5) array([[ 5. , 3.45712869],        [ 1. , 2.78857832],        [ 6. , 2.38193068]]) >>> recsys.recommended_because(user_id=5,item_id=1) array([[ 2. , 3. ],        [ 1. , 3. ],        [ 6. , 3. ],        [ 7. , 2.5],        [ 4. , 2.5]])
  • 39. The current Crab >>> from crab.recommenders.knn import UserBasedRecommender >>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True,with_preference=True) >>> recsys.recommend(5) array([[ 5. , 3.45712869],        [ 1. , 2.78857832],        [ 6. , 2.38193068]]) >>> recsys.recommended_because(user_id=5,item_id=1) array([[ 2. , 3. ],        [ 1. , 3. ], MatrixPreferenceDataModel (7 by 6)          1 2 3 4 5 ...        [ 6. , 3. ], 1 3.000000 4.000000 3.500000 5.000000 3.000000 2 3.000000 4.000000 2.000000 3.000000 3.000000        [ 7. , 2.5], 3 --- 3.500000 2.500000 4.000000 4.500000        [ 4. , 2.5]]) 4 2.500000 3.500000 2.500000 3.500000 3.000000 5 --- 4.500000 1.000000 4.000000 --- 6 3.000000 3.500000 3.500000 5.000000 3.000000 7 2.500000 3.000000 --- 3.500000 4.000000
  • 40. The current Crab Using REST APIs to deploy the recommender django-piston, django-rest, django-tastypie
  • 41. Crab is already in production News from Abril Publisher recommendations! Collecting over 10 magazines, 20 books and 100+ articles Running on Python + Scipy + Django Content-Based-Filtering Easy-to-use interface Still in development
  • 42. Content Based Filtering Similar Duro de O Vento Toy Armagedon Items Matar Levou Store recommend likes Marcel Users
  • 43. Crab is already in production PythonBrasil keynotes Recommender Recommending keynotes based on a hybrid approach Running on Python + Scipy + Django Content-Based-Filtering + Collaborative Filtering Schedule your keynotes Still in development
  • 44. Crab is already in production [Slide image: excerpt from a paper describing a hybrid meta recommender that uses content-based filtering on the product/service repository and collaborative filtering on user reviews. Fig. 1: Meta Recommender Architecture. Fig. 2: User Reviews from Foursquare Social Network. Fig. 3: Mobile Recommender System Architecture.]
  • 45. Crab is already in production Brazilian educational social network with more than 60,000 students and 120 video classes Running on Python + Numpy + Scipy and Django Backend for Recommendations MongoDB - mongoengine Daily Recommendations with Explanations
  • 46. Evaluating your recommender Crab implements the most used recommender metrics. Precision, Recall, F1-Score, RMSE Using matplotlib for a plotter utility Implement new metrics Simulations support maybe (??)
  • 53. Evaluating your recommender
    >>> from crab.metrics.classes import CfEvaluator
    >>> evaluator = CfEvaluator()
    >>> evaluator.evaluate(recommender=recsys, metric='rmse')
    {'rmse': 0.69467177857026907}
    >>> evaluator.evaluate_on_split(recommender=recsys, at=2)
    ({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
                {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
                {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
      'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
             {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
             {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
     {'final_score': {'avg': {'f1score': 0.495955, 'mae': 0.429292, 'nmae': 0.373739,
                              'precision': 0.63932929, 'recall': 0.729939393, 'rmse': 0.3466868},
                      'stdev': {'f1score': 0.09938383, 'mae': 0.0593933, 'nmae': 0.03393939,
                                'precision': 0.0192929, 'recall': 0.031293939, 'rmse': 0.234949494}}})
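The metrics named on these slides (RMSE, MAE, precision, recall, F1-score) can be sketched in plain Python. This is a minimal illustration of the standard definitions, not Crab's `CfEvaluator` code; the function names `rmse`, `mae` and `precision_recall_f1` are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Information-retrieval metrics for one recommendation list.

    recommended: items the recommender returned.
    relevant: items the user actually liked (the held-out ground truth).
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical toy data: 4 recommended items, 3 relevant ones, 2 in common.
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
```

The error metrics compare predicted ratings against held-out ratings, while the information-retrieval metrics compare a top-N list against the items a user actually liked, which is why the slide output reports them in separate `error` and `ir` groups.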
  • 54. Distributing the recommendation computations
    Use Hadoop and Map-Reduce intensively
      Investigating the Yelp mrjob framework
    Develop the Netflix and other novel state-of-the-art techniques
      Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
    The most commonly used is the Slope One technique
      Simple algebra: predictions of the form y = x + b, i.e. the linear form y = a*x + b with the slope a fixed at 1
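A minimal sketch of weighted Slope One in plain Python follows, to make the "y = x + b" idea concrete. It is an illustration under stated assumptions, not Crab's implementation: the function names and the `{user: {item: rating}}` layout are hypothetical, and the toy ratings are made up.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """Average rating difference between every pair of co-rated items.

    ratings: {user: {item: rating}}
    Returns (devs, freqs): devs[j][i] is the mean of (r_j - r_i) over the
    users who rated both items, freqs[j][i] is how many users that was.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    freqs = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for j, rating_j in prefs.items():
            for i, rating_i in prefs.items():
                if i != j:
                    diffs[j][i] += rating_j - rating_i
                    freqs[j][i] += 1
    devs = {j: {i: diffs[j][i] / freqs[j][i] for i in diffs[j]} for j in diffs}
    return devs, freqs


def slope_one_predict(ratings, devs, freqs, user, item):
    """Predict a rating as a weighted average of (known rating + deviation)."""
    numerator, denominator = 0.0, 0
    for i, rating_i in ratings[user].items():
        count = freqs.get(item, {}).get(i, 0)
        if i != item and count:
            numerator += (rating_i + devs[item][i]) * count
            denominator += count
    return numerator / denominator if denominator else None


# Hypothetical toy ratings.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
devs, freqs = slope_one_deviations(ratings)
prediction = slope_one_predict(ratings, devs, freqs, "lucy", "A")
```

Here item A is rated on average 0.5 higher than B (two users) and 3 higher than C (one user), so the prediction for lucy on A is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3.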
  • 61. Cache/Parallelism with joblib
    from joblib import Memory
    import numpy as np

    # The cache directory is empty on the slide; recent joblib releases
    # appear to name this argument `location` instead of `cachedir`.
    memory = Memory(cachedir='', verbose=0)

    class UserSimilarity(BaseSimilarity):
        ...
        @memory.cache
        def get_similarity(self, source_id, target_id):
            source_preferences = self.model.preferences_from_user(source_id)
            target_preferences = self.model.preferences_from_user(target_id)
            ...
            # Compare only when both users have at least one preference.
            return (self.distance(source_preferences, target_preferences)
                    if not source_preferences.shape[1] == 0
                    and not target_preferences.shape[1] == 0
                    else np.array([[np.nan]]))

        def get_similarities(self, source_id):
            return [(other_id, self.get_similarity(source_id, other_id))
                    for other_id, v in self.model]

    Without memory.cache:
    >>> timeit similarity.get_similarities('marcel_caraciolo')
    100 loops, best of 3: 978 ms per loop

    With memory.cache:
    >>> timeit similarity.get_similarities('marcel_caraciolo')
    100 loops, best of 3: 434 ms per loop
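The effect of caching a similarity function can be shown without Crab or joblib. The sketch below swaps in the standard library's `functools.lru_cache` (an in-memory cache) for `joblib.Memory` (a disk cache), so it only illustrates the memoization idea; the data, the `PREFERENCES` dict and the call counter are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical preferences: {user: {item: rating}}.
PREFERENCES = {
    "marcel": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "ricardo": {"m1": 4.0, "m2": 3.5, "m3": 2.0},
    "bruno": {"m1": 1.0, "m2": 5.0, "m3": 4.5},
}
CALLS = {"count": 0}  # counts how often the similarity is really computed


@lru_cache(maxsize=None)
def get_similarity(source_id, target_id):
    """Euclidean-distance-based similarity over the co-rated items."""
    CALLS["count"] += 1
    source, target = PREFERENCES[source_id], PREFERENCES[target_id]
    shared = set(source) & set(target)
    if not shared:
        return float("nan")
    distance = math.sqrt(sum((source[i] - target[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def get_similarities(source_id):
    return [(other_id, get_similarity(source_id, other_id))
            for other_id in PREFERENCES]


first = get_similarities("marcel")   # computes every pair
second = get_similarities("marcel")  # served entirely from the cache
```

After both calls the counter equals the number of users, because the second pass never re-enters the function body. A cache like this has to be invalidated whenever the underlying preferences change.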
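  • 62. Cache/Parallelism with joblib
    Investigate how to use multiprocessing and parallel packages with similarities computation
    from joblib import Parallel, delayed
    ...
    def get_similarities(self, source_id):
        other_ids = [other_id for other_id, v in self.model]
        # Parallel takes an iterable of delayed calls and returns the
        # results in input order, so they can be zipped back with the ids.
        similarities = Parallel(n_jobs=3)(
            delayed(self.get_similarity)(source_id, other_id)
            for other_id in other_ids)
        return list(zip(other_ids, similarities))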
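The same fan-out pattern can be tried with only the standard library. This sketch uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for `joblib.Parallel`; the `MODEL` dict and the cosine similarity are hypothetical. Threads will not speed up pure-Python arithmetic because of the GIL, so this shows the structure of the computation rather than a performance gain.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: {user: {item: rating}}.
MODEL = {
    "marcel": {"m1": 5.0, "m2": 3.0},
    "ricardo": {"m1": 4.0, "m2": 3.0},
    "bruno": {"m1": 1.0, "m2": 5.0},
}


def cosine_similarity(source, target):
    """Cosine of the angle between two rating vectors over shared items."""
    shared = set(source) & set(target)
    dot = sum(source[i] * target[i] for i in shared)
    norm_source = math.sqrt(sum(v * v for v in source.values()))
    norm_target = math.sqrt(sum(v * v for v in target.values()))
    return dot / (norm_source * norm_target)


def get_similarities_sequential(source_id):
    return [(other_id, cosine_similarity(MODEL[source_id], MODEL[other_id]))
            for other_id in MODEL]


def get_similarities_parallel(source_id, n_workers=3):
    other_ids = list(MODEL)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of its inputs.
        scores = pool.map(
            lambda other_id: cosine_similarity(MODEL[source_id], MODEL[other_id]),
            other_ids)
        return list(zip(other_ids, scores))
```

Each pair is independent of the others, which is what makes the similarity step easy to split across workers.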
  • 63. Distributed Computing with mrJob
  • 64. Distributed Computing with mrJob It supports Amazon’s Elastic MapReduce(EMR) service, your own Hadoop cluster or local (for testing)
  • 66. Distributed Computing with mrJob
    """The classic MapReduce job: count the frequency of words."""
    from mrjob.job import MRJob
    import re

    # Matches runs of word characters and apostrophes.
    WORD_RE = re.compile(r"[\w']+")

    class MRWordFreqCount(MRJob):

        def mapper(self, _, line):
            # Emit (word, 1) for every word in the input line.
            for word in WORD_RE.findall(line):
                yield (word.lower(), 1)

        def reducer(self, word, counts):
            # Sum the partial counts collected for each word.
            yield (word, sum(counts))

    if __name__ == '__main__':
        MRWordFreqCount.run()
  • 67. Distributed Computing with mrJob Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce
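The pairwise-similarity idea can be simulated locally in two map/reduce passes: build an inverted index, then emit partial products for every pair of documents that share a term. The sketch below is a plain-Python approximation under stated assumptions, with a hypothetical in-memory `run_job` driver standing in for Hadoop or mrjob and made-up term weights.

```python
from collections import defaultdict
from itertools import combinations


def run_job(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce run: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


# Pass 1: inverted index, term -> [(doc_id, weight), ...]
def index_mapper(doc_id, term_weights):
    for term, weight in term_weights.items():
        yield term, (doc_id, weight)


def index_reducer(term, postings):
    yield term, sorted(postings)


# Pass 2: each term contributes weight1 * weight2 to every pair in its postings.
def pairs_mapper(term, postings):
    for (doc_a, weight_a), (doc_b, weight_b) in combinations(postings, 2):
        yield (doc_a, doc_b), weight_a * weight_b


def pairs_reducer(pair, partial_products):
    yield pair, sum(partial_products)


# Hypothetical documents with term weights.
docs = {
    "d1": {"a": 1, "b": 2},
    "d2": {"a": 3, "c": 1},
    "d3": {"b": 1, "c": 2},
}
index = run_job(docs.items(), index_mapper, index_reducer)
similarities = dict(run_job(index, pairs_mapper, pairs_reducer))
```

Only pairs that share at least one term are ever generated, which is the point of going through the inverted index. For a recommender, documents and terms would map to users and items (or the reverse).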
  • 71. Future studies with Sparse Matrices Real datasets come with lots of empty values Solutions: scipy.sparse package Sharding operations Matrix Factorization techniques (SVD) Crab implements a Matrix Factorization with Expectation Maximization algorithm scikits.crab.svd package Apontador Reviews Dataset
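To make the sparsity point concrete, here is a minimal dictionary-of-keys sketch in plain Python: only the ratings that exist are stored, keyed by (user, item). It is the same idea that `scipy.sparse` offers in an optimized form; the `SparseRatings` class is hypothetical and it assumes that a rating of 0 means "missing".

```python
class SparseRatings:
    """User x item rating matrix that stores only the non-empty cells."""

    def __init__(self, n_users, n_items):
        self.shape = (n_users, n_items)
        self._data = {}  # (user_index, item_index) -> rating

    def __setitem__(self, key, rating):
        if rating == 0:
            self._data.pop(key, None)  # assumption: 0 means "no rating"
        else:
            self._data[key] = rating

    def __getitem__(self, key):
        return self._data.get(key, 0.0)

    @property
    def nnz(self):
        """Number of stored (non-empty) ratings."""
        return len(self._data)

    def density(self):
        """Fraction of the matrix that actually holds a rating."""
        return self.nnz / (self.shape[0] * self.shape[1])

    def row(self, user_index):
        """Ratings of one user as {item_index: rating}."""
        return {item: rating for (user, item), rating in self._data.items()
                if user == user_index}


# Hypothetical example: 1000 users x 500 items with only three ratings.
matrix = SparseRatings(1000, 500)
matrix[0, 1] = 4.0
matrix[0, 7] = 2.5
matrix[42, 1] = 5.0
```

A dense array for this example would hold 500,000 cells, while the sparse form holds three entries.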
  • 73. Optimizations with Cython
    Cython is a Python extension that lets developers annotate functions so they can be compiled to C.
    # Build script for the Cython extension module.
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext

    # for notes on compiler flags see: (link missing in the transcript)
    setup(
        cmdclass={'build_ext': build_ext},
        ext_modules=[Extension("spearman_correlation_cython",
                               ["spearman_correlation_cython.pyx"])]
    )
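The contents of `spearman_correlation_cython.pyx` are not shown on the slide. As an assumption about what such a module computes, the sketch below gives a pure-Python Spearman rank correlation (Pearson correlation of average ranks); it is the kind of numeric loop that benefits from Cython type annotations. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return covariance / (spread_x * spread_y)


def spearman_correlation(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In a Cython version, the loops over `values`, `x` and `y` would typically be given C types (for example typed indices and double accumulators) so the compiler can avoid Python object overhead.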
  • 76. Benchmarks
    Time elapsed to recommend 5 items, MovieLens 100k dataset:
      Old Crab (pure Python with dicts): 15.32 s
      New Crab (Python with Scipy and Numpy): 9.56 s
    [Bar chart: time elapsed in seconds, axis from 0 to 16, Old Crab vs. New Crab]
  • 79. Why migrate ?
    Old Crab ran on pure Python only
      Recommendations demand heavy math calculations and lots of processing
    Compatible with Numpy and Scipy libraries
      High-standard, popular libraries optimized for scientific calculations in Python
    Scikits projects are amazing!
      Active communities, scientific conferences and updated projects (e.g. scikit-learn)
    Make the Crab framework visible to the community
      Join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project
    Be Fast and Furious
  • 80. Why migrate ? Numpy optimized with PyPy: 2x to 48x faster
  • 81. How are we working ? Sprints, Online Discussions and Issues
  • 82. How are we working ? Our Project’s Home Page
  • 83. Future Releases
    Planned Release 0.1: Collaborative Filtering algorithms working, sample datasets to load and test
    Planned Release 0.11: Sparse Matrices and Database Models support
    Planned Release 0.12: Slope One Algorithm, new factorization techniques implemented
    ....
  • 84. Join us! 1. Read our Wiki Page 2. Check out our current sprints and open issues 3. Forks, Pull Requests mandatory 4. Join us at #muricoca or at our discussion list
  • 85. Recommended Books
    Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
    Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009
    ACM RecSys, KDD, SBSC...
  • 86. Crab A Python Framework for Building Recommendation Engines Marcel Caraciolo Ricardo Caspirro Bruno Melo @marcelcaraciolo @ricardocaspirro @brunomelo {marcel, ricardo,bruno}