Mahout becomes a researcher

Kris Jack, PhD
Senior Data Mining Engineer
Overview

➔
    What's Mendeley?

➔
    Applications of Mahout's Recommender

➔
    Under Mahout's Bonnet

➔
    Mahout's Research Career so Far

➔
    Conclusions
What's Mendeley?
➔
    Mendeley is a data platform for researchers
    ➔
        We're bringing together researchers and the research
        that they produce from all over the world

    ➔
        We're structuring this data in a machine readable format

    ➔
        We're opening this data up for you to build applications
        on top of it using our API

    ➔
        These applications help researchers to do even better
        research and become more productive

➔
    How are we building our community?
Mendeley provides tools to help users...

...organise their research
➔
    Reference management
➔
    Cite-as-you-write
➔
    Full-text article search
➔
    Digitalised annotations
...collaborate with one another
➔
    Research network
➔
    Professional research groups
...discover new research
➔
    Mendeley Suggest
➔
    Personalised article recommendations
➔
    Weekly batch of 10 recommended articles
➔
    Collaborative Filtering
➔
    The more data, the better
1.5 million+ users; the 20 largest user bases:
    University of Cambridge
    Stanford University
    MIT
    University of Michigan
    Harvard University
    University of Oxford
    Sao Paulo University
    Imperial College London
    University of Edinburgh
    Cornell University
    University of California at Berkeley
    RWTH Aachen
    Columbia University
    Georgia Tech
    University of Wisconsin
    UC San Diego
    University of California at LA
    University of Florida
    University of North Carolina

50m research articles
We need a recommender that scales up, coping with our data and future growth
Applications of Mahout's Recommender
Mahout use cases:
➔
    Retrieve related items in large collections

http://www.slideshare.net/kryton/the-data-layer
➔
    Discover relevant items that you may have overlooked

http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
➔
    Find love!
    ➔
        Mahout implements collaborative filtering, a surprisingly powerful algorithm

http://www.speeddate.com/apps/site/views/mp/technology.php
➔
    Mendeley Suggest
    ➔
        Discover new research
    ➔
        Fill in gaps in your library
    ➔
        Your personal advisor

http://krisjack.blogspot.co.uk/2012/02/your-very-own-personalised-research.html
Under Mahout's Bonnet
Generating recommendations through matrix multiplication

This is item-based recommendation, as similarity is computed between items, not users




Not convinced? Try reading these...
 Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender
 systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions
 on Knowledge and Data Engineering, 17(6), 734-749. Piscataway, NJ, USA.

 http://www.slideshare.net/srowen/collaborative-filtering-at-scale-2
 http://krisjack.blogspot.co.uk/2012/04/under-bonnet-of-mahouts-item-based.html
[Figure: Input (all user preferences). A matrix with research articles as rows (Comp Sci 1, Comp Sci 2, Physics 1, Physics 2) and researchers as columns (Turing, Babbage, Einstein, Newton).]
[Figure: Input (all user preferences) at full scale: 1.5M researchers, 50M research articles, 300M prefs.]
item.RecommenderJob
 1. Prepare preference matrix (steps 1-3)
 2. Generate similarity matrix (steps 4-6)
 3. Multiply matrices (steps 7-10)

[Figure: All User Preferences (item x user), with research articles as rows and researchers as columns.]
[Figure: one researcher's column (Turing) is taken from All User Preferences to give A User's Preferences (item x user).]
[Figure: Item Similarity (item x item), a 4 x 4 matrix over the research articles, shown next to A User's Preferences (item x user) for Turing.]
Item Similarity (item x item), generated from the input (all user preferences):

                 Comp Sci 1   Comp Sci 2   Physics 1   Physics 2
    Comp Sci 1       2            1            0           0
    Comp Sci 2       1            1            0           0
    Physics 1        0            0            2           2
    Physics 2        0            0            2           2
Item Similarity (item x item)  X  A User's Preferences (item x user)  =  Recommendations (item x user)

[Figure: the item similarity matrix is multiplied by Turing's preference column to give Turing's recommendations.]
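The multiplication on this slide can be sketched in plain Java, without Hadoop or Mahout. This is only an illustration under stated assumptions: the slides do not give the individual preferences, so the `PREFS` matrix below is hypothetical, chosen so that simple co-occurrence counting reproduces the similarity values shown above (2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2). Co-occurrence is also an assumption; Mahout's RecommenderJob may be configured with other similarity measures.

```java
import java.util.Arrays;

class ItemBasedSketch {
    // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2.
    // Columns: Turing, Babbage, Einstein, Newton.
    // ASSUMPTION: hypothetical preferences, not taken from the slides.
    static final int[][] PREFS = {
        {1, 1, 0, 0},
        {1, 0, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 1, 1},
    };

    /** Item similarity as co-occurrence counts: S = A * A^T (item x item). */
    static int[][] itemSimilarity(int[][] prefs) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[][] sim = new int[items][items];
        for (int i = 0; i < items; i++)
            for (int j = 0; j < items; j++)
                for (int u = 0; u < users; u++)
                    sim[i][j] += prefs[i][u] * prefs[j][u];
        return sim;
    }

    /** Scores for one user: similarity matrix times that user's preference column. */
    static int[] scores(int[][] sim, int[][] prefs, int user) {
        int items = prefs.length;
        int[] out = new int[items];
        for (int i = 0; i < items; i++)
            for (int j = 0; j < items; j++)
                out[i] += sim[i][j] * prefs[j][user];
        return out;
    }

    /** Highest-scoring item the user does not already have, or -1 if none scores above zero. */
    static int recommend(int[] scores, int[][] prefs, int user) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            boolean unseen = prefs[i][user] == 0;
            if (unseen && scores[i] > 0 && (best < 0 || scores[i] > scores[best])) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] sim = itemSimilarity(PREFS);
        System.out.println(Arrays.deepToString(sim));
        int babbage = 1;
        int[] s = scores(sim, PREFS, babbage);
        System.out.println(Arrays.toString(s) + " -> item " + recommend(s, PREFS, babbage));
    }
}
```

With these hypothetical preferences, Babbage (who only has Comp Sci 1) scores 2, 1, 0, 0 across the four articles, so the one unseen article with a positive score, Comp Sci 2, is recommended.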
Running on Amazon's Elastic MapReduce

On-demand use and easy to cost
Mahout's Research Career so Far
Mendeley Suggest
Mahout's Performance

[Figure: empty scatter plot. y-axis: Normalised Amazon Hours (0 to 7K); x-axis: No. Good Recommendations/10 (0 to 3). Corners: Costly & Bad (top left), Costly & Good (top right), Cheap & Bad (bottom left), Cheap & Good (bottom right).]
Mahout's Performance: item-based
➔
    Orig. item-based: 6.5K normalised Amazon hours, 1.5 good recommendations/10
➔
    Cust. item-based: 2.4K normalised Amazon hours, 1.5 good recommendations/10
➔
    Orig. to cust. item-based: -4.1K hours (63%)
Reducing processing time and cost

➔
    Mahout's recommender is already efficient
    ➔
        but your data may have unusual properties
➔
    We got improvements by:
    ➔
        tuning Hadoop's mapper and reducer allocation over the 10
        steps in the RecommenderJob
    ➔
        using an appropriate partitioner
Task Allocation: 37 hours to complete

1 reducer allocated, despite having 48 available...
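Task Allocation

Allocating more reducers on a per-job basis

    // Request a fixed number of reduce tasks for this job
    job.getConfiguration().setInt(
        "mapred.reduce.tasks",
        numReducers);

Allocating more mappers on a per-job basis

    // A smaller maximum split size (in bytes) gives more input splits, and so more mappers
    job.getConfiguration().set(
        "mapred.max.split.size",
        String.valueOf(splitSize));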
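Why a smaller maximum split size yields more mappers can be sketched as follows. Hadoop's FileInputFormat (new API) derives the split size as max(minSize, min(maxSize, blockSize)), and each split is processed by one map task. The file size and block size below are hypothetical, and the mapper count is an approximation that ignores the slack Hadoop allows on the final split of a file.

```java
class SplitSizeSketch {
    /** Split size rule: max(minSize, min(maxSize, blockSize)). */
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Approximate number of map tasks for one file: ceil(fileSize / splitSize). */
    static long approxMappers(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // hypothetical 1 GB input file
        long blockSize = 64 * mb;   // hypothetical HDFS block size

        // No cap on the split size: one split per block.
        long uncapped = splitSize(1, Long.MAX_VALUE, blockSize);
        // Cap the split size at 16 MB: four splits per block.
        long capped = splitSize(1, 16 * mb, blockSize);

        System.out.println("uncapped mappers: " + approxMappers(fileSize, uncapped));
        System.out.println("capped mappers:   " + approxMappers(fileSize, capped));
    }
}
```

Under these assumptions the uncapped job gets about 16 mappers and the capped job about 64.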
Task Allocation: 37 hours → 14 hours to complete

From 1 → 40 reducers
Partitioners: 14 hours to complete

[Figure: unevenly sized partitions, ranging from ~50KB to ~500MB.]
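    // Sample the input keys and write a partition file of split points
    InputSampler.Sampler<IntWritable, Text> sampler =
        new InputSampler.RandomSampler<IntWritable, Text>(...);
    InputSampler.writePartitionFile(conf, sampler);
    // Route keys to reducers by key range, using those split points
    conf.setPartitionerClass(TotalOrderPartitioner.class);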




http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
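The idea behind sampling keys and partitioning by range can be sketched in plain Java, without the Hadoop classes. This is an illustration, not Mendeley's code: the skewed key set is hypothetical, and the sample is simply the full key set, whereas a real sampler reads only a subset. The hash rule mirrors Hadoop's default HashPartitioner, which computes (hashCode & Integer.MAX_VALUE) % numReduceTasks.

```java
import java.util.Arrays;

class RangePartitionSketch {
    /** Hash-style partitioning of an integer key. */
    static int hashPartition(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    /** Chooses numPartitions - 1 split points at evenly spaced quantiles of the sampled keys. */
    static int[] splitPoints(int[] sampledKeys, int numPartitions) {
        int[] sorted = sampledKeys.clone();
        Arrays.sort(sorted);
        int[] points = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            points[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return points;
    }

    /** Range partitioning: the partition index is the number of split points <= key. */
    static int rangePartition(int key, int[] points) {
        int pos = Arrays.binarySearch(points, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    /** Hypothetical skewed key set: every key is a multiple of 4. */
    static int[] skewedKeys(int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = 4 * i;
        return keys;
    }

    public static void main(String[] args) {
        int n = 4;
        int[] keys = skewedKeys(100);
        int[] points = splitPoints(keys, n);
        int[] byHash = new int[n];
        int[] byRange = new int[n];
        for (int k : keys) {
            byHash[hashPartition(k, n)]++;
            byRange[rangePartition(k, points)]++;
        }
        System.out.println("hash:  " + Arrays.toString(byHash));
        System.out.println("range: " + Arrays.toString(byRange));
    }
}
```

With these keys and four partitions, hashing sends all 100 keys to partition 0, while the sampled split points (100, 200, 300) give each partition 25 keys.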
Partitioners: 14 hours → 2 hours to complete

Evenly distributed
[Figure: the same matrix multiplication, with similarity computed between researchers instead of between research articles.]

User Similarity (user x user)
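The slides do not describe how Mendeley's user-based implementation works, so the following is only a generic neighbourhood-style sketch of user-based collaborative filtering. It finds the single most similar other researcher and suggests that researcher's articles. The `PREFS` matrix is hypothetical, and both co-occurrence similarity and a neighbourhood of size one are assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class UserBasedSketch {
    // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2.
    // Columns: Turing, Babbage, Einstein, Newton.
    // ASSUMPTION: hypothetical preferences, not taken from the slides.
    static final int[][] PREFS = {
        {1, 1, 0, 0},
        {1, 0, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 1, 1},
    };

    /** Similarity between two users: the number of items both have. */
    static int userSimilarity(int[][] prefs, int a, int b) {
        int shared = 0;
        for (int[] itemRow : prefs) shared += itemRow[a] * itemRow[b];
        return shared;
    }

    /** Most similar other user, or -1 if no other user shares an item. */
    static int nearestNeighbour(int[][] prefs, int user) {
        int best = -1;
        int bestSim = 0;
        for (int other = 0; other < prefs[0].length; other++) {
            if (other == user) continue;
            int sim = userSimilarity(prefs, user, other);
            if (sim > bestSim) {
                best = other;
                bestSim = sim;
            }
        }
        return best;
    }

    /** Items the nearest neighbour has that the user lacks. */
    static List<Integer> recommend(int[][] prefs, int user) {
        List<Integer> out = new ArrayList<>();
        int neighbour = nearestNeighbour(prefs, user);
        if (neighbour < 0) return out;
        for (int i = 0; i < prefs.length; i++) {
            if (prefs[i][neighbour] == 1 && prefs[i][user] == 0) out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Babbage:  " + recommend(PREFS, 1));
        System.out.println("Einstein: " + recommend(PREFS, 2));
    }
}
```

Here Babbage's nearest neighbour is Turing, so Comp Sci 2 (index 1) is suggested; Einstein's nearest neighbour is Newton, who has nothing Einstein lacks, so the list is empty.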
Mahout's Performance: item-based vs. user-based

[Figure: scatter plot of Normalised Amazon Hours (y-axis, 0 to 7K) against No. Good Recommendations/10 (x-axis, 0 to 3), with four points.]
➔
    Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔
    Cust. item-based: 2.4K hours, 1.5 good recommendations/10
    ➔
        -4.1K hours (63%) against orig. item-based
➔
    Orig. user-based: 1K hours, 2.5 good recommendations/10
    ➔
        -1.4K hours (58%) and +1 good recommendation (67%) against cust. item-based
➔
    Cust. user-based: 0.3K hours, 2.5 good recommendations/10
    ➔
        -0.7K hours (70%) against orig. user-based
➔
    Orig. item-based to cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
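The percentages on the chart follow directly from the plotted values. A small check, using only the figures quoted on the slides (hours in thousands of normalised Amazon hours; the class and method names are hypothetical):

```java
class SavingsSketch {
    /** Size of the change from before to after, as a whole-number percentage of before. */
    static long percentChange(double before, double after) {
        return Math.round(100.0 * Math.abs(after - before) / before);
    }

    public static void main(String[] args) {
        System.out.println("orig. item -> cust. item hours: " + percentChange(6.5, 2.4) + "%");
        System.out.println("cust. item -> orig. user hours: " + percentChange(2.4, 1.0) + "%");
        System.out.println("orig. user -> cust. user hours: " + percentChange(1.0, 0.3) + "%");
        System.out.println("orig. item -> cust. user hours: " + percentChange(6.5, 0.3) + "%");
        System.out.println("good recommendations 1.5 -> 2.5: " + percentChange(1.5, 2.5) + "%");
    }
}
```

These come out as 63%, 58%, 70%, 95% and 67%, matching the labels on the chart.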
Conclusions
➔
    Mahout is doing a great job of powering Mendeley Suggest
    ➔
        Large scale data set
    ➔
        Excellent for batch processing requirements
➔
    We'll soon be feeding our user-based implementation into Mahout
    ➔
        User-based can outperform item-based
    ➔
        Makes Mahout's offering more rounded
➔
    Save resources and money by understanding your data
    ➔
        Help Hadoop with task allocation if necessary
    ➔
        Partition your data appropriately
We're Hiring!
➔
    Hadoop Data Architect
    ➔
        design a coherent data model across the company
    ➔
        take ownership of our data
    ➔
        hands-on Hadoop administration
➔
    Marie Curie Senior Research Fellow
    ➔
        ensure that Mendeley’s research catalogue is of high quality
    ➔
        research and development opportunity
➔
    £500 Finder's Fee if you find someone who we hire
➔
    http://www.mendeley.com/careers/
www.mendeley.com

More Related Content

Similar to Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley

Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011Lee Dirks
 
Using Linked Data as the basis for Learning Resource Recommendation
Using Linked Data as the basis for Learning Resource RecommendationUsing Linked Data as the basis for Learning Resource Recommendation
Using Linked Data as the basis for Learning Resource RecommendationChris Clarke
 
Teaching with Technology Institute Training
Teaching with Technology Institute TrainingTeaching with Technology Institute Training
Teaching with Technology Institute TrainingEmily Puckett Rodgers
 
Wiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School PkuWiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School Pkuguest8ed46d
 
Wiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School PkuWiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School Pkuwiser pku
 
Mendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleMendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleKris Jack
 
Effective Literature Searching 2011
Effective Literature Searching 2011Effective Literature Searching 2011
Effective Literature Searching 2011Middlesex University
 
Prizing Open and Enhancing Research Corpora for Language Teaching
Prizing Open and Enhancing Research Corpora for Language TeachingPrizing Open and Enhancing Research Corpora for Language Teaching
Prizing Open and Enhancing Research Corpora for Language TeachingAlannah Fitzgerald
 
Towards a Cloud Library
Towards a Cloud LibraryTowards a Cloud Library
Towards a Cloud LibraryRachel Frick
 
Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Guus van den Brekel
 
P2Pvalue Directory: A collaborative resource to map common-based peer produc...
P2Pvalue Directory:  A collaborative resource to map common-based peer produc...P2Pvalue Directory:  A collaborative resource to map common-based peer produc...
P2Pvalue Directory: A collaborative resource to map common-based peer produc...P2Pvalue
 
Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Jeanne Kitchens
 
21stcenturye learningslideshare
21stcenturye learningslideshare21stcenturye learningslideshare
21stcenturye learningslidesharetsimatsima
 
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...Lillian Rigling
 
2collab London Online web2.0 after the buzz
2collab London Online web2.0 after the buzz2collab London Online web2.0 after the buzz
2collab London Online web2.0 after the buzzf kersten
 






Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley

  • 1. Mahout becomes a researcher Kris Jack, PhD Senior Data Mining Engineer
  • 2. Overview ➔ What's Mendeley? ➔ Applications of Mahout's Recommender ➔ Under Mahout's Bonnet ➔ Mahout's Research Career so Far ➔ Conclusions
  • 4. Mendeley is a data platform for researchers ➔ We're bringing together researchers and the research that they produce from all over the world ➔ We're structuring this data in a machine readable format ➔ We're opening this data up for you to build applications on top of it using our API ➔ These applications help researchers to do even better research and become more productive ➔ How are we building our community?
  • 5. Mendeley provides tools to help users... ...organise their research ➔ Reference management ➔ Cite-as-you-write ➔ Full-text article search ➔ Digitalised annotations
  • 6. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ➔ Research network ➔ Professional research groups
  • 7. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research ➔ Mendeley Suggest ➔ Personalised article recommendations ➔ Weekly batch of 10 recommended articles ➔ Collaborative Filtering ➔ The more data, the better
  • 8. 1.5 million+ users; 50m research articles; the 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  • 9. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research. We need a recommender that scales up, coping with our data and future growth
  • 13. Mahout use cases: ➔ Retrieve related items in large collections http://www.slideshare.net/kryton/the-data-layer
  • 14. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
  • 15. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm http://www.speeddate.com/apps/site/views/mp/technology.php
  • 16. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm ➔ Mendeley Suggest ➔ Discover new research ➔ Fill in gaps in your library ➔ Your personal advisor http://krisjack.blogspot.co.uk/2012/02/your-very-own-personalised-research.html
  • 17. Under Mahout's Bonnet
  • 18. Generating recommendations through matrix multiplication. These are item-based recommendations, as similarity is computed between items, not users. Not convinced? Try reading these: Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. Piscataway, NJ, USA. http://www.slideshare.net/srowen/collaborative-filtering-at-scale-2 http://krisjack.blogspot.co.uk/2012/04/under-bonnet-of-mahouts-item-based.html
  • 19. Input (all user preferences): a matrix of Research Articles (Comp Sci 1, Comp Sci 2, Physics 1, Physics 2) by Researchers (Turing, Babbage, Einstein, Newton)
  • 20. Input (all user preferences) at scale: 1.5M Researchers by 50M Research Articles, 300M prefs
  • 21. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3); 2. Gen. sim. matrix (steps 4-6); 3. Multiply matrices (steps 7-10). Input: All User Preferences (item x user; Research Articles by Researchers)
  • 22. item.RecommenderJob (same three stages). All User Preferences (item x user); A User's Preferences (item x user): Research Articles for Turing
  • 23. item.RecommenderJob (same three stages). All User Preferences (item x user); Item Similarity (item x item; Research Articles by Research Articles), rows: 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2; A User's Preferences (item x user): Research Articles for Turing
  • 24. From the Input (all user preferences; Research Articles by Researchers) to Item Similarity (Research Articles by Research Articles; rows and columns in the order Comp Sci 1, Comp Sci 2, Physics 1, Physics 2): Comp Sci 1: 2 1 0 0; Comp Sci 2: 1 1 0 0; Physics 1: 0 0 2 2; Physics 2: 0 0 2 2
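The similarity values on this slide are consistent with simple co-occurrence counts, that is, the number of researchers who hold both articles. The following is a minimal sketch under that assumption, not Mahout's own code. The preference matrix is hypothetical: the slides do not say which researcher holds which article, so the assignment below was chosen only because it reproduces the slide's numbers.

```java
import java.util.Arrays;

class CooccurrenceSketch {
    // Hypothetical item x user preference matrix (assumption, not from the slides).
    // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2.
    // Columns: Turing, Babbage, Einstein, Newton.
    static final int[][] PREFS = {
        {1, 1, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 1, 1},
    };

    // Item x item co-occurrence: entry (i, j) counts the users who hold both item i and item j.
    static int[][] cooccurrence(int[][] prefs) {
        int items = prefs.length;
        int[][] sim = new int[items][items];
        for (int i = 0; i < items; i++) {
            for (int j = 0; j < items; j++) {
                for (int u = 0; u < prefs[i].length; u++) {
                    sim[i][j] += prefs[i][u] * prefs[j][u];
                }
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Prints the rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2 shown on the slide.
        for (int[] row : cooccurrence(PREFS)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Other similarity measures would give different numbers; co-occurrence is used here only because it matches the integer counts in the example.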
  • 25. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3); 2. Gen. sim. matrix (steps 4-6); 3. Multiply matrices (steps 7-10). Item Similarity (item x item; rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2) X A User's Preferences (item x user; Turing) = Recommendations (item x user; Turing)
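The multiplication on this slide can be sketched in a few lines. This is an illustration only: the similarity matrix is the one from the slides, but Turing's preference vector is an assumption (the slides show it only as an image), and the class and method names are hypothetical.

```java
import java.util.Arrays;

class ItemBasedSketch {
    static final String[] ITEMS = {"Comp Sci 1", "Comp Sci 2", "Physics 1", "Physics 2"};

    // Item similarity (item x item), as shown on the slides.
    static final int[][] ITEM_SIMILARITY = {
        {2, 1, 0, 0},
        {1, 1, 0, 0},
        {0, 0, 2, 2},
        {0, 0, 2, 2},
    };

    // Assumption: Turing holds only "Comp Sci 1".
    static final int[] TURING_PREFS = {1, 0, 0, 0};

    // Recommendation scores = item similarity matrix multiplied by the user's preference vector.
    static int[] scores(int[][] similarity, int[] userPrefs) {
        int[] result = new int[similarity.length];
        for (int i = 0; i < similarity.length; i++) {
            for (int j = 0; j < userPrefs.length; j++) {
                result[i] += similarity[i][j] * userPrefs[j];
            }
        }
        return result;
    }

    // Index of the highest-scoring item the user does not already hold, or -1 if there is none.
    static int bestUnseen(int[] scores, int[] userPrefs) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            boolean unseen = userPrefs[i] == 0;
            if (unseen && scores[i] > 0 && (best < 0 || scores[i] > scores[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] s = scores(ITEM_SIMILARITY, TURING_PREFS);
        System.out.println(Arrays.toString(s));                 // prints [2, 1, 0, 0]
        System.out.println(ITEMS[bestUnseen(s, TURING_PREFS)]); // prints Comp Sci 2
    }
}
```

Under these assumptions the physics articles score zero for Turing, so the only recommendation is the computer science article he does not yet hold.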
  • 26. Running on Amazon's Elastic MapReduce: on-demand use and easy to cost
  • 27. Mahout's Research Career so Far
  • 29. Mahout's Performance (chart: x-axis No. Good Recommendations/10; y-axis Normalised Amazon Hours)
  • 30. Mahout's Performance (chart quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good)
  • 33. Mahout's Performance (chart scales: Normalised Amazon Hours from 0 to 7K; No. Good Recommendations/10 from 0 to 3)
  • 34. Mahout's Performance (chart): Orig. item-based ➔ 6.5K hours, 1.5 good recommendations
  • 35. Mahout's Performance (chart): Orig. item-based ➔ 6.5K, 1.5; Cust. item-based ➔ 2.4K, 1.5
  • 36. Mahout's Performance (chart): Orig. item-based ➔ 6.5K, 1.5; Cust. item-based ➔ 2.4K, 1.5; a saving of -4.1K hours (63%)
  • 37. Reducing processing time and cost ➔ Mahout's recommender is already efficient ➔ but your data may have unusual properties ➔ We got improvements by: ➔ tuning Hadoop's mapper and reducer allocation over the 10 steps in the RecommenderJob ➔ using an appropriate partitioner
  • 38. Task Allocation: 37 hours to complete; 1 reducer allocated, despite having 48 available...
  • 39. Task Allocation. Allocating more reducers on a per-job basis: job.getConfiguration().setInt("mapred.reduce.tasks", numReducers); Allocating more mappers on a per-job basis: job.getConfiguration().set("mapred.max.split.size", String.valueOf(splitSize));
  • 40. Task Allocation: 37 hours to complete ➔ 14 hours, going from 1 → 40 reducers
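How a maximum split size translates into a number of mappers can be sketched with plain arithmetic. This is a simplified model, not Hadoop code: it mirrors the split-size rule used by Hadoop's FileInputFormat (the larger of the minimum size and the smaller of the maximum size and block size), treats the input as one splittable file and ignores the small tolerance Hadoop applies to the last split. The input size and block size below are hypothetical, not Mendeley's figures.

```java
class SplitSizingSketch {
    // Split size rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for a single splittable input of the given size.
    static long approxMappers(long inputBytes, long splitSize) {
        return (inputBytes + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long input = 6L * 1024L * mib; // hypothetical 6 GiB input
        long block = 64L * mib;        // hypothetical 64 MiB block size

        // With no effective cap, splits follow the block size.
        System.out.println(approxMappers(input, splitSize(block, 1L, Long.MAX_VALUE))); // prints 96

        // Capping the split size at 16 MiB yields four times as many mappers.
        System.out.println(approxMappers(input, splitSize(block, 1L, 16L * mib)));      // prints 384
    }
}
```

In this model, lowering the maximum split size below the block size is what increases the mapper count; raising it above the block size has no effect.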
  • 41. Partitioners 14 hours to complete
  • 42. Partitioners: 14 hours to complete; partition sizes range from ~50KB to ~500MB
  • 43. InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(...); InputSampler.writePartitionFile(conf, sampler); conf.setPartitionerClass(TotalOrderPartitioner.class); http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
  • 44. Partitioners: 14 hours to complete ➔ 2 hours; evenly distributed
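The idea behind sampling keys and partitioning on the sampled cut points can be illustrated without Hadoop. The sketch below is not the TotalOrderPartitioner implementation; it only shows the principle of choosing range boundaries from a sample so that skewed keys spread evenly. The keys, the key range and the equal-width baseline are all hypothetical.

```java
import java.util.Arrays;

class RangePartitionSketch {
    // Choose numPartitions - 1 cut points at evenly spaced quantiles of the sampled keys.
    static int[] cutPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] cuts = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return cuts;
    }

    // Keys below cuts[0] go to partition 0, keys in [cuts[0], cuts[1]) to partition 1, and so on.
    static int sampledPartition(int key, int[] cuts) {
        int pos = Arrays.binarySearch(cuts, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    // Baseline for contrast: split the key range [0, keyRange) into equal-width buckets.
    static int equalWidthPartition(int key, int keyRange, int numPartitions) {
        return (int) ((long) key * numPartitions / keyRange);
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5, 6, 7, 8, 100, 1000, 5000, 9000}; // skewed towards small keys
        int partitions = 4;
        int[] cuts = cutPoints(keys, partitions);

        int[] sampled = new int[partitions];
        int[] equalWidth = new int[partitions];
        for (int key : keys) {
            sampled[sampledPartition(key, cuts)]++;
            equalWidth[equalWidthPartition(key, 10000, partitions)]++;
        }
        System.out.println(Arrays.toString(equalWidth)); // prints [10, 0, 1, 1]
        System.out.println(Arrays.toString(sampled));    // prints [3, 3, 3, 3]
    }
}
```

Here the full key set doubles as the sample; in practice the sample would be a small random subset, so the resulting partitions would be only approximately even.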
  • 45. Mahout's Performance (chart: Normalised Amazon Hours against No. Good Recommendations/10): Orig. item-based ➔ 6.5K, 1.5; Cust. item-based ➔ 2.4K, 1.5; a saving of -4.1K hours (63%)
  • 46. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3); 2. Gen. sim. matrix (steps 4-6); 3. Multiply matrices (steps 7-10). Item Similarity (item x item; rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2) X A User's Preferences (item x user; Turing) = Recommendations (item x user; Turing)
  • 47. The same diagram with "user" marked over "item" in item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3); 2. Gen. sim. matrix (steps 4-6); 3. Multiply matrices (steps 7-10). User Similarity (user x user; Researchers by Researchers) is shown alongside Item Similarity (item x item), A User's Preferences (item x user) and Recommendations (item x user)
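A user-based variant can be sketched by computing similarity between researchers instead of between articles. The slides do not show Mendeley's user-based implementation, so everything below is an assumption: the preference matrix is the same hypothetical one that reproduces the item-similarity example, similarity is the count of shared articles, and an article's score for a researcher is the summed similarity of the other researchers who hold it.

```java
import java.util.Arrays;

class UserBasedSketch {
    // Hypothetical item x user preference matrix (assumption, not from the slides).
    // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2.
    // Columns: Turing, Babbage, Einstein, Newton.
    static final int[][] PREFS = {
        {1, 1, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 1, 1},
    };

    // User x user similarity: entry (u, v) counts the articles that both researchers hold.
    static int[][] userSimilarity(int[][] prefs) {
        int users = prefs[0].length;
        int[][] sim = new int[users][users];
        for (int u = 0; u < users; u++) {
            for (int v = 0; v < users; v++) {
                for (int[] itemRow : prefs) {
                    sim[u][v] += itemRow[u] * itemRow[v];
                }
            }
        }
        return sim;
    }

    // Score of each article for one researcher: summed similarity of the other holders.
    static int[] scores(int[][] prefs, int[][] userSim, int user) {
        int[] result = new int[prefs.length];
        for (int i = 0; i < prefs.length; i++) {
            for (int v = 0; v < userSim.length; v++) {
                if (v != user) {
                    result[i] += userSim[user][v] * prefs[i][v];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] sim = userSimilarity(PREFS);
        // Scores for Turing (column 0): prints [1, 1, 0, 0]
        System.out.println(Arrays.toString(scores(PREFS, sim, 0)));
    }
}
```

With these toy numbers Turing's only unseen article with a positive score is Comp Sci 2, the same outcome as the item-based example; the quality and cost differences reported in the talk only appear on real data.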
  • 48. Mahout's Performance (chart: Normalised Amazon Hours against No. Good Recommendations/10): Orig. item-based ➔ 6.5K, 1.5; Cust. item-based ➔ 2.4K, 1.5; Orig. user-based ➔ 1K, 2.5
  • 49. Mahout's Performance (chart): Orig. user-based (1K, 2.5) against Cust. item-based (2.4K, 1.5): -1.4K hours (58%) and +1 good recommendation (67%)
  • 50. Mahout's Performance (chart): Orig. item-based ➔ 6.5K, 1.5; Cust. item-based ➔ 2.4K, 1.5; Orig. user-based ➔ 1K, 2.5; Cust. user-based ➔ 0.3K, 2.5
  • 51. Mahout's Performance (chart): Orig. to Cust. item-based: -4.1K hours (63%); Orig. to Cust. user-based: -0.7K hours (70%)
  • 52. Mahout's Performance (chart): Cust. user-based (0.3K, 2.5) against Orig. item-based (6.5K, 1.5): -6.2K hours (95%) and +1 good recommendation (67%)
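The percentages quoted on the performance charts follow from the plotted points. A small sketch of the arithmetic, using only the figures from the slides (hours in thousands of normalised Amazon hours, quality in good recommendations out of 10); the class and method names are hypothetical:

```java
class SavingsSketch {
    // Relative change from one value to another, as a whole percentage of the starting value.
    static long percentChange(double from, double to) {
        return Math.round(100.0 * Math.abs(to - from) / from);
    }

    public static void main(String[] args) {
        System.out.println(percentChange(6.5, 2.4)); // Orig. item-based to Cust. item-based: 63
        System.out.println(percentChange(2.4, 1.0)); // Cust. item-based to Orig. user-based: 58
        System.out.println(percentChange(1.0, 0.3)); // Orig. user-based to Cust. user-based: 70
        System.out.println(percentChange(6.5, 0.3)); // Orig. item-based to Cust. user-based: 95
        System.out.println(percentChange(1.5, 2.5)); // Good recommendations, 1.5 to 2.5: 67
    }
}
```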
  • 54. Conclusions ➔ Mahout is doing a great job of powering Mendeley Suggest ➔ Large scale data set ➔ Excellent for batch processing requirements ➔ We'll soon be feeding our user-based implementation into Mahout ➔ User-based can outperform item-based ➔ Makes Mahout's offering more rounded ➔ Save resources and money by understanding your data ➔ Help Hadoop with task allocation if necessary ➔ Partition your data appropriately
  • 55. We're Hiring! ➔ Hadoop Data Architect ➔ design a coherent data model across the company ➔ take ownership of our data ➔ hands on Hadoop administration ➔ Marie Curie Senior Research Fellow ➔ ensure that Mendeley’s research catalogue is of high quality ➔ research and development opportunity ➔ £500 Finder's Fee if you find someone who we hire ➔ http://www.mendeley.com/careers/