Slope One Recommender
             on Hadoop
                     YONG ZHENG
         Center for Web Intelligence
                  DePaul University
                      Nov 15, 2012
Overview
• Introduction

• Recommender Systems & Slope One Recommender

• Distributed Slope One on Mahout and Hadoop

• Experimental Setup and Analyses

• Drive Mahout on Hadoop

• Interesting Communities




                            Center for Web Intelligence, DePaul University, USA
Introduction
• About Me: a recommendation guy

• My Research: data mining and recommender systems

• Typical Experimental Research

   1)   Design or improve an algorithm;
   2)   Run algorithms and baseline algs on datasets;
   3)   Compare experimental results;
   4)   Try different parameters, find reasons and even re-design
        and improve algorithm itself;
   5)   Run algorithms and baseline algs on datasets;
   6)   Compare experimental results;
   7)   Try different parameters, find reasons and even re-design
        and improve algorithm itself;
   8)   And so on… Until it approaches expected results.
Introduction
• Sometimes, data is large-scale.
  e.g. one algorithm may take days to complete. What if the
  experimental results are not as expected? Then we improve
  the algorithm and run it for days again, and again.

  How did we handle this previously? (for tasks that are not too complicated)
  1). Parallelize by hand, at the cost of complicated synchronization
      and limited resources, such as CPU, memory, etc.;
  2). Take advantage of PC labs: let’s do it with 10 PCs



• Nearly all research will ultimately face large-scale
  problems, especially in the domain of data mining.

• But, we have Map-Reduce NOW!
Introduction



• No need to distribute data and tasks manually.
  Instead, we simply generate configurations.
• No need to care about the details, e.g. how data is
  distributed, which task will be run on which machine and
  when, or how tasks are executed one by one.
• Instead, we can pre-define the workflow and take
  advantage of the functional contributions of mappers
  and reducers.
• More benefits: replication, balancing, robustness, etc.
Recommender Systems

• Collaborative Filtering

• Slope One and Simple Weighted Slope One

• Slope One in Mahout

• Distributed Slope One in Mahout

• Mappers and Reducers




Recommender Systems
Collaborative Filtering (CF)
One of the most popular recommendation algorithms.
 User-based: User-CF
 Item-based: Item-CF, Slope One


 [Figure: user-item rating diagram; labels: "User 5", "Rating?", "4 star"; neighbor ratings 5, 4, 4, 5]

 Example: User-based Collaborative Filtering
Slope One Recommender
Reference: Daniel Lemire, Anna Maclachlan, Slope One Predictors for
Online Rating-Based Collaborative Filtering, In SIAM Data Mining
(SDM'05), April 21-23, 2005. http://lemire.me/fr/abstracts/SDM2005.html

            User                 Batman              Spiderman
             U1                     3                      4
             U2                     2                      4
             U3                     2                      ?

1). How differently were the two movies rated?
U1 rated Spiderman higher by (4-3) = 1
U2 rated Spiderman higher by (4-2) = 2
On average, Spiderman is rated (1+2)/2 = 1.5 higher

2). The rating difference can drive predictions
If we know U3 gave Batman 2 stars, he will probably rate
Spiderman (2+1.5) = 3.5 stars
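
The two steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide, not Mahout code; the function name average_diff and the dictionary layout are assumptions made for this sketch.

```python
# Ratings from the two-movie table above; U3 has not rated Spiderman.
ratings = {
    "U1": {"Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"Batman": 2},
}

def average_diff(ratings, target, reference):
    """Average of (target - reference) over users who rated both items."""
    diffs = [
        user_ratings[target] - user_ratings[reference]
        for user_ratings in ratings.values()
        if target in user_ratings and reference in user_ratings
    ]
    return sum(diffs) / len(diffs)

# Step 1: how much higher is Spiderman rated than Batman on average?
diff = average_diff(ratings, "Spiderman", "Batman")

# Step 2: shift U3's Batman rating by that average difference.
prediction = ratings["U3"]["Batman"] + diff

print(diff, prediction)  # prints: 1.5 3.5
```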
Simple Weighted Slope One
Usually a user rates multiple items
        User        HarryPotter       Batman       Spiderman
         U1              5               3              4
         U2              ?               2              4
         U3              4               2              ?

1). How differently were the movies rated?
Diff(Batman, Spiderman) = [(4-3)+(4-2)]/2 = 1.5
Diff(HarryPotter, Spiderman) = (4-5)/1 = -1
The denominators “2” and “1” are what we call the “count”.

2). Weighted rating differences can drive predictions
We use a simple weighted approach
Based on Batman only, rating = 2+1.5 = 3.5
Based on HarryPotter only, rating = 4-1 = 3
Considering both, predicted rating = (3.5*2 + 3*1) / (2+1) = 3.33
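
A minimal Python sketch of the simple weighted prediction, using the three-movie table above. The helper names diff_and_count and predict are hypothetical and chosen only for this illustration; Mahout's own implementation is organized differently.

```python
# Ratings from the three-movie table; missing entries are unrated items.
ratings = {
    "U1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"HarryPotter": 4, "Batman": 2},
}

def diff_and_count(ratings, target, reference):
    """Return (average of target - reference, number of co-rating users)."""
    diffs = [
        r[target] - r[reference]
        for r in ratings.values()
        if target in r and reference in r
    ]
    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)

def predict(ratings, user, target):
    """Weighted Slope One: each reference item votes with weight = count."""
    numerator = 0.0
    denominator = 0
    for reference, rating in ratings[user].items():
        if reference == target:
            continue
        diff, count = diff_and_count(ratings, target, reference)
        if count == 0:
            continue  # nobody co-rated this pair, so it carries no evidence
        numerator += (rating + diff) * count
        denominator += count
    return numerator / denominator if denominator else None

prediction = predict(ratings, "U3", "Spiderman")
print(round(prediction, 2))  # prints: 3.33
```

Weighting by count means the Batman-based estimate (two co-raters) pulls the result twice as hard as the HarryPotter-based estimate (one co-rater).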
Simple Weighted Slope One
      User      HarryPotter       Batman        Spiderman
       u1               5            3             4
       u2               ?            2             4
       u3               4            2             ?
            Question: Online or Offline?
To calculate the prediction ratings, we need 2 matrices:
1).Difference Matrix
               Movie1       Movie2       Movie3        Movie4
     Movie1
     Movie2      -1.5
     Movie3       2           1
     Movie4      -1           0.5          -2

2). Count Matrix
Simply the number of users who co-rated the two items
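
As a rough sketch, both matrices can be built in one pass over the users. The code below uses the u1-u3 table from this slide rather than the illustrative Movie1-Movie4 values, and stores only the lower triangle keyed by (row item, column item); the function name build_matrices is an assumption for this example.

```python
from itertools import combinations

# Column order of the rating table; it fixes which item is the "row" of a pair.
items = ["HarryPotter", "Batman", "Spiderman"]

ratings = {
    "u1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "u2": {"Batman": 2, "Spiderman": 4},
    "u3": {"HarryPotter": 4, "Batman": 2},
}

def build_matrices(ratings, items):
    """Return (difference matrix, count matrix) as dicts keyed by item pair."""
    sums = {}
    counts = {}
    for user_ratings in ratings.values():
        for earlier, later in combinations(items, 2):
            if earlier in user_ratings and later in user_ratings:
                key = (later, earlier)  # row item, column item
                delta = user_ratings[later] - user_ratings[earlier]
                sums[key] = sums.get(key, 0) + delta
                counts[key] = counts.get(key, 0) + 1
    diffs = {key: sums[key] / counts[key] for key in sums}
    return diffs, counts

diffs, counts = build_matrices(ratings, items)
for key in diffs:
    print(key, diffs[key], counts[key])
```

Because both matrices depend only on the ratings, they can be computed offline and reused for every prediction, which is the point of the "Online or Offline?" question.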
Slope One in Mahout
Mahout, an open-source machine learning library.

1). Recommendation algorithms
   User-based CF, Item-based CF, Slope One, etc

2). Clustering
   KMeans, Fuzzy KMeans, etc

3). Classification
   Decision Trees, Naive Bayes, SVM, etc

4). Latent Factor Models
   LDA, SVD, Matrix Factorization, etc
Slope One in Mahout
org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
Pre-Processing Stage: (class MemoryDiffStorage with Map)
for every item i
   for every other item j
     for every user u expressing preference for both i and j
      add the difference in u’s preference for i and j to an average

Recommendation Stage:
for every item i the user u expresses no preference for
   for every item j that user u expresses a preference for
     find the average preference difference between j and i
     add this diff to u’s preference value for j
     add this to a running average
return the top items, ranked by these averages

Simple weighting: as introduced previously
StdDev weighting: item-item rating diffs with a lower standard
                  deviation should be weighted more highly
Distributed Slope One in Mahout
As in our previous practice, e.g. the matrix factorization
process, what we need is the Difference Matrix.

Suppose M users rated N items; the matrix requires
N(N-1)/2 cells. Density is another aspect: how many
items each user rated. If there are many items and the
rating matrix is dense, the computational cost increases
accordingly.
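
To get a feel for the N(N-1)/2 growth, here is a tiny Python helper. The function name is hypothetical, and the 3,900-item catalogue is just an example size; the result is an upper bound, since pairs that no user co-rated need no cell.

```python
def diff_matrix_cells(num_items):
    """Number of unordered item pairs, i.e. cells in the triangular matrix."""
    return num_items * (num_items - 1) // 2

print(diff_matrix_cells(3))     # prints: 3
print(diff_matrix_cells(3900))  # prints: 7603050
```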

Question again: Online or Offline?
Depends on tasks & data.

Large-scale data. Let’s do it offline!
Distributed Slope One in Mahout
package org.apache.mahout.cf.taste.hadoop.slopeone;
      class SlopeOneAverageDiffsJob
      class SlopeOnePrefsToDiffsReducer
      class SlopeOneDiffsToAveragesReducer

package org.apache.mahout.cf.taste.hadoop;
      class ToItemPrefsMapper
      org.apache.hadoop.mapreduce.Mapper

Two Mapper-Reducer Stages:
      1). Create DiffMatrix for each user
      2). Collect AvgDiff info, counts, StdDev

Let’s see how it works…
Mapper and Reducer - 1
          User      HarryPotter        Batman        Spiderman
          U1              5               3              4
          U2              ?               2              4
          U3              4               2              ?

 Mapper1 (ToItemPrefsMapper)
  <UserID, Pair<ItemID, Rating>>
 Reducer1 (PrefsToDiffsReducer)
  <Pair<Item1,Item2>, Diff> (for all three users)

 <U1>      Potter   Bat       Spider   <U2>     Potter   Bat   Spider

 Potter                                Potter

  Bat          -2                       Bat     NULL

 Spider        -1    1                 Spider   NULL     2
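
The first stage can be simulated in plain Python to see what flows between mapper and reducer. This is only a sketch of the data flow, not Mahout's Java classes: the function names, the comma-separated input lines, and the in-memory "shuffle" are assumptions for illustration. All three users are included, so U3 also emits a (Batman, HarryPotter) diff.

```python
from collections import defaultdict
from itertools import combinations

# Fixed item order so that each pair is always emitted the same way round.
ITEM_ORDER = {"HarryPotter": 0, "Batman": 1, "Spiderman": 2}

# Input in "UserID,ItemID,Rating" form.
lines = [
    "U1,HarryPotter,5", "U1,Batman,3", "U1,Spiderman,4",
    "U2,Batman,2", "U2,Spiderman,4",
    "U3,HarryPotter,4", "U3,Batman,2",
]

def map_to_item_prefs(line):
    """Mapper 1: emit <UserID, (ItemID, Rating)>."""
    user, item, rating = line.split(",")
    return user, (item, float(rating))

def reduce_prefs_to_diffs(prefs):
    """Reducer 1: for one user, emit <(Item1, Item2), Diff> for every item pair."""
    ordered = sorted(prefs, key=lambda pref: ITEM_ORDER[pref[0]])
    for (earlier, r_earlier), (later, r_later) in combinations(ordered, 2):
        yield (later, earlier), r_later - r_earlier

# Shuffle phase: group mapper output by user.
grouped = defaultdict(list)
for line in lines:
    user, pref = map_to_item_prefs(line)
    grouped[user].append(pref)

stage1_output = [
    record
    for user in sorted(grouped)
    for record in reduce_prefs_to_diffs(grouped[user])
]
for record in stage1_output:
    print(record)
```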
Mapper and Reducer - 2
 <U1>     Potter    Bat     Spider   <U2>      Potter   Bat   Spider

Potter                               Potter

 Bat           -2                     Bat      NULL

Spider         -1   1                Spider    NULL     2

Mapper2 (org.apache.hadoop.mapreduce.Mapper)
Reducer2 (DiffsToAveragesReducer)
Average Diffs, Count, StdDev
  <Aggregate>             Potter         Bat             Spider
       Potter
         Bat              -2, 2
       Spider             -1, 1         1.5, 2
(Bat/Potter has count 2 once U3, who rated Potter 4 and Bat 2, is included.)
Simply, each <a,b> pair denotes a = average diff, b = count
Notice: we should use three matrices in practice; here I used 2.
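
The second stage groups the per-user diffs by item pair and reduces each group to an average, a count, and a standard deviation. The Python sketch below simulates that step on hard-coded stage-1 records for all three users. The function name is hypothetical, and using the population standard deviation is an assumption of this sketch, not a statement about what Mahout computes.

```python
from collections import defaultdict
from statistics import pstdev

# <(Item1, Item2), Diff> records, one per user and co-rated pair (U1, U2, U3).
diff_records = [
    (("Batman", "HarryPotter"), -2.0),     # U1
    (("Spiderman", "HarryPotter"), -1.0),  # U1
    (("Spiderman", "Batman"), 1.0),        # U1
    (("Spiderman", "Batman"), 2.0),        # U2
    (("Batman", "HarryPotter"), -2.0),     # U3
]

def reduce_diffs_to_averages(diffs):
    """Reducer 2: collapse all diffs of one pair to (average, count, std dev)."""
    # Population std dev is an assumption made for this illustration.
    return sum(diffs) / len(diffs), len(diffs), pstdev(diffs)

# Identity mapper plus shuffle: group the diffs by item pair.
grouped = defaultdict(list)
for pair, diff in diff_records:
    grouped[pair].append(diff)

aggregate = {pair: reduce_diffs_to_averages(diffs) for pair, diffs in grouped.items()}
for pair, stats in aggregate.items():
    print(pair, stats)
```

With U3's record included, (Batman, HarryPotter) ends up with an average of -2 and a count of 2.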
Predictions
        User        HarryPotter      Batman        Spiderman
         U1              5              3               4
         U2              ?              2               4
         U3              4              2               ?

  <Aggregate>          Potter            Bat             Spider
      Potter
       Bat              -2, 2
      Spider            -1, 1           1.5, 2
 (Bat/Potter count is 2: both U1 and U3 rated the two items.)
 Simply, each <a,b> pair denotes a = average diff, b = count
 Notice: we should use three matrices in practice; here I used 2.


 Prediction(U3, Spiderman) = [(4-1)*1 + (2+1.5)*2] / (1+2)
                           = 10/3 ≈ 3.33
Experiments

• Data

• Hadoop Setup

• Running Performances




Experiment Setup
Data: MovieLens-1M ratings
       # of users:     6,040
       # of movies:    3,900
       # of ratings:   1,000,209

Density of the ratings:
       each user has at least 20 ratings
       obviously, some users have many more ratings

Rating format: UserID, ItemID, Rating (scale 1-5)

Data Split: 80% training, 20% testing
Experiment Setup
Hadoop Cluster Setup
 IBM SmartCloud
 1 master node, 7 slave nodes
 Each node runs SUSE Linux Enterprise Server v11 SP1
 Server Configuration:
  64 bit (vCPU: 2, RAM: 4 GiB, Disk: 60 GiB)
 Hadoop v.0.20.205.0
 Mahout distribution-0.6

The environment setup follows the typical workflow described at:
http://irecsys.blogspot.com/2012/11/configurate-map-reduce-environment-on.html

Thanks Scott Young, neat writeup!!
Experimental Analyses
Stage-1: SlopeOneAverageDiffsJob by Map-Reduce
         Goal: Build DiffStorage
         Output: DiffStorage txt file, 1.45GB
         Running Time:
            real 13m 34.228s
            user 0m 5.136s
            sys      0m 1.028s
        Item1     Item2     Diff     Count    StdDev
         221      223       -1.02     197       0.5
Stage-2: Java evaluator to measure MAE on testing set
         Running Time:
            Load Testing Set (21K records), 299ms
            Load Training Set (79K records), 1,771ms
            Load DiffStorage, 176,352ms = 2.9m
            Prediction (21K records), 18,182ms = 0.3m
            MAE = 0.71330756
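
For reference, MAE (mean absolute error) is the average absolute gap between predicted and actual ratings on the test set. A minimal Python sketch follows; the sample ratings are made up purely for illustration and are not taken from the experiment.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all test ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical test ratings and predictions (illustrative values only).
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 2.5]

print(mean_absolute_error(actual, predicted))  # prints: 0.5
```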
Experimental Experiences
1. Why not MovieLens 10M data?
   Map-Reduce on 10M data may take several hours;
   Running time depends on the cluster and its configuration;
   Also, the DiffStorage file will be too large.
2. Java Evaluator
   Loading the full DiffStorage file is time-consuming.
   It also triggers Java heap space and GCOverlimit errors;
   Those errors cannot be fixed by -Xmx or other settings;
   Two solutions:
     1). Just use simple weighting, discard StdDev weighting.
     2). Write a simple Mapper and Reducer, run them on the cluster.

   For MovieLens 1M, this is not that efficient compared with
   the live SlopeOne recommendation; 10M data may be a
   better fit, and I will try MovieLens-10M later; Slope One is
   simple but memory-expensive.
More …

• Drive Mahout on Hadoop

• Interesting Communities




Mahout + Hadoop
How do we run more Mahout algorithms on Hadoop?
1. Pre-set Command in Mahout
  Let’s see bin/mahout – help, then it provides a list of
  available programs such as svd, fkmeans, etc.

  Some are basic functions, such as splitDataset
  Some can be executed as Hadoop tasks

  e.g. Run and evaluate Matrix Factorization on rating dataset

  bin/mahout parallelALS --input inputSource --output outputSource
  --tempDir tmpFolder --numFeatures 20 --numIterations 10

  bin/mahout evaluateFactorization --input inputSource --output
  outputSource --userFeatures als/out/U/ --itemFeatures als/out/M/
  --tempDir tmpFolder
Mahout + Hadoop
2. More Algorithms on Hadoop
  Mahout provides a way to run more Mahout algorithms. Simply,

$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-<version>.jar <Job Class> --recommenderClassName Class <OPTIONS>

   Which kinds of jobs does it support? Mahout implements several.




   Some popular ones:
   1).org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
        --recommenderClassName ClassName
   2).org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
   3).org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
   4).org.apache.mahout.cf.taste.hadoop.slopeone.SlopeOneAverageDiffsJob
Interesting Communities
Beyond Hadoop and Mahout official sites

1. Data Mining
  KDnuggets, http://www.kdnuggets.com
  Popular community for Data Mining & Analytics. Lots of useful
  information, such as news, materials, datasets, jobs, etc.

2. Big Data
  SmartData Collective, http://smartdatacollective.com/
  Smarter Computing, http://www.smartercomputingblog.com/
  Big Data Meetup, http://big-data.meetup.com/

3. Recommender Systems
  ACM Official Site, http://recsys.acm.org/
  RecSys Wiki, http://recsyswiki.com/
Thank You!


      Center for Web Intelligence, DePaul University, USA

Contenu connexe

Tendances

Computer Graphics C Version - Hearn & Baker.pdf
Computer Graphics C Version - Hearn & Baker.pdfComputer Graphics C Version - Hearn & Baker.pdf
Computer Graphics C Version - Hearn & Baker.pdfSUSHIL KUMAR
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 
The Golden Rules by Theo Mandel - Software Engineering
The Golden Rules by Theo Mandel - Software EngineeringThe Golden Rules by Theo Mandel - Software Engineering
The Golden Rules by Theo Mandel - Software EngineeringAmit Baghel
 
Image Restoration (Frequency Domain Filters):Basics
Image Restoration (Frequency Domain Filters):BasicsImage Restoration (Frequency Domain Filters):Basics
Image Restoration (Frequency Domain Filters):BasicsKalyan Acharjya
 
Architectural structures and views
Architectural structures and viewsArchitectural structures and views
Architectural structures and viewsDr Reeja S R
 
Fundamentals and image compression models
Fundamentals and image compression modelsFundamentals and image compression models
Fundamentals and image compression modelslavanya marichamy
 
Semantic net in AI
Semantic net in AISemantic net in AI
Semantic net in AIShahDhruv21
 
Software architecture Unit 1 notes
Software architecture Unit 1 notesSoftware architecture Unit 1 notes
Software architecture Unit 1 notesSudarshan Dhondaley
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
Design pattern & categories
Design pattern & categoriesDesign pattern & categories
Design pattern & categoriesHimanshu
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based ClusteringSSA KPI
 
Lect 02 second portion
Lect 02  second portionLect 02  second portion
Lect 02 second portionMoe Moe Myint
 
Requirements engineering for agile methods
Requirements engineering for agile methodsRequirements engineering for agile methods
Requirements engineering for agile methodsSyed Zaid Irshad
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: ClusteringDeepak George
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 

Tendances (20)

Unified process Model
Unified process ModelUnified process Model
Unified process Model
 
Software Testing and UML Lab
Software Testing and UML LabSoftware Testing and UML Lab
Software Testing and UML Lab
 
Computer Graphics C Version - Hearn & Baker.pdf
Computer Graphics C Version - Hearn & Baker.pdfComputer Graphics C Version - Hearn & Baker.pdf
Computer Graphics C Version - Hearn & Baker.pdf
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
The Golden Rules by Theo Mandel - Software Engineering
The Golden Rules by Theo Mandel - Software EngineeringThe Golden Rules by Theo Mandel - Software Engineering
The Golden Rules by Theo Mandel - Software Engineering
 
Image Restoration (Frequency Domain Filters):Basics
Image Restoration (Frequency Domain Filters):BasicsImage Restoration (Frequency Domain Filters):Basics
Image Restoration (Frequency Domain Filters):Basics
 
Architectural structures and views
Architectural structures and viewsArchitectural structures and views
Architectural structures and views
 
Histogram processing
Histogram processingHistogram processing
Histogram processing
 
Fundamentals and image compression models
Fundamentals and image compression modelsFundamentals and image compression models
Fundamentals and image compression models
 
Corba
CorbaCorba
Corba
 
Semantic net in AI
Semantic net in AISemantic net in AI
Semantic net in AI
 
Software architecture Unit 1 notes
Software architecture Unit 1 notesSoftware architecture Unit 1 notes
Software architecture Unit 1 notes
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Design pattern & categories
Design pattern & categoriesDesign pattern & categories
Design pattern & categories
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
Lect 02 second portion
Lect 02  second portionLect 02  second portion
Lect 02 second portion
 
Requirements engineering for agile methods
Requirements engineering for agile methodsRequirements engineering for agile methods
Requirements engineering for agile methods
 
Software design
Software designSoftware design
Software design
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 

En vedette

American Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenAmerican Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenLinkedIn
 
The good the bad and the ugly - final
The good the bad and the ugly - finalThe good the bad and the ugly - final
The good the bad and the ugly - finalAndre Verschelling
 
Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Loïc Knuchel
 
PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...Victor Codina
 
Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Netwave
 
Case Study Amex
Case Study AmexCase Study Amex
Case Study AmexFM Signal
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahoutGregg Barrett
 
American Express Case Study
American Express Case StudyAmerican Express Case Study
American Express Case StudyShivani Chavan
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutAmbarish Hazarnis
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformIMC Institute
 
Strategic Brand Assessment - Amex
Strategic Brand Assessment - AmexStrategic Brand Assessment - Amex
Strategic Brand Assessment - AmexHenry Jenkins
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
The Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingThe Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingNewsCred
 
American express case study
American express case studyAmerican express case study
American express case studyChinmoy Nanda
 
Strategic management - lowes home improvement case study
Strategic management - lowes home improvement case studyStrategic management - lowes home improvement case study
Strategic management - lowes home improvement case studySarah Lee
 
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation ApproachYONG ZHENG
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Alan Said
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessPipedrive
 
Yrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisYrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisGuillaume Kpotufe
 

En vedette (20)

American Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenAmerican Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott Roen
 
The good the bad and the ugly - final
The good the bad and the ugly - finalThe good the bad and the ugly - final
The good the bad and the ugly - final
 
Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014
 
PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...
 
Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales
 
Case Study Amex
Case Study AmexCase Study Amex
Case Study Amex
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahout
 
American Express Case Study
American Express Case StudyAmerican Express Case Study
American Express Case Study
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache Mahout
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud Platform
 
Strategic Brand Assessment - Amex
Strategic Brand Assessment - AmexStrategic Brand Assessment - Amex
Strategic Brand Assessment - Amex
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
The Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingThe Best in Financial Services Content Marketing
The Best in Financial Services Content Marketing
 
Big Data en Retail
Big Data en RetailBig Data en Retail
Big Data en Retail
 
American express case study
American express case studyAmerican express case study
American express case study
 
Strategic management - lowes home improvement case study
Strategic management - lowes home improvement case studyStrategic management - lowes home improvement case study
Strategic management - lowes home improvement case study
 
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of Serverless
 
Yrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisYrecommender, machine learning sur Hybris
Yrecommender, machine learning sur Hybris
 

Similaire à Slope one recommender on hadoop

Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...hyunsung lee
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
 
Benchmarking Perl (Chicago UniForum 2006)
Benchmarking Perl (Chicago UniForum 2006)Benchmarking Perl (Chicago UniForum 2006)
Benchmarking Perl (Chicago UniForum 2006)brian d foy
 
Deliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsDeliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsRuofei Du
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Jeong-Gwan Lee
 
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative FilteringYONG ZHENG
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1San Kim
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - PyData
 
MapReduce: teoria e prática
MapReduce: teoria e práticaMapReduce: teoria e prática
MapReduce: teoria e práticaPET Computação
 
Benchmarking Perl Lightning Talk (NPW 2007)
Benchmarking Perl Lightning Talk (NPW 2007)Benchmarking Perl Lightning Talk (NPW 2007)
Benchmarking Perl Lightning Talk (NPW 2007)brian d foy
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkDalei Li
 
Throttling Malware Families in 2D
Throttling Malware Families in 2DThrottling Malware Families in 2D
Throttling Malware Families in 2DMohamed Nassar
 

Similaire à Slope one recommender on hadoop (20)

Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
OpenAI Retro Contest
OpenAI Retro ContestOpenAI Retro Contest
OpenAI Retro Contest
 
BIRTE-13-Kawashima
BIRTE-13-KawashimaBIRTE-13-Kawashima
BIRTE-13-Kawashima
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Slope one recommender on hadoop

  • 1. Slope One Recommender on Hadoop YONG ZHENG Center for Web Intelligence DePaul University Nov 15, 2012
  • 2. Overview
      • Introduction
      • Recommender Systems & Slope One Recommender
      • Distributed Slope One on Mahout and Hadoop
      • Experimental Setup and Analyses
      • Drive Mahout on Hadoop
      • Interesting Communities
    Center for Web Intelligence, DePaul University, USA
  • 3. Introduction
      • About Me: a recommendation guy
      • My Research: data mining and recommender systems
      • Typical Experimental Research
        1) Design or improve an algorithm;
        2) Run the algorithm and baseline algorithms on datasets;
        3) Compare experimental results;
        4) Try different parameters, find reasons, and even re-design and improve the algorithm itself;
        5) Run the algorithm and baseline algorithms on datasets;
        6) Compare experimental results;
        7) Try different parameters, find reasons, and even re-design and improve the algorithm itself;
        8) And so on… until it approaches the expected results.
  • 4. Introduction
      • Sometimes the data is large-scale. For example, one algorithm may take days to complete. What if the experimental results are not as expected? Then we improve the algorithm and run it for days again, and again.
        What could we do previously? (for tasks that are not too complicated)
        1) Parallelize, at the cost of complicated synchronization and limited resources such as CPU and memory;
        2) Take advantage of PC labs, e.g. run it on 10 PCs.
      • Nearly all research will ultimately face large-scale problems, especially in the domain of data mining.
      • But we have Map-Reduce NOW!
  • 5. Introduction
      • No need to distribute data and tasks manually. Instead, we simply generate configurations.
      • No need to care about low-level details, e.g. how data is distributed, which machine a specific task will run on and when, or how tasks are carried out one by one.
      • Instead, we can pre-define the workflow and take advantage of the functional contributions of mappers and reducers.
      • More benefits: replication, balancing, robustness, etc.
  • 6. Recommender Systems
      • Collaborative Filtering
      • Slope One and Simple Weighted Slope One
      • Slope One in Mahout
      • Distributed Slope One in Mahout
      • Mappers and Reducers
  • 8. Collaborative Filtering (CF)
    One of the most popular recommendation algorithms.
      • User-based: User-CF
      • Item-based: Item-CF, Slope One
    [Figure: Example of user-based collaborative filtering]
  • 9. Slope One Recommender
    Reference: Daniel Lemire, Anna Maclachlan, Slope One Predictors for Online Rating-Based Collaborative Filtering, In SIAM Data Mining (SDM'05), April 21-23, 2005. http://lemire.me/fr/abstracts/SDM2005.html
    User  Batman  Spiderman
    U1    3       4
    U2    2       4
    U3    2       ?
    1) How differently were the two movies rated?
       U1 rated Spiderman higher by (4-3) = 1
       U2 rated Spiderman higher by (4-2) = 2
       On average, Spiderman is rated (1+2)/2 = 1.5 higher
    2) The rating difference drives the prediction
       Since U3 gave Batman 2 stars, he will probably rate Spiderman (2+1.5) = 3.5 stars
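The arithmetic on this slide can be checked with a minimal Python sketch. The function names `average_diff` and `predict` are illustrative choices, not part of Mahout or of the referenced paper.

```python
# Ratings from the slide: user -> {movie: rating}; U3 has not rated Spiderman.
ratings = {
    "U1": {"Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"Batman": 2},
}


def average_diff(ratings, target, source):
    """Mean of (rating[target] - rating[source]) over users who rated both."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs)


def predict(ratings, user, target, source):
    """Slope One prediction for `target` using a single reference item."""
    return ratings[user][source] + average_diff(ratings, target, source)


avg = average_diff(ratings, "Spiderman", "Batman")
pred = predict(ratings, "U3", "Spiderman", "Batman")
print(avg, pred)  # 1.5 3.5
```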
  • 10. Simple Weighted Slope One
    Usually a user has rated multiple items.
    User  HarryPotter  Batman  Spiderman
    U1    5            3       4
    U2    ?            2       4
    U3    4            2       ?
    1) How differently were the movies rated?
       Diff(Batman, Spiderman) = [(4-3)+(4-2)]/2 = 1.5
       Diff(HarryPotter, Spiderman) = (4-5)/1 = -1
       The denominators "2" and "1" are what we call the "count".
    2) The weighted rating difference drives the prediction
       We use a simple weighted approach:
       Based on Batman only, rating = 2+1.5 = 3.5
       Based on HarryPotter only, rating = 4-1 = 3
       Considering both, predicted rating = (3.5*2 + 3*1)/(2+1) = 3.33
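A small Python sketch of the weighted prediction, assuming the three-user table from this slide. `weighted_slope_one` is a hypothetical helper name, and the code recomputes the differences on the fly rather than reading them from a stored matrix.

```python
# Ratings from the slide; a missing key means the user has not rated the movie.
ratings = {
    "U1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"HarryPotter": 4, "Batman": 2},
}


def weighted_slope_one(ratings, user, target):
    """Average the per-item predictions, weighting each by its co-rating count."""
    numerator, denominator = 0.0, 0
    for source, source_rating in ratings[user].items():
        if source == target:
            continue
        # Differences target - source over users who rated both items.
        diffs = [r[target] - r[source] for r in ratings.values()
                 if target in r and source in r]
        if not diffs:
            continue  # no co-rating users, so this item carries no evidence
        count = len(diffs)
        avg_diff = sum(diffs) / count
        numerator += (source_rating + avg_diff) * count
        denominator += count
    return numerator / denominator if denominator else None


prediction = weighted_slope_one(ratings, "U3", "Spiderman")
print(round(prediction, 2))  # 3.33
```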
  • 11. Simple Weighted Slope One
    User  HarryPotter  Batman  Spiderman
    u1    5            3       4
    u2    ?            2       4
    u3    4            2       ?
    Question: Online or Offline?
    To calculate the predicted ratings, we need 2 matrices:
    1) Difference Matrix
               Movie1  Movie2  Movie3  Movie4
       Movie1
       Movie2  -1.5
       Movie3  2       1
       Movie4  -1      0.5     -2
    2) Count Matrix
       Just the number of users who co-rated the two items
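As a rough illustration, both matrices can be built in one pass over the users. The sketch below stores only the lower triangle in dictionaries keyed by (row item, column item), with each entry equal to row rating minus column rating. The layout and the name `build_matrices` are assumptions made for this example.

```python
from itertools import combinations

# Fixed item order; it decides which item is the row and which the column.
items = ["HarryPotter", "Batman", "Spiderman"]
ratings = {
    "u1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "u2": {"Batman": 2, "Spiderman": 4},
    "u3": {"HarryPotter": 4, "Batman": 2},
}


def build_matrices(ratings, items):
    """Return (difference matrix, count matrix) as dicts keyed by (row, col)."""
    sums, counts = {}, {}
    for prefs in ratings.values():
        rated = [i for i in items if i in prefs]
        for col, row in combinations(rated, 2):  # col precedes row in `items`
            key = (row, col)
            sums[key] = sums.get(key, 0.0) + prefs[row] - prefs[col]
            counts[key] = counts.get(key, 0) + 1
    diffs = {key: sums[key] / counts[key] for key in sums}
    return diffs, counts


diffs, counts = build_matrices(ratings, items)
for key in diffs:
    print(key, diffs[key], counts[key])
```

Because the sums and counts are kept separately, a new rating could be folded in without recomputing everything, which is one reason the online-or-offline question on the slide is a real choice.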
  • 12. Slope One in Mahout
    Mahout, an open-source machine learning library.
    1) Recommendation algorithms: User-based CF, Item-based CF, Slope One, etc.
    2) Clustering: KMeans, Fuzzy KMeans, etc.
    3) Classification: Decision Trees, Naive Bayes, SVM, etc.
    4) Latent Factor Models: LDA, SVD, Matrix Factorization, etc.
  • 13. Slope One in Mahout
    org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
    Pre-Processing Stage: (class MemoryDiffStorage with Map)
      for every item i
        for every other item j
          for every user u expressing preference for both i and j
            add the difference in u’s preference for i and j to an average
    Recommendation Stage:
      for every item i the user u expresses no preference for
        for every item j that user u expresses a preference for
          find the average preference difference between j and i
          add this diff to u’s preference value for j
          add this to a running average
      return the top items, ranked by these averages
    Simple weighting: as introduced previously
    StdDev weighting: item-item rating diffs with a lower standard deviation should be weighted more highly
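The "add ... to an average" steps in the pseudocode imply some accumulator kept per item pair. The sketch below is one way to maintain a count, mean, and sample standard deviation incrementally, using Welford's method. It is not Mahout's own class: `RunningStats` is a hypothetical name, and the slides do not say which StdDev-based weighting formula Mahout applies.

```python
import math


class RunningStats:
    """Tracks count, mean and sample standard deviation one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Undefined for fewer than two values, reported here as NaN.
        if self.count < 2:
            return math.nan
        return math.sqrt(self._m2 / (self.count - 1))


# Rating differences Spiderman - Batman contributed by U1 and U2.
stats = RunningStats()
for diff in (1, 2):
    stats.add(diff)
print(stats.count, stats.mean, stats.stddev)
```

A pair with many consistent differences (high count, low standard deviation) is stronger evidence than a pair with few or scattered ones, which is the intuition behind both weighting options on the slide.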
  • 14. Distributed Slope One in Mahout
    Similar to our previous practice, e.g. the matrix factorization process, what we need is the Difference Matrix.
    Suppose M users have rated N items; the matrix then requires N(N-1)/2 cells.
    Density is another aspect: how many items each user rated. If there are many items and the rating matrix is dense, the computational cost increases accordingly.
    Question again: Online or Offline? It depends on the task and the data.
    With large-scale data, let’s do it offline!
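To give a feel for the N(N-1)/2 figure, here is a short calculation. The 3,900 value is the MovieLens-1M movie count quoted in the experiment setup of this deck. The result is an upper bound, since pairs that no user co-rated need no cell.

```python
def diff_matrix_cells(n_items):
    """Number of unordered item pairs, i.e. cells in one triangle of the matrix."""
    return n_items * (n_items - 1) // 2


small = diff_matrix_cells(3)         # the three-movie toy example
movielens = diff_matrix_cells(3900)  # MovieLens-1M movie count
print(small, movielens)  # 3 7603050
```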
  • 15. Distributed Slope One in Mahout
    package org.apache.mahout.cf.taste.hadoop.slopeone;
      class SlopeOneAverageDiffsJob
      class SlopeOnePrefsToDiffsReducer
      class SlopeOneDiffsToAveragesReducer
    package org.apache.mahout.cf.taste.hadoop;
      class ToItemPrefsMapper
    org.apache.hadoop.mapreduce.Mapper
    Two Mapper-Reducer Stages:
    1) Create DiffMatrix for each user
    2) Collect AvgDiff info, counts, StdDev
    Let’s see how it works…
  • 16. Mapper and Reducer - 1
    User  HarryPotter  Batman  Spiderman
    U1    5            3       4
    U2    ?            2       4
    U3    4            2       ?
    Mapper1 (ToItemPrefsMapper) → <UserID, Pair<ItemID, Rating>>
    Reducer1 (PrefsToDiffsReducer) → <Pair<Item1,Item2>, Diff> (for all three users)
    <U1>    Potter  Bat  Spider
    Potter
    Bat     -2
    Spider  -1      1
    <U2>    Potter  Bat  Spider
    Potter
    Bat     NULL
    Spider  NULL    2
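The first stage can be imitated in plain Python to see what each key/value pair looks like. This is a single-process sketch of the map, shuffle, and reduce steps, not the Mahout classes themselves. The function names, the `ITEM_ORDER` table, and the (earlier item, later item) key layout are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations

# Input records: (user, item, rating), one per known rating on the slide.
records = [
    ("U1", "HarryPotter", 5), ("U1", "Batman", 3), ("U1", "Spiderman", 4),
    ("U2", "Batman", 2), ("U2", "Spiderman", 4),
    ("U3", "HarryPotter", 4), ("U3", "Batman", 2),
]

# Fixed item order so that every pair is emitted once, with a consistent sign.
ITEM_ORDER = {"HarryPotter": 0, "Batman": 1, "Spiderman": 2}


def map_to_item_prefs(record):
    """Mapper 1: key each (item, rating) pair by its user."""
    user, item, rating = record
    return user, (item, rating)


def reduce_prefs_to_diffs(item_prefs):
    """Reducer 1: for one user, emit ((item_a, item_b), rating_b - rating_a)."""
    prefs = sorted(item_prefs, key=lambda pref: ITEM_ORDER[pref[0]])
    for (item_a, rating_a), (item_b, rating_b) in combinations(prefs, 2):
        yield (item_a, item_b), rating_b - rating_a


# Shuffle step: group the mapper output by user.
grouped = defaultdict(list)
for record in records:
    user, item_pref = map_to_item_prefs(record)
    grouped[user].append(item_pref)

stage1_output = [pair for prefs in grouped.values()
                 for pair in reduce_prefs_to_diffs(prefs)]
for pair in stage1_output:
    print(pair)
```

U1 yields the three values in the first table on the slide, U2 yields only the Batman/Spiderman difference of 2, and U3 (not tabulated on the slide) yields a HarryPotter/Batman difference of -2.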
  • 17. Mapper and Reducer - 2
    <U1>    Potter  Bat  Spider
    Potter
    Bat     -2
    Spider  -1      1
    <U2>    Potter  Bat  Spider
    Potter
    Bat     NULL
    Spider  NULL    2
    Mapper2 (org.apache.hadoop.mapreduce.Mapper)
    Reducer2 (DiffsToAveragesReducer): Average Diffs, Count, StdDev
    <Aggregate>  Potter  Bat     Spider
    Potter
    Bat          -2, 1
    Spider       -1, 1   1.5, 2
    Simply put, each <a,b> pair denotes a = average diff, b = count.
    Notice: we should use three matrices in practice; here I used 2.
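The second stage groups the per-user differences by item pair and reduces each group to an average, a count, and a standard deviation. The sketch below assumes the stage-1 output for all three users as its input and uses the sample standard deviation, which may not be the variant Mahout uses. With U3 included, the HarryPotter/Batman count comes out as 2, whereas the aggregate table on the slide tabulates only U1 and U2 and therefore shows 1.

```python
import math
import statistics
from collections import defaultdict

# Assumed stage-1 output: ((item_a, item_b), rating_b - rating_a) per user.
stage1_output = [
    (("HarryPotter", "Batman"), -2),     # U1
    (("HarryPotter", "Spiderman"), -1),  # U1
    (("Batman", "Spiderman"), 1),        # U1
    (("Batman", "Spiderman"), 2),        # U2
    (("HarryPotter", "Batman"), -2),     # U3
]


def reduce_diffs_to_averages(pairs):
    """Reducer 2: item pair -> (average diff, count, sample standard deviation)."""
    grouped = defaultdict(list)
    for key, diff in pairs:  # the identity mapper passes pairs through unchanged
        grouped[key].append(diff)
    result = {}
    for key, diffs in grouped.items():
        stddev = statistics.stdev(diffs) if len(diffs) > 1 else math.nan
        result[key] = (statistics.mean(diffs), len(diffs), stddev)
    return result


averages = reduce_diffs_to_averages(stage1_output)
for key, value in averages.items():
    print(key, value)
```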
  • 18. Predictions
    User  HarryPotter  Batman  Spiderman
    U1    5            3       4
    U2    ?            2       4
    U3    4            2       ?
    <Aggregate>  Potter  Bat     Spider
    Potter
    Bat          -2, 1
    Spider       -1, 1   1.5, 2
    Simply put, each <a,b> pair denotes a = average diff, b = count.
    Notice: we should use three matrices in practice; here I used 2.
    Prediction(U3, Spiderman) = [(4-1)*1 + (2+1.5)*2] / (1+2) ≈ 3.33
  • 19. Experiments
      • Data
      • Hadoop Setup
      • Running Performances
  • 20. Experiment Setup
    Data: MovieLens-1M ratings
      # of users: 6,040
      # of movies: 3,900
      # of ratings: 1,000,209
    Density of the ratings: each user has at least 20 ratings; some users obviously have many more
    Rating format: UserID, ItemID, Rating (scale 1-5)
    Data Split: 80% training, 20% testing
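The slides do not say how the 80/20 split was produced. A uniformly random split over rating records is one common choice, sketched below on synthetic data. The function name, the fixed seed, and the records themselves are assumptions for illustration.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic (UserID, ItemID, Rating) records standing in for the real dataset.
records = [(user, item, 3) for user in range(10) for item in range(10)]
train, test = train_test_split(records)
print(len(train), len(test))  # 80 20
```

A per-user split (holding out 20% of each user's ratings) would be an alternative if every test user must also appear in the training set.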
  • 21. Experiment Setup
    Hadoop Cluster Setup
      • IBM SmartCloud
      • 1 master node, 7 slave nodes
      • Each node runs SUSE Linux Enterprise Server v11 SP1
      • Server Configuration: 64 bit (vCPU: 2, RAM: 4 GiB, Disk: 60 GiB)
      • Hadoop v.0.20.205.0
      • Mahout distribution-0.6
    The environment setup follows the typical workflow described at: http://irecsys.blogspot.com/2012/11/configurate-map-reduce-environment-on.html
    Thanks Scott Young, neat writeup!!
  • 22. Experimental Analyses
    Stage-1: SlopeOneAverageDiffsJob by Map-Reduce
      Goal: Build DiffStorage
      Output: DiffStorage txt file, 1.45GB
      Running Time:
        • real 13m 34.228s
        • user 0m 5.136s
        • sys 0m 1.028s
      Item1  Item2  Diff   Count  StdDev
      221    223    -1.02  197    0.5
    Stage-2: Java evaluator to measure MAE on the testing set
      Running Time:
        • Load Testing Set (21K records), 299ms
        • Load Training Set (79K records), 1,771ms
        • Load DiffStorage, 176,352ms = 2.9m
        • Prediction (21K records), 18,182ms = 0.3m
        • MAE = 0.71330756
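MAE (mean absolute error) is the average absolute gap between predicted and actual ratings on the testing set. A minimal sketch follows; the example values are made up and are not taken from the experiment.

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - true rating| over all test records."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up predictions and true ratings on a 1-5 scale.
mae = mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 3])
print(mae)  # 0.5
```

On a 1-5 scale, the reported MAE of about 0.71 means predictions were off by roughly seven tenths of a star on average.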
  • 23. Experimental Experiences
    1. Why not MovieLens 10M data?
       Map-Reduce on the 10M data may take several hours;
       running time depends on the cluster and its configuration;
       also, the DiffStorage file would be too large.
    2. Java Evaluator
       Loading the full DiffStorage file is time-consuming.
       It also incurs Java heap space and GC overhead limit errors;
       those errors cannot be fixed by -Xmx or other solutions.
       Two solutions:
       1) Use simple weighting only and discard StdDev weighting.
       2) Write a simple Mapper and Reducer and run the evaluation on the cluster.
    For MovieLens 1M, this is not that efficient compared with the live SlopeOne recommendation; 10M data may be a better fit, and I will try MovieLens-10M later.
    Slope One is simple but memory-expensive.
  • 24. More …
      • Drive Mahout on Hadoop
      • Interesting Communities
  • 25. Mahout + Hadoop
    How to put more Mahout algorithms on Hadoop?
    1. Pre-set Commands in Mahout
       Run bin/mahout – help; it provides a list of available programs such as svd, fkmeans, etc.
       Some are basic functions, such as splitDataset.
       Some can be executed as Hadoop tasks,
       e.g. run and evaluate Matrix Factorization on a rating dataset:
         bin/mahout parallelALS --input inputSource --output outputSource --tempDir tmpFolder --numFeatures 20 --numIterations 10
         bin/mahout evaluateFactorization --input inputSource --output outputSource --userFeatures als/out/U/ --itemFeatures als/out/M/ --tempDir tmpFolder
  • 26. Mahout + Hadoop
    2. More Algorithms on Hadoop
       Mahout provides a way to run more Mahout algorithms. Simply:
         $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-<version>.jar <Job Class> --recommenderClassName Class <OPTIONS>
       Which kinds of jobs does it support? Mahout implements several. Some popular ones:
       1) org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob --recommenderClassName ClassName
       2) org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
       3) org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
       4) org.apache.mahout.cf.taste.hadoop.slopeone.SlopeOneAverageDiffsJob
  • 27. Interesting Communities
    Beyond the official Hadoop and Mahout sites
    1. Data Mining
       KDnuggets, http://www.kdnuggets.com
       Popular community for Data Mining & Analytics. Lots of useful information, such as news, materials, datasets, jobs, etc.
    2. Big Data
       SmartData Collective, http://smartdatacollective.com/
       Smarter Computing, http://www.smartercomputingblog.com/
       Big Data Meetup, http://big-data.meetup.com/
    3. Recommender Systems
       ACM Official Site, http://recsys.acm.org/
       RecSys Wiki, http://recsyswiki.com/
  • 28. Thank You!