Sistemas de Recomendação sem enrolação!
(Recommender Systems, without the fluff!)
Gabriel Moreira - @gspmoreira
Lead Data Scientist DSc. student
GCG Campinas
DataFest 2018
Life is too short!
Introduction
"We are leaving the Information Age and entering the Recommendation Age."
Chris Anderson, "The Long Tail"
Recommendations are responsible for...
● 38% of sales
● 2/3 of views
● 39% of top news visualization
What else may I recommend?
What can a Recommender System do?
1 - Recommendation
Given a user, produce an ordered list matching the user's needs
2 - Prediction
Given an item, what is its relevance for each user?
Recommender System Methods
Recommender System
● Content-based filtering
● Collaborative filtering
○ Memory-based filtering: User-based, Item-based
○ Model-based filtering (ML-based): Clustering, Association Rules, Matrix Factorization, Neural Networks
● Hybrid filtering (Content-based + Collaborative)
Most popular
Collaborative Filtering
User-Based Collaborative Filtering
[Figure: users with similar interests; an item that one of them likes is recommended to the other]
Item-Based Collaborative Filtering
[Figure: who likes A also likes B, so B is recommended to a user who likes A]
Collaborative Filtering based on Matrix Factorization
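As a rough illustration of the idea, the sketch below factorizes a small ratings table into user and item factor vectors whose dot product approximates each rating. It is not the method from the slides: it uses plain stochastic gradient descent rather than SVD, the ratings are the rows of the input.csv sample from the Mahout example, and the function names and hyperparameters (`factorize`, `n_factors`, `lr`, `reg`) are illustrative assumptions.

```python
import random

def factorize(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=42):
    """Learn user and item factor vectors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initial factors (assumed initialization).
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, user, item):
    """Predicted rating: dot product of the user and item factors."""
    return sum(a * b for a, b in zip(p[user], q[item]))

def rmse(p, q, ratings):
    """Root mean squared error over the known ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

ratings = [
    (1, 15, 4.0), (1, 16, 5.0), (1, 17, 1.0), (1, 18, 5.0),
    (2, 10, 1.0), (2, 11, 2.0), (2, 15, 5.0), (2, 16, 4.5),
    (2, 17, 1.0), (2, 18, 5.0), (3, 11, 2.5),
]
p0, q0 = factorize(ratings, n_epochs=0)  # untrained factors, for comparison
p, q = factorize(ratings)
rmse_before, rmse_after = rmse(p0, q0, ratings), rmse(p, q, ratings)
```

Once trained, `predict(p, q, 1, 10)` scores an item user 1 has not rated; ranking all unrated items by that score gives the recommendation list.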
Collaborative Filtering
Advantages
● Works for any kind of item (item attributes are ignored)
Drawbacks
● Usually recommends more popular items
● Cold-start
○ Cannot recommend items that have not yet been rated/consumed
○ Needs a minimum number of users to find similar ones
Frameworks - Recommender Systems
[Framework logos, grouped by language: Python; Python / Scala; Java; .NET; Java]
User-Based Collaborative Filtering (Java / Mahout)
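import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Loads user-item ratings
DataModel model = new FileDataModel(new File("input.csv"));
// Defines a similarity metric to compare users (Pearson's correlation coefficient)
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Keeps as neighbors only the users whose similarity is at least 0.1
UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
// Creates a User-Based Collaborative Filtering recommender
UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
// Returns the top 3 recommendations for userId=2
List<RecommendedItem> recommendations = recommender.recommend(2, 3);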
input.csv
User,Item,Rating
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
User-Based Collaborative Filtering example (Mahout)
https://mahout.apache.org/users/recommender/userbased-5-minutes.html
Content-Based Filtering
Content-Based Filtering
Similar content (e.g. actor)
Likes
Recommends
Advantages
● Does not depend upon other users
● May recommend new and unpopular items
● Recommendations can be easily explained
Drawbacks
● Overspecialization
● May be difficult to extract attributes from audio,
movies or images
Content-Based Filtering
Hybrid Recommender Systems
Some approaches...
Composite
Runs a chain of algorithms, aggregating their recommendations.
Weighted
Each algorithm has a weight, and the final recommendations are defined by weighted averages.
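The weighted approach can be sketched in a few lines. This is a minimal illustration under assumptions, not the implementation used in the talk: the function name `weighted_hybrid`, the score dictionaries and the weights are all made up, and an item missing from one algorithm's output is simply treated as scoring 0 there.

```python
def weighted_hybrid(score_lists, weights, top_n=3):
    """Combine per-algorithm item scores with a weighted average and rank the items."""
    total_weight = sum(weights.values())
    combined = {}
    for name, scores in score_lists.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    averaged = {item: s / total_weight for item, s in combined.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical normalized scores produced by two recommenders for the same user.
scores = {
    "content_based": {"A": 0.9, "B": 0.2, "C": 0.5},
    "collaborative": {"A": 0.1, "B": 0.8, "C": 0.6},
}
# Hypothetical weights: collaborative filtering counts three times as much.
weights = {"content_based": 1.0, "collaborative": 3.0}
ranking = weighted_hybrid(scores, weights)
ranked_items = [item for item, _ in ranking]
```

With these weights item B comes first, although content-based filtering alone would have ranked A first; tuning the weights is how such a hybrid trades off the two signals.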
CI&T Deskdrop dataset on Kaggle!
https://www.kaggle.com/gspmoreira/articles-sharing-reading-from-cit-deskdrop
● 12 months of logs (Mar. 2016 - Feb. 2017)
● ~73k logged user interactions
● ~3k public articles shared on the platform
Recommender Systems in Python 101
https://www.kaggle.com/gspmoreira/recommender-systems-in-python-101
1. Popularity Based: log(# clicks)
2. Content-Based Filtering: TF-IDF + Cosine Similarity
3. Collaborative Filtering: SVD Matrix Factorization
4. Hybrid Filtering: CB * CF
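To make the second item concrete, here is a minimal content-based sketch using TF-IDF vectors and cosine similarity, written with the standard library only. It is an illustration under assumptions rather than the notebook's code: the three toy articles, the whitespace tokenization, the `tf * log(N / df)` weighting variant and the single-article user profile are all made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical article texts.
articles = {
    "a1": "deep learning for recommender systems".split(),
    "a2": "neural networks and deep learning".split(),
    "a3": "agile testing practices".split(),
}
ids = list(articles)
vectors = dict(zip(ids, tfidf_vectors([articles[i] for i in ids])))

# User profile: here simply the vector of the one article the user has read.
profile = vectors["a1"]
ranked = sorted(
    (i for i in ids if i != "a1"),
    key=lambda i: cosine(profile, vectors[i]),
    reverse=True,
)
```

Article a2 shares the terms "deep" and "learning" with the profile, so it ranks above a3, which shares none.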
Evaluation - Top-N accuracy metrics
Recall@N: Percentage of relevant items within the Top-N items in a ranked list.
[Figure: each relevant (clicked) article is ranked together with 100 non-relevant articles. Article A is ranked 3rd, inside both the Recall@5 and Recall@10 cut-offs; Article B is ranked 7th, inside the Recall@10 cut-off only.]
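The metric can be sketched as follows, reusing the ranks from the slide (Article A at rank 3, Article B at rank 7, each ranked among 100 non-relevant articles). The helper names `build_ranking` and `recall_at_n` are illustrative, not taken from the notebook.

```python
def build_ranking(relevant, rank, n_non_relevant=100):
    """Ranked list of non-relevant items with the relevant item placed at `rank`."""
    ranking = [f"non-relevant {k}" for k in range(1, n_non_relevant + 1)]
    ranking.insert(rank - 1, relevant)
    return ranking

def recall_at_n(rankings, n):
    """Share of relevant items that appear within the top-n of their ranked list."""
    hits = sum(1 for relevant, ranking in rankings.items() if relevant in ranking[:n])
    return hits / len(rankings)

rankings = {
    "Article A": build_ranking("Article A", 3),
    "Article B": build_ranking("Article B", 7),
}
recall_at_5 = recall_at_n(rankings, 5)    # only Article A is in the top 5
recall_at_10 = recall_at_n(rankings, 10)  # both articles are in the top 10
```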
Going deeper...
Why does Deep Learning have potential for RecSys?
1. Feature extraction directly from the content (e.g., image, text, audio)
● Images: CNN
● Text: 1D CNN, RNNs, weighted word embeddings
● Audio/Music: CNN, RNN
2. Heterogeneous data handled easily
3. Dynamic behaviour modeling with RNNs
4. More accurate representation learning of users and items
○ Natural extensions of CF
5. RecSys is a complex domain
○ Deep learning worked well in other complex domains
News Recommender Systems
using Deep Learning
RecSys 2018 - DLRS - Deep Learning for Recommender Systems Workshop
http://dlrs-workshop.org/dlrs-2018/program/
https://arxiv.org/abs/1808.00076
News Recommender Systems
The majority of web traffic (TREVISIOL et al., 2014b)
News RS Challenges
1. Sparse user profiling (LI et al., 2011) (LIN et al., 2014) (PELÁEZ et al., 2016)
2. Users' preferences shift (PELÁEZ et al., 2016) (EPURE et al., 2017)
3. Fast-growing number of items (PELÁEZ et al., 2016) (MOHALLICK; ÖZGÖBEK, 2017)
4. Accelerated decay of an item's value (DAS et al., 2007)
Distribution of article age at click time (G1 dataset)
Percentile      Article age
10%             4 hours
25%             5 hours
50% (Median)    8 hours
75%             14 hours
90%             26 hours
CHAMELEON: A Deep Learning Meta-Architecture for News Recommendation
A conceptual model of news relevance factors
News relevance
● News article
○ News static properties: Topics, Entities, Publisher
○ News dynamic properties: Recency, Popularity
● User
○ User current context: Time, Location, Device, Referrer
○ User interests: Long-term interests, Short-term interests
● Global factors: Seasonality, Breaking events, Popular Topics
CHAMELEON meta-architecture
1. Deals with the item cold-start scenario
2. Learns item representations from article text and metadata
3. Leverages user context (location, time, device) and article context (popularity, recency)
4. Provides session-based recommendations based on users' short-term preferences
5. Supports streaming user clicks (online learning), without the need to retrain on the whole historical dataset
6. Provides a modular structure for news recommendation, allowing its modules to be instantiated by different advanced neural network architectures and methods
The CHAMELEON Meta-Architecture for News RS
[Figure: the two modules of the meta-architecture]
● Article Content Representation (ACR) module, used when a news article is published
○ Inputs: the content word embeddings of the news article text (e.g. "New York is a multicultural city, ...") and its metadata attributes (e.g. Publisher)
○ Sub-modules: Textual Features Representation (TFR) and Metadata Prediction (MP), which targets the article metadata attributes (Category, Tags, Entities)
○ Output: the Article Content Embedding, stored in the Article Content Embeddings repository
● Next-Article Recommendation (NAR) module, used when a user reads a news article
○ Inputs: the active article and user interaction from the active user session (Active Sessions), the user context (Time, Location, Device), the Article Context (Popularity, Recency), the Article Content Embedding, the past read articles (Users Past Sessions) and the candidate next articles (positive and neg.)
○ Sub-modules: Contextual Article Representation (CAR), Session Representation (SR) and Recommendations Ranking (RR)
○ Embeddings: User-Personalized Contextual Article Embedding and Predicted Next-Article Embedding
○ Output: Recommended articles
Legend: Module, Sub-Module, Embedding, Input, Output, Data repository, Attributes
Next-Article Recommendation (NAR) module
● Provides news article recommendations for each interaction (I) in active user sessions.
● For each recommendation request, the NAR module generates a ranked list of the articles the user is most likely to read next in the session.
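Sessions mini-batch
[Figure: a mini-batch with three sessions and their interactions: session 1 (I1,1 to I1,5), session 2 (I2,1 and I2,2) and session 3 (I3,1 to I3,3). When a user reads a news article, each interaction carries the user context (Time, Location, Device), the article context (Popularity, Recency) and the Article Content Embedding.]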
Trains an RNN to predict the next-clicked items (represented by their User-Personalized Contextual Article Embedding), for each active session:
Active user session: I1, I2, I3, I4, I5
RNN input: I1, I2, I3, I4
Expected RNN output (next-clicked items): I2, I3, I4, I5
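The input/output alignment above amounts to shifting each session by one position. A minimal sketch, assuming sessions are plain lists of interaction ids (the function name `next_click_pairs` and the toy mini-batch are illustrative, and the embeddings themselves are left out):

```python
def next_click_pairs(session):
    """Split one session into RNN inputs and the aligned next-clicked targets."""
    return session[:-1], session[1:]

# The session from the slide.
inputs, targets = next_click_pairs(["I1", "I2", "I3", "I4", "I5"])

# The same shift applied to a mini-batch of sessions with different lengths.
mini_batch = [
    ["I1,1", "I1,2", "I1,3", "I1,4", "I1,5"],
    ["I2,1", "I2,2"],
    ["I3,1", "I3,2", "I3,3"],
]
batch_pairs = [next_click_pairs(session) for session in mini_batch]
```

A session with a single interaction yields empty inputs and targets, so it contributes no training example.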
Negative Sampling strategy
Samples
● Unique articles read by users within the last N hours/clicks buffer
● Unique articles read by other user sessions in the mini-batch
● Next article read by the user in their session
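One way to read the strategy is sketched below. The order of preference is an assumption, not something the slide states: negatives are drawn first from articles read by other sessions in the mini-batch, then topped up from the recent-clicks buffer, and articles from the user's own session are never used as negatives. The function name `sample_negatives` and all article ids are hypothetical.

```python
import random

def sample_negatives(session, other_sessions, recent_buffer, n_samples, rng):
    """Draw negative samples for one session (assumed order: mini-batch first, then buffer)."""
    seen = set(session)
    # Unique articles read by other user sessions in the mini-batch.
    batch_pool = sorted({a for s in other_sessions for a in s} - seen)
    # Unique articles from the recent-clicks buffer not already covered above.
    buffer_pool = sorted(set(recent_buffer) - seen - set(batch_pool))
    negatives = rng.sample(batch_pool, min(n_samples, len(batch_pool)))
    missing = n_samples - len(negatives)
    if missing > 0:
        negatives += rng.sample(buffer_pool, min(missing, len(buffer_pool)))
    return negatives

rng = random.Random(0)
session = ["a1", "a2"]                       # the positive sample is the next click in here
other_sessions = [["a2", "a3"], ["a4"]]      # rest of the mini-batch
recent_buffer = ["a1", "a3", "a5", "a6", "a7"]
negatives = sample_negatives(session, other_sessions, recent_buffer, 4, rng)
```

Here the mini-batch only provides two eligible articles (a3 and a4), so the remaining two negatives come from the buffer.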
Recommendations Ranking
(RR) sub-module
Eq. 4 - Relevance Score of an item for a user session
Eq. 5 - Cosine similarity
Eq. 6 - Softmax over Relevance Score (HUANG et al., 2013)
Eq. 7 - Loss function (HUANG et al., 2013)
CHAMELEON - NAR module loss function
Recommendation loss function implemented on TensorFlow
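The TensorFlow code itself did not survive in this transcript. As a stand-in, here is a plain-Python sketch of a loss with the shape the equation captions describe: relevance as cosine similarity between the predicted next-article embedding and each candidate, a softmax over the positive and negative candidates, and the negative log-likelihood of the positive one. The smoothing factor `gamma`, the function names and the toy vectors are assumptions, not values from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def next_article_loss(predicted, positive, negatives, gamma=1.0):
    """Negative log-likelihood of the positive article under a softmax over relevance scores."""
    candidates = [positive] + negatives
    scores = [gamma * cosine(predicted, c) for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return -(scores[0] - log_norm)

predicted = [1.0, 0.0]  # hypothetical Predicted Next-Article Embedding
loss_good = next_article_loss(predicted, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_bad = next_article_loss(predicted, [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when the prediction points toward the positive article than when it points toward a negative one, which is the signal the ranking sub-module is trained on.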
CHAMELEON Architecture Instantiations
An architecture instantiation of CHAMELEON (1D CNN and LSTM)
[Figure: the meta-architecture diagram with its modules instantiated as follows]
● Convolutional Neural Network (CNN) over the content word embeddings: conv-3 (128), conv-4 (128) and conv-5 (128), each followed by max-pooling
● Fully Connected layers
● LSTM
● Metadata Prediction (MP) target article metadata attribute: Category
● User context: Platform, Device Type
● Article context: Popularity, Recency
CHAMELEON Instantiation - Implementation
● This CHAMELEON architecture instantiation was implemented using TensorFlow (available at https://github.com/gabrielspmoreira/chameleon_recsys)
● Training and evaluation were performed on Google Cloud Platform ML Engine
Preliminary Experiments
Preliminary experiments - Dataset
● Provided by Globo.com (G1), the most popular news portal in Brazil
● Sample from Oct. 1 to 16, 2017, with over 3 M clicks, distributed in 1.2 M sessions from 330 K users, who read over 50 K unique news articles
https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
ACR module training
Trained on a dataset with 364 K articles from 461 categories, to generate the Article Content Embeddings (vectors with 250 dimensions)
[Figure: t-SNE visualization of trained Article Content Embeddings (from top 15 categories)]
[Figure: Distribution of articles by the top 200 categories]
NAR module evaluation
Temporal offline evaluation method:
1. Train the NAR module with sessions within the active hour
2. Evaluate the NAR module with sessions within the next hour
Task: For each item within a session, predict the next-clicked item from a set composed of the positive sample (correct article) and 50 negative samples.
Metrics:
● Recall@5 - Checks whether the positive item is among the top-5 ranked items
● MRR@5 - Ranking metric which assigns higher scores at top ranks.
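Both metrics can be computed from the rank the positive item reaches among its candidates. A minimal sketch follows; the function names, the candidate scores and the list of ranks are made up for illustration, and MRR@5 is taken here as the reciprocal rank when the positive item is in the top 5 and 0 otherwise.

```python
def rank_of_positive(scores, positive):
    """1-based rank of the positive item when candidates are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(positive) + 1

def recall_and_mrr_at_n(ranks, n=5):
    """Recall@n and MRR@n over the ranks obtained by the positive items."""
    hits = [r for r in ranks if r <= n]
    recall = len(hits) / len(ranks)
    mrr = sum(1.0 / r for r in hits) / len(ranks)
    return recall, mrr

# One prediction: hypothetical scores for the positive item and two negatives.
example_rank = rank_of_positive({"pos": 0.7, "neg1": 0.9, "neg2": 0.1}, "pos")

# Hypothetical ranks of the positive item across four predictions.
recall_5, mrr_5 = recall_and_mrr_at_n([1, 2, 8, 5], n=5)
```

In the evaluation task described above, each call to `rank_of_positive` would receive 51 candidates: the positive sample plus 50 negative samples.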
Benchmark methods for session-based recommendations:
Neural Networks methods
1. GRU4Rec - Seminal neural architecture using RNNs for session-based recommendations (Hidasi, 2016), with the improvements of (Hidasi, 2017) (v2).
Frequent patterns methods
2. Co-occurrent - Recommends articles commonly viewed together with the last read article in other user sessions (a simplified version of the association rules technique, with a maximum rule size of two) (Jugovac, 2018) (Ludewig, 2018)
3. Sequential Rules (SR) - A more sophisticated version of association rules, which considers the sequence of clicked items within the session. A rule is created when an item q appears after an item p in a session, even when other items were viewed between p and q. The rules are weighted by the distance x (number of steps) between p and q in the session, using a linear weighting function (Ludewig, 2018)
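The Sequential Rules baseline can be sketched as below. The exact weighting function is not given in the slides, so `linear_weight` is an assumed linear decay (1.0 for adjacent items, dropping by 1/max_steps per extra step); the function names and toy sessions are also illustrative.

```python
from collections import defaultdict

def linear_weight(steps, max_steps=10):
    """Assumed linear decay: 1.0 at one step, reaching 0 beyond max_steps."""
    return max(0.0, 1.0 - (steps - 1) / max_steps)

def train_sequential_rules(sessions, max_steps=10):
    """Build rules p -> q for every q that appears after p within a session."""
    rules = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for pos, q in enumerate(session):
            for steps in range(1, min(pos, max_steps) + 1):
                p = session[pos - steps]
                if p != q:
                    rules[p][q] += linear_weight(steps, max_steps)
    return rules

def recommend(rules, last_item, top_n=3):
    """Rank candidate next items by the accumulated weight of the rules from last_item."""
    scores = rules.get(last_item, {})
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

sessions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"]]
rules = train_sequential_rules(sessions)
after_a = recommend(rules, "a", top_n=2)
```

Item c follows a in two sessions (once directly, once two steps later), so it outranks b, which follows a only once.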
KNN methods
4. Item-kNN - Returns the items most similar to the last read article, in terms of the cosine similarity between the vectors of their sessions, i.e. the number of co-occurrences of two items in sessions divided by the square root of the product of the numbers of sessions in which the individual items occur.
5. Vector Multiplication Session-Based kNN (V-SkNN) - Compares the entire active session with past sessions to find items to recommend. When computing the similarities with past sessions, the comparison emphasizes the items clicked more recently within the session (Jannach, 2017) (Jugovac, 2018) (Ludewig, 2018)
Other baselines
6. Recently Popular - Recommends the most viewed articles from the last N clicks buffer
7. Content-Based - For each article read by the user, recommends similar articles from the last N clicks buffer, based on the cosine similarity of their Article Content Embeddings.
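The Item-kNN similarity described in item 4 can be written down directly from its definition. A minimal sketch with made-up sessions and illustrative function names:

```python
import math
from collections import defaultdict
from itertools import combinations

def item_knn_similarities(sessions):
    """Co-occurrence count divided by the square root of the product of session counts."""
    session_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        items = set(session)  # each item counts once per session
        for item in items:
            session_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_count[a][b] += 1
            co_count[b][a] += 1
    return {
        a: {b: c / math.sqrt(session_count[a] * session_count[b])
            for b, c in neighbours.items()}
        for a, neighbours in co_count.items()
    }

def most_similar(sims, last_item, top_n=3):
    """Items most similar to the last read article."""
    scores = sims.get(last_item, {})
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

sessions = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]
sims = item_knn_similarities(sessions)
neighbours_of_a = most_similar(sims, "a", top_n=2)
```

Items a and b co-occur in two sessions and each appears in three, giving 2 / sqrt(3 * 3); c appears in only two sessions, both with a, so it ends up as the closer neighbour of a.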
Experiment #1
Continuous training and evaluation every five hours, over 15 days (Oct. 1-15, 2017)
[Figure: Average MRR@5 by hour (evaluation every 5 hours), for a 15-day period]
[Figure: Distribution of average MRR@5 by hour (sampled for evaluation), for a 15-day period]
13% relative improvement on MRR@5
Experiment #2
Continuous training and evaluation every hour, on the subsequent day (Oct. 16, 2017)
[Figure: Average MRR@5 by hour, for Oct. 16, 2017]
References
ACM RecSys
RecSys 2018 - Vancouver, CA
● 12th edition
● 6 days of tutorials, main conference, workshops
● Over 800 attendees
○ 73% from industry
○ Top companies: Amazon, Google, Spotify
● 28% paper acceptance rate
https://recsys.acm.org
References
https://www.slideshare.net/gabrielspmoreira/deep-recommender-systems-papisio-latam-2018
Questions?
Gabriel Moreira - @gspmoreira
Lead Data Scientist DSc. student
GCG Campinas
DataFest 2018

🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Sistemas de Recomendação sem Enrolação (Recommender Systems Without the Fluff)

  • 1. Sistemas de Recomendação sem enrolação! (Recommender Systems, no fluff!) Gabriel Moreira (@gspmoreira), Lead Data Scientist, DSc. student. GCG Campinas, DataFest 2018
  • 2. Life is too short!
  • 3. Introduction: "We are leaving the Information Age and entering the Recommendation Age." Chris Anderson, "The Long Tail"
  • 4. Recommendations are responsible for... 38% of sales; 2/3 of views; 39% of top news views
  • 5. What else may I recommend?
  • 6. What can a Recommender System do?
     1 - Recommendation: given a user, produce an ordered list of items matching the user's needs.
     2 - Prediction: given an item, estimate its relevance for each user.
  • 7. Recommender System Methods
     ● Content-based filtering
     ● Collaborative filtering
       ○ Memory-based filtering: User-based, Item-based
       ○ Model-based filtering (ML-based): Clustering, Association Rules, Matrix Factorization, Neural Networks
     ● Hybrid filtering: Content-based + Collaborative
     Diagram label: Most popular
  • 9. User-Based Collaborative Filtering (diagram labels: Similar interests, Likes, Recommends)
  • 10. Item-Based Collaborative Filtering: who likes A also likes B (diagram labels: Likes, Recommends)
  • 11. Collaborative Filtering based on Matrix Factorization
  • 12. Collaborative Filtering
     Advantages
     ● Works for any kind of item (item attributes are ignored)
     Drawbacks
     ● Usually recommends the more popular items
     ● Cold-start
       ○ Cannot recommend items that have not already been rated/consumed
       ○ Needs a minimum number of users to find similar users
  • 13. Frameworks - Recommender Systems (by language): Python; Python / Scala; Java; .NET; Java
  • 14. User-Based Collaborative Filtering (Java / Mahout)
     // Classes come from the org.apache.mahout.cf.taste packages (imports not shown on the slide)
     // Loads user-item ratings
     DataModel model = new FileDataModel(new File("input.csv"));
     // Defines a similarity metric to compare users (Pearson's correlation coefficient)
     UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
     // Sets the minimum similarity (0.1) for two users to be considered similar
     UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
     // Creates a User-Based Collaborative Filtering recommender
     UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
     // Returns the top 3 recommendations for userId=2
     List<RecommendedItem> recommendations = recommender.recommend(2, 3);
     input.csv (columns: User, Item, Rating):
     1,15,4.0
     1,16,5.0
     1,17,1.0
     1,18,5.0
     2,10,1.0
     2,11,2.0
     2,15,5.0
     2,16,4.5
     2,17,1.0
     2,18,5.0
     3,11,2.5
     User-Based Collaborative Filtering example (Mahout): https://mahout.apache.org/users/recommender/userbased-5-minutes.html
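The Mahout steps on this slide (load ratings, compare users with Pearson correlation, keep neighbours above a 0.1 similarity threshold, return the top N items) can be sketched in plain Python. This is an illustrative sketch, not Mahout's implementation: the `pearson` and `recommend` helpers and the similarity-weighted average used for scoring are assumptions; only the ratings come from the slide's input.csv.

```python
from math import sqrt

# Ratings copied from the slide's input.csv: user -> {item: rating}
ratings = {
    1: {15: 4.0, 16: 5.0, 17: 1.0, 18: 5.0},
    2: {10: 1.0, 11: 2.0, 15: 5.0, 16: 4.5, 17: 1.0, 18: 5.0},
    3: {11: 2.5},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def recommend(user, how_many, threshold=0.1):
    """Rank unseen items by the similarity-weighted average of neighbour ratings."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim < threshold:  # assumed threshold rule: keep neighbours with sim >= threshold
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:how_many]
```

With only the rows shown on the slide, `recommend(2, 3)` is empty because user 2 has already rated everything user 1 rated, while `recommend(1, 3)` returns items 11 and 10.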
  • 16. Content-Based Filtering (diagram labels: Similar content, e.g. actor; Likes; Recommends)
  • 17. Content-Based Filtering
     Advantages
     ● Does not depend on other users
     ● May recommend new and unpopular items
     ● Recommendations can be easily explained
     Drawbacks
     ● Overspecialization
     ● Attributes may be difficult to extract from audio, movies or images
  • 18. Hybrid Recommender Systems: some approaches...
     ● Composite: iterates through a chain of algorithms, aggregating their recommendations.
     ● Weighted: each algorithm has a weight, and the final recommendations are defined by weighted averages.
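The weighted approach can be illustrated with a few lines of Python. This is a minimal sketch under assumptions: the `weighted_hybrid` helper, the example scores and the weights are all hypothetical, and items missing from one algorithm's output are treated as scoring 0.

```python
def weighted_hybrid(score_lists, weights):
    """Combine per-algorithm item scores into one ranking by weighted average."""
    total = sum(weights)
    items = set().union(*score_lists)
    combined = {
        item: sum(w * scores.get(item, 0.0)
                  for scores, w in zip(score_lists, weights)) / total
        for item in items
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores from a content-based and a collaborative recommender
content_scores = {"a": 0.9, "b": 0.2, "c": 0.4}
collab_scores = {"a": 0.1, "b": 0.8, "c": 0.5}

# Collaborative filtering is trusted three times as much as content-based here
ranking = weighted_hybrid([content_scores, collab_scores], [1.0, 3.0])
```

In this example item "b" ends up first, because the collaborative score dominates the average.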
  • 19. CI&T Deskdrop dataset on Kaggle! https://www.kaggle.com/gspmoreira/articles-sharing-reading-from-cit-deskdrop
     ● 12 months of logs (Mar. 2016 - Feb. 2017)
     ● ~73k logged user interactions
     ● ~3k public articles shared on the platform
  • 20. Recommender Systems in Python 101 https://www.kaggle.com/gspmoreira/recommender-systems-in-python-101
  • 21. Recommender Systems in Python 101 1. Popularity Based: log(# clicks) 2. Content-Based Filtering: TF-IDF + Cosine Similarity 3. Collaborative Filtering: SVD Matrix Factorization 4. Hybrid Filtering: CB * CF
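To make the second method concrete, here is a small pure-Python sketch of TF-IDF vectors compared with cosine similarity. It is not the notebook's code: the toy documents, the whitespace tokenizer and the `log(N / df)` weighting are assumptions chosen for brevity.

```python
from collections import Counter
from math import log, sqrt

# Toy article texts (hypothetical)
docs = {
    "d1": "deep learning for news recommendation",
    "d2": "recommendation with deep neural networks",
    "d3": "cooking pasta at home",
}

def tfidf_vectors(docs):
    """Return {doc_id: {term: tf * idf}} with raw counts and idf = log(N / df)."""
    tokens = {d: text.split() for d, text in docs.items()}
    df = Counter(term for words in tokens.values() for term in set(words))
    n = len(docs)
    return {
        d: {t: c * log(n / df[t]) for t, c in Counter(words).items()}
        for d, words in tokens.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

vectors = tfidf_vectors(docs)
```

A content-based recommender would then rank candidate articles by their cosine similarity to the articles a user has already read; here "d2" is closer to "d1" than "d3" is.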
  • 22. Recommender Systems in Python 101: Evaluation - Top-N accuracy metrics
     Recall@N: percentage of relevant items within the Top-N items in a ranked list.
     [Diagram: each relevant (clicked) article is ranked within a list of non-relevant articles (numbered up to 100). Article A appears at position 3 and Article B at position 7; the Recall@5 and Recall@10 cut-offs are marked on each list.]
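Recall@N as defined on this slide can be computed from the position of each relevant item in its ranked list. The sketch below assumes the ranks shown on the slide (Article A at position 3, Article B at position 7); the `recall_at_n` helper is hypothetical.

```python
def recall_at_n(ranks, n):
    """Fraction of relevant items whose 1-based rank is within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Position of each relevant (clicked) article in its ranked list
ranks = {"Article A": 3, "Article B": 7}

recall_at_5 = recall_at_n(list(ranks.values()), 5)
recall_at_10 = recall_at_n(list(ranks.values()), 10)
```

Only Article A falls inside the top 5, so Recall@5 is 0.5, while both articles fall inside the top 10, so Recall@10 is 1.0.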
  • 26. Why does Deep Learning have potential for RecSys?
     1. Feature extraction directly from the content (e.g., image, text, audio)
        ● Images: CNN
        ● Text: 1D CNN, RNNs, Weighted word embeddings
        ● Audio/Music: CNN, RNN
  • 27. Why does Deep Learning have potential for RecSys?
     2. Heterogeneous data handled easily
     3. Dynamic behaviour modeling with RNNs
     4. More accurate representation learning of users and items
        ○ Natural extensions of CF
     5. RecSys is a complex domain
        ○ Deep learning worked well in other complex domains
  • 29. RecSys 2018 - DLRS - Deep Learning for Recommender Systems Workshop: http://dlrs-workshop.org/dlrs-2018/program/ and https://arxiv.org/abs/1808.00076
  • 30. News Recommender Systems: the majority of web traffic (TREVISIOL et al., 2014b)
  • 31. News Recommender Systems (1/2): News RS Challenges
     1. Sparse user profiling (LI et al., 2011) (LIN et al., 2014) (PELÁEZ et al., 2016)
     2. Users’ preferences shift (PELÁEZ et al., 2016) (EPURE et al., 2017)
     3. Fast-growing number of items (PELÁEZ et al., 2016) (MOHALLICK; ÖZGÖBEK, 2017)
  • 32. News Recommender Systems (2/2): News RS Challenges
     4. Accelerated decay of item value (DAS et al., 2007)
     Distribution of article age at click time (G1 dataset), by percentile: 10% = 4 hours; 25% = 5 hours; 50% (median) = 8 hours; 75% = 14 hours; 90% = 26 hours
  • 33. CHAMELEON: A Deep Learning Meta-Architecture for News Recommendation
  • 34. A conceptual model of news relevance factors
     ● News article
       ○ News static properties: Topics, Entities, Publisher
       ○ News dynamic properties: Recency, Popularity
     ● User
       ○ User current context: Time, Location, Device, Referrer
       ○ User interests: Long-term interests, Short-term interests
     ● Global factors: Seasonality, Breaking events, Popular Topics
  • 35. CHAMELEON meta-architecture
     1. Deals with the item cold-start scenario
     2. Learns item representations from article text and metadata
     3. Leverages user context (location, time, device) and article context (popularity, recency)
     4. Provides session-based recommendations based on users’ short-term preferences
     5. Supports streaming user clicks (online learning), without the need to retrain on the whole historical dataset
     6. Provides a modular structure for news recommendation, allowing its modules to be instantiated by different advanced neural network architectures and methods
  • 36. The CHAMELEON Meta-Architecture for News RS [architecture diagram]
     ● Article Content Representation (ACR) module, used when a news article is published. Sub-modules: Textual Features Representation (TFR) and Metadata Prediction (MP). Diagram labels: News Article, Content word embeddings (example text: "New York is a multicultural city, ..."), Word Embeddings, Publisher Metadata Attributes, Article Metadata Attributes (Category, Tags, Entities), Article Content Embedding.
     ● Next-Article Recommendation (NAR) module, used when a user reads a news article. Sub-modules: Contextual Article Representation (CAR), Session Representation (SR) and Recommendations Ranking (RR). Diagram labels: Active user session, User interaction, User context (Time, Location, Device), Article context (Popularity, Recency), Article Content Embedding, active article, past read articles, candidate next articles (positive and neg.), User-Personalized Contextual Article Embedding, Predicted Next-Article Embedding, Recommended articles.
     ● Other diagram labels: Article Content Embeddings, Article Context, Users Past Sessions, Active Sessions.
     ● Legend: Module, Sub-Module, Embedding, Input, Output, Data repository, Attributes.
  • 37. The CHAMELEON Meta-Architecture for News RS: Next-Article Recommendation (NAR) module [same architecture diagram, NAR part]
     ● Provides news article recommendations for each interaction (I) in active user sessions.
     ● For each recommendation request, the NAR module generates a ranked list of the articles the user is most likely to read in a given session.
     Sessions mini-batch: I1,1 I1,2 I1,3 I1,4 I1,5 / I2,1 I2,2 / I3,1 I3,2 I3,3
  • 38. The CHAMELEON Meta-Architecture for News RS: Active User Session [same architecture diagram, NAR part]
     Trains an RNN to predict the next-clicked items (represented by their User-Personalized Contextual Article Embedding) for each active session:
     RNN input: I1 I2 I3 I4
     Expected RNN output (next-clicked items): I2 I3 I4 I5
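The input/output pairing on this slide is a one-step shift of the session's click sequence. A minimal sketch, assuming a session is simply a list of interaction identifiers (the `next_click_pairs` helper is hypothetical and ignores the embeddings themselves):

```python
def next_click_pairs(session):
    """Split a session's clicks into RNN inputs and the next-click targets."""
    return session[:-1], session[1:]

inputs, targets = next_click_pairs(["I1", "I2", "I3", "I4", "I5"])
```

Each position of `inputs` is paired with the click that followed it, so the last click of a session is only ever a target and the first is only ever an input.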
  • 39. The CHAMELEON Meta-Architecture for News RS: Negative Sampling strategy [same architecture diagram, NAR part]
     Samples:
     ● Unique articles read by users within the last N hours/clicks buffer
     ● Unique articles read by other user sessions in the mini-batch
     ● Next article read by the user in their session
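One way to read this slide is that the next article the user read is the positive sample, and negatives are drawn from the other two pools. The sketch below follows that reading under loudly stated assumptions: the `sample_negatives` helper, the preference for mini-batch articles over the recent-clicks buffer, and the exclusion of articles already in the user's own session are all hypothetical choices, not the CHAMELEON implementation.

```python
import random

def sample_negatives(positive, session_items, batch_items, recent_buffer, k, rng):
    """Pick up to k negatives: mini-batch articles first, then the recent-clicks buffer."""
    exclude = set(session_items) | {positive}
    # dict.fromkeys keeps the first occurrence of each article (unique, order-preserving)
    from_batch = [a for a in dict.fromkeys(batch_items) if a not in exclude]
    from_buffer = [a for a in dict.fromkeys(recent_buffer)
                   if a not in exclude and a not in from_batch]
    rng.shuffle(from_batch)
    rng.shuffle(from_buffer)
    return (from_batch + from_buffer)[:k]

rng = random.Random(0)
negatives = sample_negatives(
    positive="a5",
    session_items=["a1", "a2"],
    batch_items=["a2", "a3", "a4", "a5"],
    recent_buffer=["a4", "a6", "a7", "a1"],
    k=3,
    rng=rng,
)
```

The positive article never appears among the negatives, which keeps the ranking task well defined.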
  • 40. The CHAMELEON Meta-Architecture for News RS: Recommendations Ranking (RR) sub-module [same architecture diagram, NAR part; the equations themselves are not preserved in this transcript]
     ● Eq. 4 - Relevance Score of an item for a user session
     ● Eq. 5 - Cosine similarity
     ● Eq. 6 - Softmax over Relevance Score (HUANG et al., 2013)
     ● Eq. 7 - Loss function (HUANG et al., 2013)
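The slide names the four ingredients but the formulas are not in the transcript, so the following is only a sketch of the general pattern they describe: cosine similarity as the relevance score, a softmax over the scores of the positive and negative candidates, and the negative log-probability of the positive as the loss. The exact form, the `gamma` scaling factor and the helper names are assumptions, and candidate vectors are assumed to be non-zero.

```python
from math import exp, log, sqrt

def cosine(u, v):
    """Cosine similarity between two dense, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def softmax_loss(predicted, positive, negatives, gamma=1.0):
    """Negative log of the positive's softmax probability over relevance scores."""
    scores = [gamma * cosine(predicted, c) for c in [positive] + negatives]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - peak) for s in scores]
    return -log(exps[0] / sum(exps))

positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
good_loss = softmax_loss([1.0, 0.0], positive, negatives)  # prediction matches the positive
bad_loss = softmax_loss([0.0, 1.0], positive, negatives)   # prediction matches a negative
```

The loss is lower when the predicted next-article embedding points toward the positive candidate than when it points toward a negative one.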
  • 41. CHAMELEON - NAR module loss function: recommendation loss function implemented in TensorFlow
  • 43. An architecture instantiation of CHAMELEON (1D CNN and LSTM) [architecture diagram as on slide 36, with these instantiation labels]
     ● Convolutional Neural Network (CNN): conv-3 (128), conv-4 (128) and conv-5 (128), each followed by max-pooling
     ● Four Fully Connected blocks and an LSTM
     ● Metadata Prediction (MP) target: Category
     ● User context attributes: Platform, Device Type
  • 44. CHAMELEON Instantiation - Implementation
     ● This CHAMELEON architecture instantiation was implemented using TensorFlow (available at https://github.com/gabrielspmoreira/chameleon_recsys)
     ● Training and evaluation were performed on Google Cloud Platform ML Engine
  • 46. Preliminary experiments - Dataset
     ● Provided by Globo.com (G1), the most popular news portal in Brazil
     ● Sample from Oct. 1 to 16, 2017, with over 3 M clicks, distributed in 1.2 M sessions from 330 K users, who read over 50 K unique news articles
     https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
  • 47. ACR module training: trained on a dataset with 364 K articles from 461 categories to generate the Article Content Embeddings (vectors with 250 dimensions). Figures: distribution of articles by the top 200 categories; t-SNE visualization of trained Article Content Embeddings (from the top 15 categories).
  • 48. NAR module evaluation
     Temporal offline evaluation method:
     1. Train the NAR module with sessions within the active hour
     2. Evaluate the NAR module with sessions within the next hour
     Task: for each item within a session, predict the next-clicked item from a set composed of the positive sample (correct article) and 50 negative samples.
     Metrics:
     ● Recall@5 - Checks whether the positive item is among the top-5 ranked items
     ● MRR@5 - Ranking metric that assigns higher scores to hits at top ranks
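MRR@5 can be computed from the rank of the positive article among its negative samples: each prediction contributes 1/rank if the positive is in the top 5 and 0 otherwise. The helpers and the example ranks below are hypothetical; ties are counted against the positive, which is an assumption.

```python
def positive_rank(positive_score, negative_scores):
    """1-based rank of the positive among its negatives (ties count against it)."""
    return 1 + sum(1 for s in negative_scores if s >= positive_score)

def mrr_at_n(ranks, n):
    """Mean reciprocal rank, counting 0 for positives ranked below position n."""
    return sum(1.0 / r if r <= n else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks of the positive article in four next-click predictions
example_ranks = [1, 2, 7, 4]
mrr5 = mrr_at_n(example_ranks, 5)
```

Here the third prediction ranks the positive 7th and contributes nothing, so MRR@5 is (1 + 1/2 + 0 + 1/4) / 4.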
  • 49. NAR module evaluation: benchmark methods for session-based recommendations
     Neural Networks methods
     1. GRU4Rec - Seminal neural architecture using RNNs for session-based recommendations (Hidasi, 2016), with the improvements of (Hidasi, 2017) (v2).
     Frequent patterns methods
     2. Co-occurrent - Recommends articles commonly viewed together with the last read article in other user sessions (a simplified version of the association rules technique, with a maximum rule size of two) (Jugovac, 2018) (Ludewig, 2018)
     3. Sequential Rules (SR) - A more sophisticated version of association rules, which considers the sequence of clicked items within the session. A rule is created when an item q appears after an item p in a session, even when other items were viewed between p and q. The rules are weighted by the distance x (number of steps) between p and q in the session, using a linear weighting function (Ludewig, 2018)
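The Sequential Rules idea can be sketched as follows. The slide only says the weighting is linear in the distance x, so the specific `linear_weight` function, the `max_steps` cut-off and the toy sessions are assumptions made for illustration.

```python
from collections import defaultdict

def linear_weight(steps, max_steps=10):
    """Hypothetical linear decay: 1.0 at one step, shrinking by 1/max_steps per extra step."""
    return max(0.0, 1.0 - (steps - 1) / max_steps)

def learn_rules(sessions, max_steps=10):
    """Accumulate rules[p][q] for every q that appears after p within a session."""
    rules = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for i, p in enumerate(session):
            for j in range(i + 1, min(len(session), i + max_steps + 1)):
                q = session[j]
                if q != p:
                    rules[p][q] += linear_weight(j - i, max_steps)
    return rules

def recommend_next(rules, last_item, how_many):
    """Rank the items that most strongly follow the last clicked item."""
    candidates = rules.get(last_item, {})
    return sorted(candidates, key=candidates.get, reverse=True)[:how_many]

sessions = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
rules = learn_rules(sessions)
top = recommend_next(rules, "a", 2)
```

Item "c" follows "a" in all three sessions (once with another item in between), so it outranks "b" as the recommendation after "a".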
  • 50. NAR module evaluation: benchmark methods for session-based recommendations
     KNN methods
     4. Item-kNN - Returns the items most similar to the last read article, in terms of the cosine similarity between their session vectors, i.e. the number of co-occurrences of two items in sessions divided by the square root of the product of the numbers of sessions in which the individual items occur.
     5. Vector Multiplication Session-Based kNN (V-SkNN) - Compares the entire active session with past sessions and finds items to be recommended. When computing the similarities with past sessions, the comparison emphasizes the items clicked more recently within the session (Jannach, 2017) (Jugovac, 2018) (Ludewig, 2018)
     Other baselines
     6. Recently Popular - Recommends the most viewed articles from the last N clicks buffer
     7. Content-Based - For each article read by the user, recommends similar articles from the last N clicks buffer, based on the cosine similarity of their Article Content Embeddings.
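The Item-kNN similarity described above translates directly into code. A minimal sketch, assuming each session is a set of item identifiers (the `item_cosine` helper and the toy sessions are hypothetical):

```python
from math import sqrt

def item_cosine(sessions, a, b):
    """Co-occurrences of a and b divided by sqrt(#sessions with a * #sessions with b)."""
    with_a = sum(1 for s in sessions if a in s)
    with_b = sum(1 for s in sessions if b in s)
    both = sum(1 for s in sessions if a in s and b in s)
    if with_a == 0 or with_b == 0:
        return 0.0
    return both / sqrt(with_a * with_b)

sessions = [{"x", "y"}, {"x", "y", "z"}, {"x"}, {"y", "z"}]
sim_xy = item_cosine(sessions, "x", "y")  # 2 co-occurrences, 3 sessions each
sim_xz = item_cosine(sessions, "x", "z")  # 1 co-occurrence, 3 and 2 sessions
```

To recommend, the baseline would rank all other items by this similarity to the last read article.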
  • 51. NAR module evaluation: Experiment #1. Continuous training and evaluation over 15 days (Oct. 1-15, 2017). Figure: average MRR@5 by hour (evaluation every 5 hours), for a 15-day period
  • 52. NAR module evaluation: Experiment #1. Continuous training, with evaluation every five hours, over 15 days (Oct. 1-15, 2017). 13% relative improvement on MRR@5. Figure: distribution of average MRR@5 by hour (sampled for evaluation), for a 15-day period
  • 53. NAR module evaluation: Experiment #2. Continuous training, with evaluation every hour, on the subsequent day (Oct. 16, 2017). Figure: average MRR@5 by hour, for Oct. 16, 2017
  • 55. ACM RecSys: RecSys 2018 - Vancouver, Canada
     ● 12th edition
     ● 6 days of tutorials, main conference, workshops
     ● Over 800 attendees
       ○ 73% from industry
       ○ Top companies: Amazon, Google, Spotify
     ● 28% paper acceptance rate
     https://recsys.acm.org
  • 57. Questions? Gabriel Moreira (@gspmoreira), Lead Data Scientist, DSc. student. GCG Campinas, DataFest 2018