“Deep Reinforcement Learning
based Recommendation with
Explicit User-Item Interactions
Modeling”
by Feng Liu∗, Ruiming Tang†, Xutao Li∗, Weinan Zhang‡, Yunming Ye∗, Haokun Chen‡, Huifeng Guo†, and Yuzhou Zhang
Presented by Kishor Datta Gupta
Problem
• A recommender system is a system that
predicts a user's future preference over
a set of items and recommends the top
items.
• How can we build an effective
recommender system?
Recommender Systems Analyzed
Content-based collaborative filtering
Matrix factorization based methods
Logistic regression
Factorization machines and their variants
Deep learning models
Multi-armed bandits
Problems in Existing Systems
• They treat the recommendation
procedure as a static process, i.e.,
they assume the user's underlying
preference remains static and aim
to learn that preference as
precisely as possible.
• They are trained to maximize
the immediate rewards of
recommendations, but ignore
the long-term benefits that the
recommendations can bring.
Analysis of sequential patterns in users' behavior on the
MovieLens and Yahoo! Music datasets
Proposed Solution
• DRR, a recommendation framework based on deep
reinforcement learning. Unlike conventional studies, DRR
adopts an “Actor-Critic” structure and treats
recommendation as a sequential decision-making
process that takes both the immediate and the
long-term rewards into consideration.
Deep RL based Recommendation (DRR) Framework
The Actor network:
The user state, denoted by the embeddings of
her n latest positively interacted items, is
the input. The embeddings are fed into a state
representation module (introduced in detail
later) to produce a summarized representation
of the user.
The top-ranked item (w.r.t. the ranking scores) is
recommended to the user.
An ε-greedy exploration technique is used.
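The ranking and exploration step can be illustrated with a small sketch. It assumes that the Actor outputs an action vector and that the ranking score of an item is the inner product of its embedding with that vector; the slides do not spell this out. The names `recommend`, `item_embeddings`, and `epsilon` are hypothetical, and exploration is applied here at the item level for simplicity.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def recommend(action, item_embeddings, epsilon=0.1, rng=random):
    """Pick one item id from `item_embeddings` (dict: item id -> vector).

    With probability `epsilon` a random candidate is explored; otherwise the
    item with the highest ranking score is returned. Scoring by inner product
    with the action vector is an assumption of this sketch.
    """
    candidates = list(item_embeddings)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore
    return max(candidates, key=lambda i: dot(item_embeddings[i], action))  # exploit

# Toy candidates with 2-dimensional embeddings (made-up values).
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
action = [0.2, 0.9]
best = recommend(action, items, epsilon=0.0)  # pure exploitation
```

With `epsilon=0.0` the call always returns the top-scored item; raising `epsilon` trades some immediate reward for exploration of other candidates.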
The Critic network:
According to the Q-value, the
parameters of the Actor
network are updated in the
direction that improves the
performance of action a, based
on the deterministic policy
gradient theorem.
The Critic network is updated
accordingly by the
temporal-difference learning
approach, i.e., by minimizing
the mean squared error.
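As a rough illustration of the temporal-difference update, the sketch below computes one-step TD targets and the mean squared error that the Critic would minimize. It assumes the standard target y = r + γ·Q'(s', a'), with the next-state Q-values supplied from elsewhere (for example by target networks); the discount factor `gamma` and the function names are hypothetical.

```python
def td_targets(rewards, next_q_values, gamma=0.9):
    """One-step TD targets: y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1})."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]

def critic_loss(q_values, targets):
    """Mean squared error between the Critic's Q-values and the TD targets."""
    n = len(q_values)
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / n

# Toy minibatch of two transitions (made-up numbers).
rewards = [1.0, 0.0]
next_q = [2.0, 1.0]   # Q-values of the next states, assumed given
q = [2.5, 1.0]        # current Critic estimates
targets = td_targets(rewards, next_q, gamma=0.5)
loss = critic_loss(q, targets)
```

In a real implementation the loss would be differentiated with respect to the Critic's parameters by an autodiff library; this sketch only shows the quantity being minimized.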
State Representation Module:
Modeling the feature
interactions explicitly can boost
the performance.
Three structures: DRR-p, DRR-u, and DRR-Ave
DRR-p.
DRR-p uses a product-based neural network for the state representation module.
It applies a product operator to capture the pairwise local dependency between
items.
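A minimal sketch of a product-based state, under stated assumptions: the product operator is taken to be the element-wise product, each item has a scalar importance weight (learned in practice, fixed here), and the state is the concatenation of the weighted item embeddings with all pairwise products. The function name `drr_p_state` is hypothetical.

```python
from itertools import combinations

def drr_p_state(item_embeddings, weights):
    """Concatenate weighted item embeddings with their pairwise products.

    For n items of dimension k the result has k * (n + n*(n-1)/2) entries.
    """
    weighted = [[w * x for x in emb] for emb, w in zip(item_embeddings, weights)]
    state = [x for emb in weighted for x in emb]
    for a, b in combinations(weighted, 2):
        # Element-wise product models the local dependency between two items.
        state.extend(x * y for x, y in zip(a, b))
    return state

# Three items with 2-dimensional embeddings (made-up values).
items = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
state = drr_p_state(items, weights=[1.0, 1.0, 1.0])
```

Note that the number of pairwise terms grows quadratically with the number of items n, so the state dimension does too.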
DRR-u.
DRR-u also incorporates the user embedding. In addition to
the local dependency between items, the pairwise user-item interactions are
taken into account.
DRR-Ave.
The item embeddings are first transformed by a weighted average pooling
layer. The resulting vector is then used to model the interactions with the
input user. Finally, the user embedding, the interaction vector, and the
average pooling result of the items are concatenated into a single vector that
denotes the state representation.
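The three steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the pooling weights are fixed scalars normalized by their sum, and the user-item interaction is taken to be an element-wise product. The name `drr_ave_state` is hypothetical.

```python
def drr_ave_state(user, item_embeddings, weights):
    """Build a state vector [user, user * pooled, pooled] of length 3k."""
    k = len(user)
    total = sum(weights)
    # Weighted average pooling over the item embeddings, one dimension at a time.
    pooled = [
        sum(w * emb[d] for emb, w in zip(item_embeddings, weights)) / total
        for d in range(k)
    ]
    # Interaction between the user and the pooled items (element-wise product).
    interaction = [u * p for u, p in zip(user, pooled)]
    return user + interaction + pooled

# Toy user and two items with 2-dimensional embeddings (made-up values).
user = [1.0, 2.0]
items = [[2.0, 0.0], [4.0, 2.0]]
state = drr_ave_state(user, items, weights=[1.0, 1.0])
```

Because the items are pooled before any interaction is computed, the state length stays at 3k regardless of how many items are in the history.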
Deep RL based Recommendation (DRR) Training
DRR uses the users' interaction history with the recommender agent as training data.
During the procedure, the recommender takes an action a_t following the current recommendation policy π_θ(s_t) after observing the
user (environment) state s_t. It then obtains the feedback (reward) r_t from the user, and the user state is updated to s_{t+1}.
According to the feedback, the recommender updates its recommendation policy.
The training procedure has two stages: transition generation and model updating.
• In the first stage, transition generation, the recommender observes the current state s_t, calculated by the proposed state representation module, then generates an
action a_t = π_θ(s_t) according to the current policy π_θ with ε-greedy exploration, and recommends an item i_t according to the action. Subsequently, the reward r_t is
calculated from the user's feedback on the recommended item i_t, and the user state is updated to s_{t+1}. Finally, the recommender agent stores the
transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer D.
• In the second stage, model updating, the recommender samples a minibatch of N transitions with the widely used prioritized experience replay sampling
technique. Then, the recommender updates the parameters of the Actor network and the Critic network. Finally, the recommender updates the target networks'
parameters with the soft replace strategy.
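Two pieces of the procedure above lend themselves to a short sketch: storing transitions in a replay buffer and the soft replacement of target parameters. Note the swapped-in simplification: the buffer below samples uniformly, whereas the slides describe prioritized experience replay. The class and function names, the `capacity`, and the mixing rate `tau` are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, n, rng=random):
        # Uniform sampling for simplicity; this is NOT prioritized replay.
        return rng.sample(list(self.data), min(n, len(self.data)))

def soft_update(target_params, online_params, tau=0.001):
    """Soft replace: target <- tau * online + (1 - tau) * target."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, online_params)]

buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.store(state=[step], action=[0.1], reward=0.5, next_state=[step + 1])
batch = buffer.sample(2)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)
```

A small `tau` makes the target networks trail the online networks slowly, which is the usual motivation for a soft rather than a hard replacement.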
Experiments
Datasets
• MovieLens 100K
• MovieLens 1M
• Yahoo! Music
• Jester
Comparison Methods
• Popularity recommends the most popular item, i.e., the item with the highest average rating or the largest number
of positive ratings, from the items currently available to the user at each timestep.
• PMF performs a matrix decomposition like SVD, but takes only the non-zero elements into account.
• SVD++ mixes the strengths of the latent model and the neighborhood model.
• DRR-n simply uses the concatenation of the item embeddings to represent the user state, which is widely used in previous studies.
Reward Function
• At timestep t, the recommender agent recommends an item j to user i (denoted as action a in state s). The rating rate_{i,j}
comes from the interaction logs if user i actually rated item j, and otherwise from a value predicted by the simulator. Therefore,
the reward function can be defined as follows: R(s, a) = rate_{i,j} / 10
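The reward rule can be written down directly. In this sketch the interaction log is a dictionary keyed by (user, item) and the simulator is any callable that returns a predicted rating; both representations, and the names `reward`, `logged_ratings`, and `simulator`, are assumptions made for illustration.

```python
def reward(user, item, logged_ratings, simulator):
    """R(s, a) = rate_{i,j} / 10, using the logged rating when it exists."""
    rating = logged_ratings.get((user, item))
    if rating is None:
        # No logged interaction: fall back to the simulator's prediction.
        rating = simulator(user, item)
    return rating / 10

logs = {("u1", "i1"): 8.0}

def constant_simulator(user, item):
    """Stand-in simulator that always predicts the same rating."""
    return 5.0

r_logged = reward("u1", "i1", logs, constant_simulator)     # rating from the log
r_simulated = reward("u1", "i2", logs, constant_simulator)  # rating from the simulator
```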
Results
Normalized Discounted Cumulative Gain (NDCG)
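For readers unfamiliar with the metric, here is a minimal sketch of NDCG@k. It uses the common linear-gain form, DCG = Σ rel_i / log2(i + 1) over ranks i = 1..k, normalized by the DCG of the ideal ordering; the slides do not say which gain variant the paper uses, so treat that choice as an assumption.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    # enumerate() is 0-based, so rank + 2 gives log2(i + 1) for 1-based rank i.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1], k=3)  # already in the ideal order
swapped = ndcg([0, 1], k=2)     # the relevant item is ranked second
```

A list that is already sorted by relevance scores 1.0, and the score drops as relevant items move down the ranking.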
Claims
DRR takes both the immediate and long-term
rewards into account. A state representation
module is incorporated, and three instantiation
structures are designed, which can explicitly
model the interactions between users and items.
Extensive experiments on four real-world
datasets demonstrate the superiority of
the proposed DRR method over
state-of-the-art competitors.
My thoughts
How a neural network converts the
user-item relation into a state is not
clear to me.
The authors didn't consider the
cold-start problem.
Questions?

Contenu connexe

Tendances

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Yves Raimond
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
Sang Jun Lee
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
Faisal Siddiqi
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
Vivian S. Zhang
 
Recommender system
Recommender systemRecommender system
Recommender system
Nilotpal Pramanik
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond
 
Counterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning modelsCounterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning models
Michael Manapat
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
Owen Zhang
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
NAVER Engineering
 
Practical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit AlgorithmsPractical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit Algorithms
SC5.io
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
NAVER Engineering
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
Alexandros Karatzoglou
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
Bhaskar Mitra
 
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
Qi Guo
 
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
Sumit Rangwala
 

Tendances (20)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
[5분 논문요약] Structured Knowledge Distillation for Semantic Segmentation
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Counterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning modelsCounterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning models
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
Practical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit AlgorithmsPractical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit Algorithms
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an...
 
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
 

Similaire à Deep Reinforcement Learning based Recommendation with Explicit User-ItemInteractions Modeling

Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
Milind Gokhale
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
Robin Reni
 
Download
DownloadDownload
Downloadbutest
 
Download
DownloadDownload
Downloadbutest
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
Shrutika Oswal
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
B1802021823
B1802021823B1802021823
B1802021823
IOSR Journals
 
Summer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar UniversitySummer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar UniversityRishabh Misra
 
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
LokeshKumarReddy8
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
Crossing Minds
 
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
YONG ZHENG
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
Harivamshi D
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithms
Aravindharamanan S
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
IRJET Journal
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Augcstalks
 
Recommender system
Recommender systemRecommender system
Recommender system
Saiguru P.v
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
Ding Li
 
Further enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learningFurther enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learning
Institute of Contemporary Sciences
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
Stanley Wang
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.
Giuseppe Ricci
 

Similaire à Deep Reinforcement Learning based Recommendation with Explicit User-ItemInteractions Modeling (20)

Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
B1802021823
B1802021823B1802021823
B1802021823
 
Summer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar UniversitySummer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar University
 
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithms
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Aug
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Further enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learningFurther enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learning
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.
 

Plus de Kishor Datta Gupta

GAN introduction.pptx
GAN introduction.pptxGAN introduction.pptx
GAN introduction.pptx
Kishor Datta Gupta
 
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Kishor Datta Gupta
 
A safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable dataA safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable data
Kishor Datta Gupta
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
Kishor Datta Gupta
 
Who is responsible for adversarial defense
Who is responsible for adversarial defenseWho is responsible for adversarial defense
Who is responsible for adversarial defense
Kishor Datta Gupta
 
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Kishor Datta Gupta
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
Kishor Datta Gupta
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Kishor Datta Gupta
 
Machine learning in computer security
Machine learning in computer securityMachine learning in computer security
Machine learning in computer security
Kishor Datta Gupta
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
Kishor Datta Gupta
 
Cyber intrusion
Cyber intrusionCyber intrusion
Cyber intrusion
Kishor Datta Gupta
 
understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...
Kishor Datta Gupta
 
Different representation space for MNIST digit
Different representation space for MNIST digitDifferent representation space for MNIST digit
Different representation space for MNIST digit
Kishor Datta Gupta
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
Kishor Datta Gupta
 
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Kishor Datta Gupta
 
Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)
Kishor Datta Gupta
 
Clustering report
Clustering reportClustering report
Clustering report
Kishor Datta Gupta
 
Basic digital image concept
Basic digital image conceptBasic digital image concept
Basic digital image concept
Kishor Datta Gupta
 
An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)
Kishor Datta Gupta
 
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Kishor Datta Gupta
 

Plus de Kishor Datta Gupta (20)

GAN introduction.pptx
GAN introduction.pptxGAN introduction.pptx
GAN introduction.pptx
 
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
 
A safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable dataA safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable data
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
 
Who is responsible for adversarial defense
Who is responsible for adversarial defenseWho is responsible for adversarial defense
Who is responsible for adversarial defense
 
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
 
Machine learning in computer security
Machine learning in computer securityMachine learning in computer security
Machine learning in computer security
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
Cyber intrusion
Cyber intrusionCyber intrusion
Cyber intrusion
 
understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...
 
Different representation space for MNIST digit
Different representation space for MNIST digitDifferent representation space for MNIST digit
Different representation space for MNIST digit
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
 
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
 
Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)
 
Clustering report
Clustering reportClustering report
Clustering report
 
Basic digital image concept
Basic digital image conceptBasic digital image concept
Basic digital image concept
 
An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)
 
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
 

Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

  • 1. “Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling” by Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, and Yuzhou Zhang. Presented by Kishor Datta Gupta
  • 2. Problem • A recommender system is a system that predicts a user’s future preference over a set of items and recommends the top items. • How to build an effective recommender system?
  • 3. Recommender Systems Analyzed • Content-based collaborative filtering • Matrix factorization based methods • Logistic regression • Factorization machines and their variants • Deep learning models • Multi-armed bandits
  • 4. Problems in Existing Systems • They treat the recommendation procedure as a static process, i.e., they assume the user’s underlying preference stays fixed and aim to learn that preference as precisely as possible. • They are trained to maximize the immediate rewards of recommendations and ignore the long-term benefits that the recommendations can bring. Figure: analysis of sequential patterns in user behavior on the MovieLens and Yahoo! Music datasets.
  • 5. Proposed Solution • A deep reinforcement learning based recommendation framework, DRR. Unlike conventional studies, DRR adopts an “Actor-Critic” structure and treats recommendation as a sequential decision-making process, which takes both the immediate and long-term rewards into consideration.
  • 6. Deep RL based Recommendation (DRR) Framework
  • 7. Deep RL based Recommendation (DRR) Framework The Actor network: The user state, denoted by the embeddings of her n latest positively interacted items, is the input. The embeddings are fed into a state representation module (introduced in detail later) to produce a summarized representation of the user. The top-ranked item (w.r.t. the ranking scores) is recommended to the user. The ε-greedy exploration technique is used.
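The ranking-and-exploration step on this slide can be illustrated with a minimal sketch. The slide does not say how ranking scores are computed or how ε-greedy is applied, so this sketch assumes that the score is the inner product between the Actor's action vector and each item embedding, and that exploration picks a uniformly random item with probability ε. The function name `recommend` and its parameters are hypothetical.

```python
import numpy as np

def recommend(action, item_embeddings, epsilon=0.1, rng=None):
    """Return the index of the item to recommend.

    Assumptions (not stated on the slide): ranking score = inner product
    of the action vector with each item embedding; exploration = uniform
    random item with probability epsilon.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: pick any candidate item at random.
        return int(rng.integers(len(item_embeddings)))
    # Exploit: score every candidate and take the top-ranked one.
    scores = item_embeddings @ action
    return int(np.argmax(scores))

items = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
action = np.array([0.2, 0.9])
best = recommend(action, items, epsilon=0.0)  # no exploration
```

With `epsilon=0.0` the call is purely greedy, which is how a trained policy would typically be evaluated.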
  • 8. Deep RL based Recommendation (DRR) Framework The Critic network: According to the Q-value, the parameters of the Actor network are updated in the direction that improves the performance of action a, based on the deterministic policy gradient theorem. The Critic network is updated accordingly by the temporal-difference learning approach, i.e., by minimizing the mean squared error.
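For reference, the two updates named on this slide are usually written as below. The slide gives no equations, so this is the standard deterministic policy gradient form rather than a formula from the deck. The symbols are notational assumptions: $\omega$ for the Critic parameters, $\theta$ for the Actor parameters, primes for target-network parameters, $\gamma$ for the discount factor, and $N$ for the minibatch size.

```latex
% Temporal-difference target and mean squared error loss for the Critic
y_i = r_i + \gamma \, Q_{\omega'}\bigl(s_{i+1}, \pi_{\theta'}(s_{i+1})\bigr),
\qquad
L(\omega) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - Q_{\omega}(s_i, a_i)\bigr)^2

% Deterministic policy gradient used to update the Actor
\nabla_{\theta} J \approx \frac{1}{N} \sum_{i=1}^{N}
  \nabla_{a} Q_{\omega}(s, a)\big|_{s = s_i,\, a = \pi_{\theta}(s_i)}
  \, \nabla_{\theta} \pi_{\theta}(s)\big|_{s = s_i}
```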
  • 9. Deep RL based Recommendation (DRR) Framework State Representation Module: Modeling the feature interactions explicitly can boost the performance. Three structures: DRR-p, DRR-u, and DRR-ave.
  • 10. Deep RL based Recommendation (DRR) Framework DRR-p: A product-based neural network for the state representation module; it uses a product operator to capture the pairwise local dependency between items.
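A minimal sketch of the pairwise-product idea behind DRR-p follows. The slide only says that a product operator captures pairwise dependencies, so several details here are assumptions: the product is taken element-wise, each item embedding is first scaled by a scalar weight, and the state is the concatenation of the weighted items and all pairwise products. The name `drr_p_state` is hypothetical.

```python
from itertools import combinations

import numpy as np

def drr_p_state(items, weights):
    """Build a DRR-p style state from n item embeddings of size k.

    Assumptions: element-wise product as the product operator, and one
    scalar weight per item (learned in a real model, fixed here).
    """
    weighted = items * weights[:, None]            # shape (n, k)
    pairs = [weighted[a] * weighted[b]             # one vector per item pair
             for a, b in combinations(range(len(items)), 2)]
    # Concatenate the n weighted items and the n*(n-1)/2 pairwise products.
    return np.concatenate(list(weighted) + pairs)

items = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
state = drr_p_state(items, np.ones(3))
```

Under these assumptions the state has k * (n + n(n-1)/2) entries, so it grows quadratically with the number of items n.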
  • 11. Deep RL based Recommendation (DRR) Framework DRR-u: In DRR-u, the user embedding is also incorporated. In addition to the local dependency between items, the pairwise user-item interactions are taken into account.
  • 12. Deep RL based Recommendation (DRR) Framework DRR-ave: The item embeddings are first transformed by a weighted average pooling layer. The resulting vector is then used to model the interactions with the input user. Finally, the user embedding, the interaction vector, and the average pooling result of the items are concatenated into a single vector that denotes the state representation.
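The three DRR-ave steps above (pool, interact, concatenate) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the pooling is a normalized weighted average, the user-item interaction is an element-wise product, and the pooling weights are passed in rather than learned. The name `drr_ave_state` is hypothetical.

```python
import numpy as np

def drr_ave_state(user, items, weights):
    """Build a DRR-ave style state vector.

    Assumptions: normalized weighted average pooling and an element-wise
    product for the user-item interaction.
    """
    # Step 1: weighted average pooling over the n item embeddings.
    pooled = (weights[:, None] * items).sum(axis=0) / weights.sum()
    # Step 2: interaction between the user and the pooled item vector.
    interaction = user * pooled
    # Step 3: concatenate user, interaction, and pooled vectors.
    return np.concatenate([user, interaction, pooled])

user = np.array([1.0, 2.0])
items = np.array([[2.0, 4.0], [4.0, 8.0]])
state = drr_ave_state(user, items, np.ones(2))
```

With embeddings of size k, this state has 3k entries regardless of how many items are pooled.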
  • 13. Deep RL based Recommendation (DRR) Training DRR uses the users’ interaction history with the recommender agent as training data. During the procedure, the recommender takes an action a_t following the current recommendation policy π_θ(s_t) after observing the user (environment) state s_t, then obtains the feedback (reward) r_t from the user, and the user state is updated to s_{t+1}. According to the feedback, the recommender updates its recommendation policy. The training procedure mainly includes two phases: transition generation and model updating. • In the first phase, the recommender observes the current state s_t, which is calculated by the proposed state representation module, then generates an action a_t = π_θ(s_t) according to the current policy π_θ with ε-greedy exploration, and recommends an item i_t according to the action. Subsequently, the reward r_t is calculated from the user’s feedback on the recommended item i_t, and the user state is updated. Finally, the recommender agent stores the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer D. • In the second phase, model updating, the recommender samples a minibatch of N transitions with the widely used prioritized experience replay sampling technique. Then, the recommender updates the parameters of the Actor network and the Critic network. Finally, the recommender updates the target networks’ parameters with the soft replace strategy.
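The bookkeeping around the two training phases can be sketched as below. Two simplifications are made on purpose and are not what the slide describes: sampling is uniform rather than prioritized, and the networks are stood in for by plain lists of weight arrays. The soft replace step is written in its usual form, target = tau * online + (1 - tau) * target, with `tau` as an assumed hyperparameter. All names are hypothetical.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size buffer D of (s_t, a_t, r_t, s_next) transitions."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.data = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, n):
        # Uniform sampling: a stand-in for prioritized experience replay.
        return random.sample(list(self.data), min(n, len(self.data)))

def soft_update(target, online, tau=0.001):
    """Move each target weight a small step toward the online weight."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

buffer = ReplayBuffer(capacity=2)
for t in range(3):
    buffer.store(s=t, a=0, r=1.0, s_next=t + 1)
batch = buffer.sample(2)
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

A small `tau` keeps the target networks changing slowly, which is the usual reason for preferring a soft replace over copying the weights outright.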
  • 14. Experiments Datasets • MovieLens 100k • MovieLens 1M • Yahoo! Music • Jester Comparison Methods • Popularity recommends the most popular item, i.e., the item with the highest average rating or the largest number of positive ratings among the currently available items, to the user at each timestep. • PMF performs a matrix decomposition like SVD, but takes only the non-zero elements into account. • SVD++ mixes the strengths of the latent model and the neighborhood model. • DRR-n simply uses the concatenation of the item embeddings to represent the user state, which is widely used in previous studies. Reward Function • At timestep t, the recommender agent recommends an item j to user i (denoted as action a in state s), and the rating rate_{i,j} comes from the interaction logs if user i actually rated item j, or from a value predicted by the simulator otherwise. The reward function is therefore defined as: R(s, a) = rate_{i,j} / 10
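The reward rule on this slide can be sketched in a few lines: use the logged rating when it exists, fall back to the simulator otherwise, then divide by 10. The dictionary keyed by (user, item) and the `simulator` callable are hypothetical stand-ins for the interaction logs and the rating simulator.

```python
def reward(user, item, logged_ratings, simulator, scale=10.0):
    """Compute R(s, a) = rate_{i,j} / 10 as given on the slide."""
    rating = logged_ratings.get((user, item))
    if rating is None:
        # No logged interaction: use the simulator's predicted rating.
        rating = simulator(user, item)
    return rating / scale

logs = {(0, 1): 8.0}
r_logged = reward(0, 1, logs, simulator=lambda u, j: 5.0)
r_simulated = reward(0, 2, logs, simulator=lambda u, j: 5.0)
```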
  • 16. Claims • DRR takes both the immediate and long-term rewards into account. • A state representation module is incorporated and three instantiation structures are designed, which can explicitly model the interactions between users and items. • Extensive experiments on four real-world datasets demonstrate the superiority of the proposed DRR method over state-of-the-art competitors.
  • 17. My thoughts • How a neural network converts the user-item relation into a state is not clear to me. • The authors did not consider the cold-start problem.