Reward-Constrained Interactive Recommendation
with Natural Language Feedback
2020. 02. 24.
Jeong-Gwan Lee
1
"Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning." NeurIPS 2019
(Duke University, Samsung Research America, University at Buffalo)
2
Table of contents
● Visual Item Interactive Recommendation
● Non-Natural Language Feedback
● Natural Language Feedback
● Dataset and Setup
● MDP & Constrained MDP
● Recommendation as MDP
● Reward Constrained Recommender Model
● Model Detail(Feature Extractor, Discriminator, Recommender)
● Reward function
● Recommendation as Constrained MDP
● Model Training
● Evaluation
● Conclusion
3
Visual Item Interactive Recommendation
Recommender systems have sought to interact with users
and adapt to their preferences over time.
• Non-Natural Language Feedback
• Clicking Data
• Updated Rating
These signals carry little information about complex user attitudes.
[Figure: two example sessions (Round 1, Round 2, …), with rating values 0.2, 0.2, 0.6, 0.8]
4
Visual Item Interactive Recommendation
Text-based recommendation provides richer user feedback.
• Natural Language Feedback (Not dialogue-based)
This paper targets this setting.
Recommender
Seeker
5
Visual Item Recommendation
with Natural Language Feedback Setting
UT-Zappos50K
• A shoe dataset consisting of 50,025 shoe images.
• Samples
• Labels
6
• Rich attribute data
1. shoes category(4) = {Shoes, Boots, Sandals, Slippers}
2. shoes subcategory(21) = {Oxfords, MidCalf, Heel, Ankle,…}
3. heel height(7) = {flat, Under 1inch, 1~2inch, 2~3inch,…}
4. closure(18) = {leather, padded, removable,…}
5. gender(8) = {men, women, boys, girls,…}
6. toe style(17) = {Capped, Round, Square,…}
7
Dataset and Setup
User simulator
• UT-Zappos50K does not include user comments tied to ground-truth
attributes, so the authors build a user simulator.
1. Given 10,000 pairs of recommended and desired items,
real-world sentences are collected from annotators.
2. From these, the authors derive several sentence templates and
synthesize 20,000 labeled sentences by filling the templates
with attribute labels.
3. They train a Seq2seq-based user simulator.
(input: the difference in one attribute value between two items,
output: a sentence describing the visual attribute difference)
Template examples (recommended item vs. desired item)
• "Show me more shoes with round toe."
• Gender: Men vs. Gender: Women → "I prefer shoes for women."
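The template-filling step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the two templates are modeled on the example sentences above, and `TEMPLATES`, `DEFAULT_TEMPLATE`, `attribute_differences`, and `template_feedback` are hypothetical names.

```python
# Hypothetical sentence templates modeled on the two example sentences;
# the templates the authors actually derived are not listed on the slide.
TEMPLATES = {
    "gender": "I prefer shoes for {value}.",
    "toe style": "Show me more shoes with {value} toe.",
}
DEFAULT_TEMPLATE = "I prefer {value} {attribute}."


def attribute_differences(recommended: dict, desired: dict) -> list:
    """Return (attribute, desired_value) for every attribute where the items differ."""
    return [
        (name, desired[name])
        for name in desired
        if recommended.get(name) != desired[name]
    ]


def template_feedback(recommended: dict, desired: dict):
    """Comment on the first differing attribute, or return None if the items match."""
    diffs = attribute_differences(recommended, desired)
    if not diffs:
        return None
    attribute, value = diffs[0]
    template = TEMPLATES.get(attribute, DEFAULT_TEMPLATE)
    return template.format(value=value.lower(), attribute=attribute)


recommended = {"gender": "Men", "toe style": "Round"}
desired = {"gender": "Women", "toe style": "Round"}
print(template_feedback(recommended, desired))  # I prefer shoes for women.
```

Sentences produced this way would serve as labeled training data; the Seq2seq simulator itself is not sketched here.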
8
Reward Constrained Recommendation
They propose Reward Constrained Recommendation(RCR),
which sequentially incorporates constraints from previous
feedback.
• A constraint-augmented RL problem setting
• A learnable discriminator to detect violations of user
preferences in an adversarial manner
9
MDP & Constrained MDP
MDP(Markov Decision Process)
Constrained MDP
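For reference, the two problems can be written in their standard textbook form. The symbols J_R, J_C, c, and γ follow common constrained-MDP notation and are assumptions here; α is the constraint threshold named in the hyperparameter appendix.

```latex
\begin{align*}
\text{MDP:}\quad & \max_{\pi}\; J_R(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{Constrained MDP:}\quad & \max_{\pi}\; J_R(\pi) \quad \text{s.t.}\quad
J_C(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le \alpha
\end{align*}
```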
10
Recommendation as MDP
Abstractly, the recommendation-feedback loop can be modeled as an MDP.
Recommender
Seeker
[Figure: recommender-seeker loop unrolled over rounds t = 1, …, 4, labeled with state s_t, action a_t, feedback x_t, and reward r_t (shown as "r_t?") at each round]
11
Recap: dataset
UT-Zappos50K
• A shoe dataset consisting of 50,025 shoe images.
• Rich attribute data (shoes category(4), shoes subcategory(21), heel
height(7), closure(18), gender(8) and toe style(17))
• Samples
• Labels
12
Reward Constrained Recommender Model
Feature Extractor (extract features of feedback, recommended items)
Recommender (predict attributes, match, and recommend)
Discriminator (prevent constraint violation)
13
Feature Extractor
Visual Encoder = ResNet50[1] + AttrNet (pretrained)
Textual Encoder = Embedding + LSTM + FC
[1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE
conference on computer vision and pattern recognition. 2016.
[Figure: the visual encoder concatenates ResNet50 features with AttrNet outputs. AttrNet (Attribute Net) takes the ResNet features and predicts each attribute: Category(4), SubCategory(21), Heel Height(7), … Attribute labels (e.g. Cat: Shoes, SubCat: Dress shoes, HeelHei.: X, Closure: …) are used at training time.]
17
Recommender
Policy π_θ selects the item closest to the sampled attribute values under
Euclidean distance in the visual attribute space.
[Figure: policy π_θ with a multi-discrete action space. The feature representation feeds one FCs + softmax head per attribute: Category(4), SubCategory(21), Heel Height(7), … Each head is sampled categorically, e.g. Category = shoes → [1,0,0,0], SubCat = heel → [0,0,0,1,…], Heel.H = 3 inch → [0,0,1,0,…]. The sampled values are compared with visual encoder (ResNet50, AttrNet) outputs by Euclidean distance (distance-based matching).]
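The sample-then-match step can be sketched with NumPy. This is a toy illustration under stated assumptions: random logits stand in for the FC-layer outputs, a random matrix stands in for the catalog's attribute vectors, and `sample_attributes` and `nearest_item` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
ATTRIBUTE_SIZES = [4, 21, 7]  # Category, SubCategory, Heel Height (from the slide)


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def sample_attributes(logits_per_head, rng):
    """Sample one value per attribute head; return the concatenated one-hot vectors."""
    parts = []
    for logits in logits_per_head:
        probs = softmax(logits)
        idx = rng.choice(len(probs), p=probs)  # categorical sampling
        one_hot = np.zeros(len(probs))
        one_hot[idx] = 1.0
        parts.append(one_hot)
    return np.concatenate(parts)


def nearest_item(target, item_attributes):
    """Index of the catalog item closest to `target` in Euclidean distance."""
    distances = np.linalg.norm(item_attributes - target, axis=1)
    return int(np.argmin(distances))


# Toy stand-ins (assumptions): random logits and a random 100-item catalog.
logits = [rng.normal(size=n) for n in ATTRIBUTE_SIZES]
catalog = rng.random((100, sum(ATTRIBUTE_SIZES)))
target = sample_attributes(logits, rng)
print(nearest_item(target, catalog))
```

Sampling each attribute head independently keeps the action space at 4 + 21 + 7 + … outputs rather than one output per catalog item.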
21
Reward function
Reward: the visual and attribute similarity between the
recommended and desired items.
• The recommended item should become more similar to the
desired item as interaction proceeds.
• We want to minimize the visual and attribute differences.
• The two distances are weighted so that their scales are similar.
• If the system cannot find the desired item before 50 iterations,
it receives an extra reward of -3 (as a penalty).
Recommender
Seeker
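A reward of this shape might look like the sketch below. The 50-iteration limit and the -3 penalty come from the slide; the specific distances (Euclidean on visual features, mismatch count on attributes) and the `attr_weight` parameter are assumptions made for illustration.

```python
import numpy as np

MAX_ROUNDS = 50          # from the slide
FAILURE_PENALTY = -3.0   # from the slide


def step_reward(rec_visual, des_visual, rec_attrs, des_attrs, attr_weight=1.0):
    """Negative visual distance plus weighted negative attribute-mismatch count.

    `attr_weight` is a hypothetical knob for keeping the two scales similar.
    """
    visual_distance = np.linalg.norm(np.asarray(rec_visual) - np.asarray(des_visual))
    attribute_distance = np.sum(np.asarray(rec_attrs) != np.asarray(des_attrs))
    return -float(visual_distance) - attr_weight * float(attribute_distance)


def final_penalty(found, rounds_used):
    """Extra reward applied when the desired item was not found in time."""
    return FAILURE_PENALTY if (not found and rounds_used >= MAX_ROUNDS) else 0.0


# Visual distance 5.0, one mismatching attribute, weight 0.5
print(step_reward([0, 0], [3, 4], [1, 2, 3], [1, 0, 3], attr_weight=0.5))  # -5.5
print(final_penalty(found=False, rounds_used=50))  # -3.0
```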
22
Why are explicit constraints needed?
RL algorithms that ignore constraints easily violate preferences
expressed in past feedback, because they must explore new items
to improve further.
• Success case
• Failure case
Recommender
Seeker
23
Discriminator
Discriminator C_φ outputs whether the recommended item
violates the user comments.
Feedback History
…
x_{t-1}: I prefer leather.
x_t: I prefer high heel.
24
Collecting (non-)violation distribution
One user session
User session finish!
25
Collecting (non-)violation distribution
One user session
Non-violation pair
26
Collecting (non-)violation distribution
One user session
Violation pair
27
Collecting (non-)violation distribution
One user session
Non-violation pair
28
Discriminator
A discriminator is defined as a constraint function.
• Discriminator training
• C_φ(s, a) is pushed toward 1 for violation pairs.
• C_φ(s, a) is pushed toward 0 for non-violation pairs.
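One objective with this behavior is binary cross-entropy over the two sets of pairs. The form below is an assumption offered for illustration, not necessarily the loss the authors use.

```latex
\mathcal{L}(\phi) =
  -\,\mathbb{E}_{(s,a)\,\sim\,\text{violation}}\big[\log C_\phi(s,a)\big]
  \;-\; \mathbb{E}_{(s,a)\,\sim\,\text{non-violation}}\big[\log\big(1 - C_\phi(s,a)\big)\big]
```

Minimizing the first term drives C_φ(s, a) toward 1 on violation pairs; minimizing the second drives it toward 0 on non-violation pairs.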
29
Collecting (non-)violation distribution
The discriminator is updated after each user session.
It cannot be pretrained.
• Judging whether a violation occurred requires sequential feedback.
• The dataset contains no sequential feedback
(it comes only from the user simulator).
One user session
User session finish!
30
Recap: Reward Constrained Recommender Model
Feature Extractor (extract features of feedback, rec. items)
Discriminator (prevent constraint violation)
Recommender (predict attributes, match, and recommend)
𝑪 𝝓(𝒔, 𝒂)
𝝅 𝜽(𝒂|𝐬)
31
Recommendation as Constrained MDP
Solving the constrained optimization directly is difficult, so
Lagrangian relaxation transforms the objective into a dual problem.
• Primal problem
• Dual problem(refer to Appendix: Lagrange Relaxation)
• Lagrangian function
• Relaxed objective
Lagrange multiplier
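In standard constrained-MDP notation, the items above can be written as follows. J_R is the expected return and J_C the expected constraint cost; both symbols are assumptions, and the authors' exact notation may differ.

```latex
\begin{align*}
\text{Primal:}\quad & \max_{\theta}\; J_R(\pi_\theta) \quad \text{s.t.}\quad J_C(\pi_\theta) \le \alpha \\
\text{Lagrangian:}\quad & L(\lambda, \theta) = J_R(\pi_\theta) - \lambda\,\big(J_C(\pi_\theta) - \alpha\big) \\
\text{Dual:}\quad & \min_{\lambda \ge 0}\; \max_{\theta}\; L(\lambda, \theta)
\end{align*}
```

Here λ ≥ 0 is the Lagrange multiplier and α is the constraint threshold.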
32
Recommendation as Constrained MDP
The goal is to find a saddle point, which can be
approximated by alternating gradient descent and ascent.
The constrained reward function penalizes the policy for violations.
λ is also optimized so that the constraints are enforced.
1) If violations happen, λ increases to penalize the policy.
2) If there is no violation, λ decreases to give the policy more reward.
Reward function with Constraints
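The two rules for λ can be sketched as a single update. This is a minimal illustration: the step size `lr`, the per-session violation averages, and the function names are made up, while clipping λ into [0, λ_max] mirrors the projection used in training.

```python
def penalized_reward(reward, violation_score, lam):
    """Reward seen by the policy: task reward minus lambda times the constraint score."""
    return reward - lam * violation_score


def update_lambda(lam, mean_violation, alpha, lr, lam_max):
    """Raise lambda when average violation exceeds alpha, lower it otherwise,
    then project the result into [0, lam_max]."""
    lam = lam + lr * (mean_violation - alpha)
    return min(max(lam, 0.0), lam_max)


lam = 0.0
for mean_violation in [0.8, 0.6, 0.3, 0.1, 0.0]:  # made-up per-session averages
    lam = update_lambda(lam, mean_violation, alpha=0.2, lr=0.1, lam_max=1.0)
    print(round(lam, 3))
```

In this toy run λ rises while the average violation stays above α and falls once it drops below, matching rules 1) and 2).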
33
Model Training
Reward Constrained Recommendation Process
• Alternately train the discriminator C_φ and the recommender π_θ.
• Policy projection operator: keeps training stable by updating
the parameters within a trust region [1].
• λ projection operator: projects λ into the range [0, λ_max].
[1] Schulman, John, et al. "Trust region policy optimization." International conference on machine
learning. 2015.
One user session
34
Evaluation
SR@K : Success Rate after K interactions
NI : Number of user Interactions before success
NV : Number of Violated attributes compared with the desired
attributes of users
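The three metrics can be computed from session logs roughly as follows. This is a sketch under assumptions: each session is summarized by the round at which it succeeded (`None` on failure), failed sessions count as `max_rounds` in NI, and all function names are hypothetical.

```python
def success_rate_at_k(success_rounds, k):
    """SR@K: fraction of sessions that succeeded within k interactions."""
    hits = sum(1 for r in success_rounds if r is not None and r <= k)
    return hits / len(success_rounds)


def mean_interactions(success_rounds, max_rounds):
    """NI: average interactions per session; failures count as max_rounds (assumption)."""
    total = sum(max_rounds if r is None else r for r in success_rounds)
    return total / len(success_rounds)


def num_violations(recommended_attrs, desired_attrs):
    """NV: number of attributes where the recommended item differs from the desired one."""
    return sum(
        1 for name, value in desired_attrs.items()
        if recommended_attrs.get(name) != value
    )


sessions = [3, 12, None, 7]  # round of success per session, None = never found
print(success_rate_at_k(sessions, 10))  # 0.5
print(mean_interactions(sessions, 50))  # 18.0
print(num_violations({"gender": "men", "closure": "lace"},
                     {"gender": "women", "closure": "lace"}))  # 1
```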
λ increases in the early stage
(as violations increase),
then becomes more stable.
λ ≈ 0.04 is the automatically learned
discriminator weight.
35
Evaluation
RL baseline: ignores the constraints.
RL + naive constraints: the Lagrange multiplier λ is fixed.
• All models are trained for 100,000 iterations (user sessions)
• Seen : training data
• Unseen : test data
• Averaged over 100 sessions with standard error
The learned constraint (discriminator) has better generalization.
36
Conclusion
They propose Reward Constrained Recommendation(RCR), which
sequentially incorporates constraints from previous feedback.
• A constraint-augmented RL problem setting
• A learnable discriminator to detect violations of user preferences in an
adversarial manner
The proposed method can be extended to other applications,
such as:
1. Vision-and-dialogue navigation
2. Interactive recommendation with the user's prior information
3. Dialogue-based recommendation
37
Appendix: Lagrange Relaxation
38
Appendix: Generated feedback
The simulator only generates simple comments on the visual
attribute difference between the candidate image and the
desired image.
39
Appendix: Hyperparameter setting
In reinforcement learning, they use Adam as the optimizer.
They set:
• α: threshold of constraints (refer to page 15)
• λ_max: projection boundary of λ

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 

Reward constrained interactive recommendation with natural language feedback noani

  • 1. Reward-Constrained Interactive Recommendation with Natural Language Feedback 2020. 02. 24. Jeong-Gwan Lee 1 "Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning." NeurIPS 2019 (Duke University, Samsung Research America, University at Buffalo)
  • 2. 2 Table of contents ● Visual Item Interactive Recommendation ● Non-Natural Language Feedback ● Natural Language Feedback ● Dataset and Setup ● MDP & Constrained MDP ● Recommendation as MDP ● Reward Constrained Recommender Model ● Model Detail(Feature Extractor, Discriminator, Recommender) ● Reward function ● Recommendation as Constrained MDP ● Model Training ● Evaluation ● Conclusion
  • 3. 3 Visual Item Interactive Recommendation Recommender systems have sought to interact with users and adapt to their preferences over time. • Non-Natural Language Feedback • Clicking Data • Updated Rating These signals carry little information about complex user attitudes. [Figure: click and rating feedback over Round 1 and Round 2]
  • 4. 4 Visual Item Interactive Recommendation Text-based recommendation provides richer user feedback. • Natural Language Feedback (Not dialogue-based) This paper targets this setting. Recommender Seeker
  • 5. 5 Visual Item Recommendation with Natural Language Feedback Setting UT-Zappos50K • A shoe dataset consisting of 50,025 shoe images. • Samples • Labels
  • 6. 6 Visual Item Recommendation with Natural Language Feedback Setting UT-Zappos50K • A shoe dataset consisting of 50,025 shoe images. • Rich attribute data 1. shoes category(4) = {Shoes, Boots, Sandals, Slippers} 2. shoes subcategory(21) = {Oxfords, MidCalf, Heel, Ankle,…} 3. heel height(7) = {flat, Under 1inch, 1~2inch, 2~3inch,…} 4. closure(18) = {leather, padded, removable,…} 5. gender(8) = {men, women, boys, girls,…} 6. toe style(17) = {Capped, Round, Square,…}
  • 7. 7 Dataset and Setup User simulator • Zappos50K does not include user comments on attributes with ground truth, so the authors build a user simulator. 1. Given 10,000 pairs of recommended and desired items, real-world sentences are collected from annotators. 2. From these sentences, the authors derive several sentence templates and synthesize 20,000 labeled sentences by filling the templates with attribute labels. 3. They train a Seq2seq-based user simulator (input: the difference in one attribute value between two items; output: a sentence describing the visual attribute difference). Template example: "Show me more shoes with round toe." Filled example: recommended Gender: Men, desired Gender: Women, sentence "I prefer shoes for women."
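The template-filling step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the `TEMPLATES` dictionary and the `synthesize_feedback` helper are hypothetical names, and only the two example sentences come from the slide.

```python
# Hypothetical templates keyed by attribute name; the wording of the two
# sentences follows the slide, everything else is an assumption.
TEMPLATES = {
    "gender": "I prefer shoes for {value}.",
    "toe style": "Show me more shoes with {value} toe.",
}


def synthesize_feedback(recommended, desired):
    """Return one templated sentence per attribute on which the items differ."""
    sentences = []
    for attr, template in TEMPLATES.items():
        want = desired.get(attr)
        # Only comment on attributes where the recommendation misses the target.
        if want is not None and recommended.get(attr) != want:
            sentences.append(template.format(value=want.lower()))
    return sentences


recommended = {"gender": "Men", "toe style": "Round"}
desired = {"gender": "Women", "toe style": "Round"}
feedback = synthesize_feedback(recommended, desired)
print(feedback)
```

In the talk, sentences of this kind serve as labeled training data for the Seq2seq simulator; the sketch only covers the synthesis step.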
  • 8. 8 Reward Constrained Recommendation They propose Reward Constrained Recommendation(RCR), which sequentially incorporates constraints from previous feedback. • A constraint-augmented RL problem setting • A learnable discriminator to detect violations of user preferences in an adversarial manner
  • 9. 9 MDP & Constrained MDP MDP(Markov Decision Process) Constrained MDP
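The formulas from this slide did not survive in the transcript. For reference, the textbook definitions are given below in standard notation, which may differ from the notation on the slide. The cost function $c$ and the return symbols $J_R$, $J_C$ are assumed names; $\alpha$ is the constraint threshold mentioned in the appendix.

```latex
% MDP: tuple (S, A, P, r, \gamma); the policy maximizes the expected discounted return.
\[
  \max_{\pi} \; J_R(\pi)
  = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\]
% Constrained MDP: same tuple plus a cost c and a threshold \alpha on the expected cost.
\[
  \max_{\pi} \; J_R(\pi)
  \quad \text{s.t.} \quad
  J_C(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le \alpha
\]
```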
  • 10. 10 Recommendation as MDP We can model the recommendation-feedback loop as an MDP, abstractly. [Figure: Recommender/Seeker loop over four rounds with states s_1 to s_4, actions a_1 to a_4, feedback x_1 to x_4, and rewards r_1? to r_4?]
  • 11. 11 Reminder of the dataset UT-Zappos50K • A shoe dataset consisting of 50,025 shoe images. • Rich attribute data (shoes category(4), shoes subcategory(21), heel height(7), closure(18), gender(8) and toe style(17)) • Samples • Labels
  • 12. 12 Reward Constrained Recommender Model Feature Extractor (extract features of feedback, recommended items) Recommender (predict attributes, match, and recommend) Discriminator (prevent constraint violation)
  • 13. 13 Feature Extractor Visual Encoder = ResNet50[1] + AttrNet (pretrained) Textual Encoder = Embedding + LSTM + FC [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • 14. 14 Feature Extractor Visual Encoder = ResNet50[1] + AttrNet (pretrained) Textual Encoder = Embedding + LSTM + FC [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Cat : Shoes SubCat : Dress shoes HeelHei. : X Closure : … Attributes (at training time)
  • 15. 15 Feature Extractor Visual Encoder = ResNet50[1] + AttrNet (pretrained) Textual Encoder = Embedding + LSTM + FC [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. ResNet50 AttrNet Concat Visual Encoder Cat : Shoes SubCat : Dress shoes HeelHei. : X Closure : … Attributes (at training time)
  • 16. 16 Feature Extractor Visual Encoder = ResNet50[1] + AttrNet (pretrained) Textual Encoder = Embedding + LSTM + FC [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. ResNet50 AttrNet Concat Visual Encoder Cat : Shoes SubCat : Dress shoes HeelHei. : X Closure : … Attributes (at training time) Category(4) SubCategory(21) Heel Height(7) AttrNet … ResNet Features Attribute Net
  • 17. 17 Recommender Policy π_θ selects the item closest to the sampled attribute values under Euclidean distance in the visual attribute space. [Figure: feature representation]
  • 18. 18 Recommender Policy π_θ selects the item closest to the sampled attribute values under Euclidean distance in the visual attribute space. [Figure: the feature representation feeds one FC + softmax head per attribute (Category(4), SubCategory(21), Heel Height(7), …); policy π_θ has a multi-discrete action space and uses categorical sampling]
  • 19. 19 Recommender Policy π_θ selects the item closest to the sampled attribute values under Euclidean distance in the visual attribute space. [Figure: as on the previous slide, with the visual encoder (ResNet50 + AttrNet) added]
  • 20. 20 Recommender Policy π_θ selects the item closest to the sampled attribute values under Euclidean distance in the visual attribute space. [Figure: as on the previous slide, with categorical sampling results, e.g. Category = shoes [1,0,0,0], SubCat = heel [0,0,0,1,….], Heel.H = 3 inch. [0,0,1,0,….], followed by distance-based matching under Euclidean distance]
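The sampling-and-matching step on these slides can be sketched as below. This is an illustrative reconstruction under stated assumptions: the logits and the item catalog are random stand-ins, only the first three attribute group sizes from the dataset slide are used, and `sample_attributes` and `nearest_item` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Group sizes from the dataset slide: Category(4), SubCategory(21), Heel Height(7).
GROUP_SIZES = [4, 21, 7]


def sample_attributes(logits_per_group, rng):
    """Sample one value per attribute group; return the concatenated one-hot vectors."""
    parts = []
    for logits in logits_per_group:
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        idx = rng.choice(len(probs), p=probs)  # categorical sampling
        one_hot = np.zeros(len(probs))
        one_hot[idx] = 1.0
        parts.append(one_hot)
    return np.concatenate(parts)


def nearest_item(target, item_attributes):
    """Index of the catalog row closest to the target under Euclidean distance."""
    dists = np.linalg.norm(item_attributes - target, axis=1)
    return int(np.argmin(dists))


# Stand-in policy outputs (assumption: one logit vector per attribute group).
logits = [rng.normal(size=n) for n in GROUP_SIZES]
target = sample_attributes(logits, rng)

# Stand-in catalog of item attribute vectors, with an exact match planted at row 42.
catalog = rng.random((100, sum(GROUP_SIZES)))
catalog[42] = target
best = nearest_item(target, catalog)
```

Sampling each attribute group independently is what makes the action space multi-discrete; the nearest-neighbour lookup then maps the sampled attribute vector back to a concrete item.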
  • 21. 21 Reward function Reward: the visual and attribute similarity between the recommended and desired items. • The recommended item should become more similar to the desired item as the interaction proceeds. • We want to minimize the visual and attribute differences. • The two distances are weighted so that their scales are similar. • If the system cannot find the desired item within 50 iterations, it receives an extra reward of -3 (as a penalty).
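The reward formula itself is missing from the transcript, so the following is only a sketch of a reward with the properties listed on the slide. The L2 visual distance, the mismatch count for attributes, and the `attr_weight` value are assumptions; the -3 timeout penalty is taken from the slide.

```python
import numpy as np


def reward(vis_rec, vis_des, attr_rec, attr_des, attr_weight=0.5, timed_out=False):
    """Negative weighted sum of visual and attribute distances.

    attr_weight is a hypothetical coefficient that balances the two scales.
    """
    # Assumed visual distance: Euclidean distance between feature vectors.
    visual_dist = float(np.linalg.norm(np.asarray(vis_rec) - np.asarray(vis_des)))
    # Assumed attribute distance: number of attributes that differ.
    attr_dist = float(np.sum(np.asarray(attr_rec) != np.asarray(attr_des)))
    r = -visual_dist - attr_weight * attr_dist
    if timed_out:
        r += -3.0  # extra penalty from the slide when the session hits 50 iterations
    return r


r_step = reward([0.0, 0.0], [3.0, 4.0], [1, 2, 3], [1, 2, 4])
r_timeout = reward([0.0, 0.0], [3.0, 4.0], [1, 2, 3], [1, 2, 4], timed_out=True)
```

The reward is highest (zero) when the recommended item matches the desired item in both feature spaces, so maximizing it minimizes both differences.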
  • 22. 22 Why are explicit constraints needed? RL algorithms that do not consider constraints easily violate preferences from past feedback, since they need to explore new items for further improvement. • Success case • Failure case [Figure: Recommender/Seeker exchanges for each case]
  • 23. 23 Discriminator Discriminator C_φ outputs whether the recommended item violates the user comments. Feedback history: x_{t-1}: "I prefer leather." x_t: "I prefer high heel." …
  • 24. 24 Collecting (non-)violation distribution One user session User session finish!
  • 25. 25 Collecting (non-)violation distribution One user session Non-violation pair
  • 26. 26 Collecting (non-)violation distribution One user session Violation pair
  • 27. 27 Collecting (non-)violation distribution One user session Non-violation pair
  • 28. 28 Discriminator A discriminator is defined as a constraint function. • Discriminator training • C_φ(s, a) is pushed toward 1 for a violation pair. • C_φ(s, a) is pushed toward 0 for a non-violation pair.
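One common way to push a discriminator toward 1 on violation pairs and toward 0 on non-violation pairs is a binary cross-entropy loss. The sketch below assumes that loss; the transcript does not show the exact objective, and `discriminator_loss` is a hypothetical helper operating on already-computed scores C_φ(s, a).

```python
import numpy as np


def discriminator_loss(scores_violation, scores_non_violation, eps=1e-8):
    """Binary cross-entropy with target 1 for violation pairs and 0 otherwise."""
    sv = np.clip(np.asarray(scores_violation, dtype=float), eps, 1 - eps)
    sn = np.clip(np.asarray(scores_non_violation, dtype=float), eps, 1 - eps)
    # -log C(s, a) on violation pairs, -log(1 - C(s, a)) on non-violation pairs.
    return float(-(np.log(sv).mean() + np.log(1 - sn).mean()))


# A discriminator that separates the two kinds of pairs well gets a lower loss.
good = discriminator_loss([0.9], [0.1])
bad = discriminator_loss([0.1], [0.9])
```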
  • 29. 29 Collecting (non-)violation distribution The discriminator is updated after each user session; it cannot be pretrained. • Judging whether a recommendation is a violation requires sequential feedback. • But the dataset has no sequential feedback (only the user simulator provides it). [Figure: one user session, from start to finish]
  • 30. 30 Remind: Reward Constrained Recommender Model Feature Extractor (extract features of feedback, rec. items) Discriminator (prevent constraint violation) Recommender (predict attributes, match, and recommend) 𝑪 𝝓(𝒔, 𝒂) 𝝅 𝜽(𝒂|𝐬)
  • 31. 31 Recommendation as Constrained MDP Directly solving the constrained optimization is difficult, so Lagrange relaxation transforms the objective into a dual problem. • Primal problem • Dual problem (refer to Appendix: Lagrange Relaxation) • Lagrangian function • Relaxed objective • Lagrange multiplier
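The four formulas named on this slide are missing from the transcript. In the standard Lagrangian treatment of a constrained MDP they take the form below; this is a generic reconstruction under assumed notation ($J_R$, $J_C$, $\hat{r}$), not a copy of the slide, with the discriminator $C_\phi$ playing the role of the per-step cost and $\alpha$ the constraint threshold.

```latex
% Primal problem
\[
  \max_{\theta} \; J_R(\pi_\theta)
  \quad \text{s.t.} \quad J_C(\pi_\theta) \le \alpha
\]
% Lagrangian function with multiplier \lambda \ge 0
\[
  L(\lambda, \theta) = J_R(\pi_\theta) - \lambda \big( J_C(\pi_\theta) - \alpha \big)
\]
% Dual problem
\[
  \min_{\lambda \ge 0} \; \max_{\theta} \; L(\lambda, \theta)
\]
% Relaxed objective: the policy is trained on a penalized per-step reward
\[
  \hat{r}(s, a) = r(s, a) - \lambda \, C_\phi(s, a)
\]
```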
  • 32. 32 Recommendation as Constrained MDP The goal is to find a saddle point, which can be approximately reached by alternating gradient descent/ascent. The reward function with constraints penalizes the policy for violations. λ is also optimized to enforce the constraints. 1) If violations happen, λ increases to penalize the policy. 2) If there is no violation, λ decreases to give the policy more reward.
  • 33. 33 Model Training Reward Constrained Recommendation Process • Alternately training the discriminator C_φ and the recommender π_θ • One projection operator keeps training stable by updating the parameters within a trust region [1] • Another projection operator projects λ into the range [0, λ_max] [1] Schulman, John, et al. "Trust region policy optimization." International conference on machine learning. 2015. [Figure: one user session]
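The multiplier update and the penalized reward can be sketched as follows. This assumes a plain gradient step on λ followed by clipping to [0, λ_max]; the learning rate, the use of an average violation score as the constraint estimate, and both function names are hypothetical.

```python
def update_multiplier(lam, violation_rate, alpha, lr, lam_max):
    """One gradient step on the Lagrange multiplier, projected into [0, lam_max].

    violation_rate is an assumed estimate of the constraint value for the last
    session (e.g. the mean discriminator score); alpha is the threshold.
    """
    lam = lam + lr * (violation_rate - alpha)  # grows when the constraint is violated
    return min(max(lam, 0.0), lam_max)         # projection onto [0, lam_max]


def penalized_reward(reward, constraint_score, lam):
    """Reward seen by the policy: original reward minus the weighted violation score."""
    return reward - lam * constraint_score


lam_up = update_multiplier(0.04, violation_rate=0.5, alpha=0.1, lr=0.1, lam_max=1.0)
lam_down = update_multiplier(0.01, violation_rate=0.0, alpha=0.1, lr=0.5, lam_max=1.0)
```

In an alternating scheme, a session would first update the policy on `penalized_reward`, then the discriminator on the collected pairs, then λ with `update_multiplier`.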
  • 34. 34 Evaluation SR@K: Success Rate after K interactions. NI: Number of user Interactions before success. NV: Number of Violated attributes compared with the desired attributes of users. λ increases at the early stage (as violations increase) and then stabilizes. λ ≈ 0.04 is the automatically learned discriminator weight.
  • 35. 35 Evaluation RL baseline: ignores the constraints. RL + Naive constraints: the Lagrange multiplier λ is fixed. • All models are trained for 100,000 iterations (user sessions) • Seen: training data • Unseen: test data • Results are averaged over 100 sessions with standard error. The learned constraint (discriminator) generalizes better.
  • 36. 36 Conclusion They propose Reward Constrained Recommendation(RCR), which sequentially incorporates constraints from previous feedback. • A constraint-augmented RL problem setting • A learnable discriminator to detect violations of user preferences in an adversarial manner The proposed method can be extended to other applications, such as, 1. vision-and-dialogue navigation 2. Interactive Recommendation with user’s prior information 3. Dialogue-based Recommendation
  • 38. 38 Appendix: Generated feedback Simulator only generates simple comments on the visual attribute difference between the candidate image and the desired image
  • 39. 39 Appendix: Hyperparameter setting In reinforcement learning, they use Adam as the optimizer. They set , • α : threshold of constraints (refer to page 15) • λ_max : projection boundary of λ