Counterfactual Learning for Recommendation
Olivier Jeunen,
Dmytro Mykhaylov, David Rohde, Flavian Vasile, Alexandre Gilotte, Martin Bompaire
September 25, 2019
Adrem Data Lab, University of Antwerp
Criteo AI Lab, Paris
olivier.jeunen@uantwerp.be
1
Table of contents
1. Introduction
2. Methods
3. Learning for Recommendation
4. Experiments
5. Conclusion
2
Introduction
Introduction - Recommender Systems
Motivation
• Web-scale systems (Amazon, Google, Netflix, Spotify, ...)
typically have millions of items in their catalogue.
• Users are often only interested in a handful of them.
• Recommendation Systems aim to identify these items for every user,
encouraging users to engage with relevant content.
3
Introduction
Traditional Approaches
• Typically based on collaborative
filtering on the user-item matrix:
o Nearest-neighbour models,
o Latent factor models,
o Neural networks,
o . . .
• Goal is to identify which items the user
interacted with in a historical dataset,
regardless of the recommender.
The user-item interaction matrix:

0 0 0 … 0 1 0
1 0 0 … 0 0 1
0 0 0 … 1 0 0
0 0 1 … 0 0 0
⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮
0 1 0 … 0 1 0
0 0 0 … 0 1 0
0 1 1 … 0 0 0
0 0 0 … 1 0 0
1 0 1 … 0 1 0
5
Introduction
Learning from Bandit Feedback
• Why not learn directly from the recommender’s logs?
What was shown in what context and what happened as a result?
• Not straightforward, as we only observe the
result of recommendations we actually show.
• A broad literature on Counterfactual Risk Minimisation (CRM) exists,
but it has never been validated in a recommendation context.
6
Introduction - Reinforcement Learning Parallels
Figure 1: Schematic representation of the reinforcement learning paradigm. 7
Methods
Background
Notation
We assume:
• A stochastic logging policy π0 that describes a probability distribution
over actions, conditioned on the context.
• Dataset of logged feedback D with N tuples (x, a, p, c) with
x ∈ Rn a context vector (historical counts),
a ∈ [1, n] an action identifier,
p ≡ π0(a|x) the logging propensity,
c ∈ {0, 1} the observed reward (click).
8
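To make the notation concrete, here is a minimal sketch of how such a logged-feedback dataset could be represented and collected; the names (LoggedSample, simulate_log, pi0, reward_model) are illustrative stand-ins, not part of the original work.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LoggedSample:
    x: np.ndarray   # context vector of historical counts, shape (n,)
    a: int          # action (item) identifier
    p: float        # logging propensity pi_0(a | x)
    c: int          # observed reward: 1 if the recommendation was clicked

def simulate_log(pi0, reward_model, contexts, rng):
    """Collect bandit feedback by acting with the logging policy pi0 (all names hypothetical)."""
    data = []
    for x in contexts:
        probs = pi0(x)                               # distribution over the n actions
        a = rng.choice(len(probs), p=probs)          # sample an action to show
        c = int(rng.random() < reward_model(x, a))   # stochastic click
        data.append(LoggedSample(x, a, probs[a], c))
    return data
```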
Methods: Value-based
Likelihood (Logistic Regression) Hosmer Jr. et al. [2013]
Model the probability of a click, conditioned on the action and context:
P(c = 1|x, a) (1)
You can optimise your favourite classifier for this! (e.g. Logistic Regression)
Obtain a decision rule from:
a∗ = arg max_a P(c = 1 | x, a).   (2)
9
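As an illustration, a hedged sketch of this value-based baseline, assuming a per-action logistic regression with parameter matrix θ ∈ R^{n×n}; the tensors X, A, C are placeholders for logged contexts, actions and clicks.

```python
import torch

def likelihood_loss(theta, X, A, C):
    # X: (N, n) contexts, A: (N,) action ids, C: (N,) clicks in {0, 1}
    logits = (X * theta[:, A].T).sum(dim=1)          # x_i . theta[:, a_i]
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, C.float())

def decision_rule(theta, x):
    # a* = argmax_a P(c = 1 | x, a)  (Eq. 2); sigmoid is monotone, so argmax the logits
    return torch.argmax(x @ theta).item()
```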
Methods: Value-based
IPS-weighted Likelihood Storkey [2009]
Naturally, as the logging policy is trying to achieve some goal (e.g. clicks,
views, dwell time, . . . ), it will take some actions more often than others.
We can use Inverse Propensity Scoring (IPS) to force the error of the fit
to be distributed evenly across the action space.
Reweight samples (x, a) by:
1 / π0(a|x)   (3)
10
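A possible implementation of this reweighting, extending the likelihood sketch above; P0 is assumed to hold the logged propensities π0(a_i|x_i).

```python
import torch

def ips_likelihood_loss(theta, X, A, C, P0):
    logits = (X * theta[:, A].T).sum(dim=1)          # x_i . theta[:, a_i]
    per_sample = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, C.float(), reduction='none')
    return (per_sample / P0).mean()                  # Eq. (3): weight each sample by 1 / pi_0(a|x)
```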
Methods: Policy-based
Contextual Bandit Bottou et al. [2013]
Model the counterfactual reward:
“How many clicks would a policy πθ have gotten if it was deployed instead of π0?”
Directly optimise πθ, with θ ∈ Rn×n the model parameters:
P(a|x, θ) = πθ(a|x) (4)
θ∗ = arg max_θ Σ_{i=1}^{N} c_i · πθ(a_i|x_i) / π0(a_i|x_i)   (5)

a∗ = arg max_a P(a|x, θ)   (6)
11
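One way this objective could look in code, assuming a linear softmax policy πθ(a|x) = softmax(x·θ); the softmax parameterisation is an assumption, the slides only state θ ∈ R^{n×n}.

```python
import torch

def pi_theta(theta, X):
    # linear softmax policy over the n actions (assumed parameterisation)
    return torch.softmax(X @ theta, dim=1)               # (N, n)

def ips_objective(theta, X, A, C, P0):
    pi_a = pi_theta(theta, X)[torch.arange(len(A)), A]   # pi_theta(a_i | x_i)
    return (C * pi_a / P0).sum()                         # Eq. (5), to be maximised
```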
Methods: Policy-based
POEM Swaminathan and Joachims [2015a]
IPS estimators tend to have high variance; POEM clips the importance weights and introduces
sample variance penalisation:
θ∗ = arg max_θ (1/N) Σ_{i=1}^{N} c_i · min(M, πθ(a_i|x_i) / π0(a_i|x_i)) − λ √(Var_θ / N)   (7)
12
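A sketch of the POEM objective under these definitions; the clipping constant M, the penalty weight λ and the √(Var/N) form of the penalty follow the CRM principle and are assumptions here.

```python
import torch

def poem_objective(theta, X, A, C, P0, M=10.0, lam=0.1):
    pi_a = torch.softmax(X @ theta, dim=1)[torch.arange(len(A)), A]
    w = torch.clamp(pi_a / P0, max=M)                       # min(M, pi_theta / pi_0)
    u = C * w                                               # per-sample reweighted reward
    return u.mean() - lam * torch.sqrt(u.var() / len(u))    # Eq. (7)
```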
Methods: Policy-based
NormPOEM Swaminathan and Joachims [2015b]
Variance penalisation alone is insufficient; use the self-normalised IPS estimator:
θ∗ = arg max_θ [ Σ_{i=1}^{N} c_i · πθ(a_i|x_i) / π0(a_i|x_i) ] / [ Σ_{i=1}^{N} πθ(a_i|x_i) / π0(a_i|x_i) ] − λ √(Var_θ / N)   (8)
BanditNet Joachims et al. [2018]
Equivalent to a certain optimal translation of the reward:
θ∗ = arg max_θ Σ_{i=1}^{N} (c_i − γ) · πθ(a_i|x_i) / π0(a_i|x_i)   (9)
13
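Hedged sketches of both objectives, continuing the softmax-policy assumption; γ is a hyperparameter that BanditNet tunes, treated here as a plain argument.

```python
import torch

def snips_objective(theta, X, A, C, P0, lam=0.1):
    w = torch.softmax(X @ theta, dim=1)[torch.arange(len(A)), A] / P0
    u = C * w
    return u.sum() / w.sum() - lam * torch.sqrt(u.var() / len(u))   # Eq. (8)

def banditnet_objective(theta, X, A, C, P0, gamma=0.5):
    w = torch.softmax(X @ theta, dim=1)[torch.arange(len(A)), A] / P0
    return ((C - gamma) * w).sum()                                  # Eq. (9)
```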
Methods: Overview
Property columns: P(c|x, a), P(a|x), IPS, SVP, Equivariant
• Value learning: Likelihood, IPS Likelihood
• Policy learning: Contextual Bandit, POEM, BanditNet
Table 1: An overview of the methods we discuss in our work.
14
Learning for Recommendation
Learning for Recommendation
Up until now, most of these methods have been evaluated in simulated
bandit-feedback settings for multi-class or multi-label classification tasks.
Recommendation, however, brings along specific issues such as:
o Stochastic rewards
o Sparse rewards
15
Stochastic Rewards
Contextual Bandits, POEM and BanditNet all use variants of the
empirical IPS estimator of the reward for a new policy πθ, given
samples D collected under logging policy π0.
R̂_IPS(πθ, D) = Σ_{i=1}^{N} c_i · πθ(a_i|x_i) / π0(a_i|x_i)   (10)

We propose the use of a novel, logarithmic variant of this estimator:

R̂_ln(IPS)(πθ, D) = Σ_{i=1}^{N} c_i · ln(πθ(a_i|x_i)) / π0(a_i|x_i)   (11)
16
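A minimal sketch of the proposed estimator as a training objective, reusing the softmax-policy assumption from the earlier sketches.

```python
import torch

def ln_ips_objective(theta, X, A, C, P0):
    log_pi_a = torch.log_softmax(X @ theta, dim=1)[torch.arange(len(A)), A]
    return (C * log_pi_a / P0).sum()                 # Eq. (11), to be maximised
```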
Example: Deterministic Multi-class Rewards
Either action a or b is correct; let's assume it's action a.
Thus, we have logged samples (a, c = 1) and (b, c = 0).
[Figure: R̂_IPS (left axis, 0 to 2) and R̂_ln(IPS) (right axis) as a function of p(a) = 1 − p(b), panel title "Multi-class rewards".]
17
Example: Deterministic Multi-label Rewards
Both actions a and b can be correct; let's assume they are.
Thus, we have logged samples (a, c = 1) and (b, c = 1).
[Figure: R̂_IPS (left axis, about 1.9 to 2.1) and R̂_ln(IPS) (right axis) as a function of p(a) = 1 − p(b), panel title "Multi-label rewards".]
18
Example: Stochastic Multi-label Rewards
Both actions a and b can be correct; let's assume they are. Thus, we can have
logged samples (a, c = 1), (a, c = 0), (b, c = 1) and (b, c = 0).
Assume we have observed 2 clicks on a, and 1 on b.
[Figure: R̂_IPS (left axis, 2 to 4) and R̂_ln(IPS) (right axis) as a function of p(a) = 1 − p(b), panel title "Stochastic multi-label rewards"; R̂_ln(IPS) peaks at p(a) = 2/3.]
19
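The figure's claim can be checked numerically: a short script, assuming a uniform logging policy with π0 = 1/2 for both actions, recovers the boundary maximum of R̂_IPS and the interior maximum of R̂_ln(IPS) at p(a) = 2/3.

```python
import numpy as np

p = np.linspace(0.01, 0.99, 999)                      # candidate values for p(a)
r_ips = 2 * p / 0.5 + 1 * (1 - p) / 0.5               # Eq. (10): 2 clicks on a, 1 on b
r_ln_ips = 2 * np.log(p) / 0.5 + np.log(1 - p) / 0.5  # Eq. (11)
print(p[np.argmax(r_ips)], p[np.argmax(r_ln_ips)])    # -> ~0.99 (boundary) and ~0.67
```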
Stochastic Rewards
• R̂_ln(IPS) can be seen as a stricter version of R̂_IPS:
assigning zero probability to even a single positively rewarded sample leads to an infinite loss.
• R̂_ln(IPS) takes into account all positive samples instead of only the
empirical best arm. Intuitively, this might lead to less overfitting.
• R̂_ln(IPS) can be straightforwardly plugged into existing methods
such as contextual bandits, POEM and BanditNet.
20
Sparse Rewards
Policy-based methods tend to ignore negative feedback, but exhibit robust
performance. Value-based methods are much more sensitive to the input data,
with high variance in their performance as a result.
Why not combine them?
21
Dual Bandit
Jointly optimise the Contextual Bandit and Likelihood objectives to get
the best of both worlds:
θ∗ = arg max_θ (1 − α) Σ_{i=1}^{N} c_i · πθ(a_i|x_i) / π0(a_i|x_i)
          + α Σ_{i=1}^{N} [ c_i ln(σ(x_i θ_{·,a_i})) + (1 − c_i) ln(1 − σ(x_i θ_{·,a_i})) ]   (12)

where 0 ≤ α ≤ 1 regulates the rescaling and reweighting of the two objectives.
22
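A sketch of the joint objective under the same assumptions as before: a softmax policy for the bandit term, a per-action logistic regression for the likelihood term, and a shared parameter matrix θ.

```python
import torch

def dual_bandit_objective(theta, X, A, C, P0, alpha=0.5):
    # (1 - alpha) * contextual-bandit IPS term  +  alpha * logistic log-likelihood (Eq. 12)
    pi_a = torch.softmax(X @ theta, dim=1)[torch.arange(len(A)), A]
    bandit_term = (C * pi_a / P0).sum()

    logits = (X * theta[:, A].T).sum(dim=1)          # x_i . theta[:, a_i]
    log_lik = -torch.nn.functional.binary_cross_entropy_with_logits(
        logits, C.float(), reduction='sum')

    return (1 - alpha) * bandit_term + alpha * log_lik
```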
Dual Bandit
Property columns: P(c|x, a), P(a|x), IPS, SVP, Equivariant
• Value learning: Likelihood, IPS Likelihood
• Policy learning: Contextual Bandit, POEM, BanditNet
• Joint learning: Dual Bandit
Table 2: Where the Dual Bandit fits in the bigger picture.
23
Experiments
Experimental Setup
All code is written in PyTorch, and all models are optimised with L-BFGS.
We adopt RecoGym as simulation environment, and consider four logging
policies:
• Popularity-based (no support over all actions)
πpop(a|x) = x_a / Σ_{i=1}^{n} x_i

• Popularity-based (with support over all actions, ε = 1/2)

πpop-eps(a|x) = (x_a + ε) / Σ_{i=1}^{n} (x_i + ε)
24
Experimental Setup
• Inverse popularity-based
πinv-pop(a|x) = (1 − πpop(a|x)) / Σ_{i=1}^{n} (1 − πpop(i|x))

• Uniform

πuniform(a|x) = 1 / n
25
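For concreteness, the four logging policies could be written as follows, with x a vector of per-item historical counts; the ε default of 1/2 mirrors the value used on the previous slide.

```python
import numpy as np

def pi_pop(x):
    return x / x.sum()                               # no support on unseen items

def pi_pop_eps(x, eps=0.5):
    return (x + eps) / (x + eps).sum()               # full support over all actions

def pi_inv_pop(x):
    p = 1.0 - pi_pop(x)
    return p / p.sum()

def pi_uniform(x):
    return np.full(len(x), 1.0 / len(x))
```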
Experimental Results
The research questions we aim to answer are the following:
RQ1 How does the logarithmic IPS estimator R̂_ln(IPS) influence the
performance of counterfactual learning methods?
RQ2 How do the various methods presented in this paper compare in
terms of performance in a recommendation setting?
RQ3 How sensitive is the performance of the learned models with
respect to the quality of the initial logging policy π0?
RQ4 How do the number of items n and the number of available
samples N influence performance?
26
RQ1 - Impact of R̂_ln(IPS)
[Bar chart: averaged CTR (×10⁻², roughly 1.0 to 1.6) for the Contextual Bandit, POEM, BanditNet and Dual Bandit, each trained with R̂_IPS versus R̂_ln(IPS).]
Figure 2: Averaged CTR for models trained for varying objective functions. 27
RQ2-4 - Performance Comparison under varying settings
[Four panels: CTR (×10⁻²) against the number of users (×10⁴) for logging policies Popularity (ε = 0), Popularity (ε = 1/2), Uniform and Inverse Popularity; curves for Logging, Skyline, Likelihood, IPS Likelihood, Contextual Bandit, POEM, BanditNet and Dual Bandit.]
Figure 3: Simulated A/B-test results for various models trained on data collected under
various logging policies. We increase the size of the training set over the x axis (n = 10).
28
RQ2-4 - Performance Comparison under varying settings
[Four panels as in Figure 3, now with n = 50 items: CTR (×10⁻²) against the number of users (×10⁴) for the same logging policies and methods.]
Figure 4: Simulated A/B-test results for various models trained on data collected under
various logging policies. We increase the size of the training set over the x axis (n = 50).
29
Conclusion
Conclusion
• Counterfactual learning approaches can achieve decent performance
on recommendation tasks.
• Performance can be improved by straightforward adaptations to deal
with e.g. stochastic rewards.
• Performance is dependent on the amount of randomisation in the
logging policy, but even for policies without full support over the action
space, decent performance can be achieved.
30
Questions?
31
References i
References
L. Bottou, J. Peters, J. Quiñonero-Candela, D. Charles, D. Chickering, E. Portugaly,
D. Ray, P. Simard, and E. Snelson. Counterfactual reasoning and learning systems:
The example of computational advertising. The Journal of Machine Learning
Research, 14(1):3207–3260, 2013.
D. Hosmer Jr., S. Lemeshow, and R. Sturdivant. Applied logistic regression, volume
398. John Wiley & Sons, 2013.
32
References ii
T. Joachims, A. Swaminathan, and M. de Rijke. Deep learning with logged bandit
feedback. In Proc. of the 6th International Conference on Learning Representations,
ICLR ’18, 2018.
A. Storkey. When training and test sets are different: characterizing learning transfer.
Dataset shift in machine learning, pages 3–28, 2009.
A. Swaminathan and T. Joachims. Counterfactual risk minimization: Learning from
logged bandit feedback. In Proc. of the 32nd International Conference on
International Conference on Machine Learning - Volume 37, ICML’15, pages
814–823. JMLR.org, 2015a.
A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual
learning. In Advances in Neural Information Processing Systems, pages 3231–3239,
2015b.
33