Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Byron Galbraith is the Chief Data Scientist and co-founder of Talla, where he works to translate the latest advancements in machine learning and natural language processing to build AI-powered conversational agents. Byron has a PhD in Cognitive and Neural Systems from Boston University and an MS in Bioinformatics from Marquette University. His research expertise includes brain-computer interfaces, neuromorphic robotics, spiking neural networks, high-performance computing, and natural language processing. Byron has also held several software engineering roles including back-end system engineer, full stack web developer, office automation consultant, and game engine developer at companies ranging in size from a two-person startup to a multi-national enterprise.

Abstract Summary:

Bayesian Bandits:
What color should that button be to convert more sales? What ad will most likely get clicked on? What movie recommendations should be displayed to keep subscribers engaged? What should we have for lunch? These are all examples of iterated decision problems — the same choice has to be made repeatedly with the goal being to arrive at an optimal decision strategy by incorporating the results of the previous decisions. In this talk I will describe the Bayesian Bandit solution to these types of problems, how it adaptively learns to minimize regret, how additional contextual information can be incorporated, and how it compares to the more traditional A/B testing solution.

1. Bayesian Bandits. Byron Galbraith, PhD, Cofounder / Chief Data Scientist, Talla. 2017.03.24
2. Bayesian Bandits for the Impatient: (1) online adaptive learning, “earn while you learn”; (2) a powerful alternative to A/B testing for optimization; (3) can be efficient and easy to implement.
3. Dining Ware VR Experiences on Demand
4. Dining Ware VR Experiences on Demand
5. Iterated Decision Problems: What product recommendations should we present to subscribers to keep them engaged?
6. A/B Testing
7. Exploit vs Explore - What should we do? Choose what seems best so far: 🙂 feel good about our decision; 🙂 there still may be something better. Try something new: 😄 discover a superior approach; 😧 regret our choice.
8. A/B/n Testing
9. Regret - What did that experiment cost us?
10. The Multi-Armed Bandit Problem. http://blog.yhat.com/posts/the-beer-bandit.html
11. Bandit Solutions. The k-armed bandit is the tuple $k\text{-MAB} = \langle A, Y, P, r \rangle$, and the cumulative regret after $T$ rounds is $R_T = \sum_{t=1}^{T} \left[ r(Y_t^{a^*}) - r(Y_t^{a_t}) \right]$.
   • Incremental value estimate: $\bar{r}_a^{\,n+1} = \bar{r}_a^{\,n} + \frac{1}{n_a}\left(r_a^{\,t} - \bar{r}_a^{\,n}\right)$
   • Upper confidence bound (UCB) selection: $a_t = \operatorname{argmax}_i \left[ \bar{r}_i^{\,t} + c\sqrt{\frac{\log t}{n_i}} \right]$
   • Gradient bandit (softmax policy): $P(A_t = a) = \frac{e^{h_a^n}}{\sum_{b=1}^{k} e^{h_b^n}} = \pi_t(a)$, with preference updates $h_a^{n+1} = h_a^{n} + \alpha\left(r_a^{\,t} - \bar{r}_a^{\,n}\right)\left(1 - \pi_t(a)\right)$ and $h_b^{n+1} = h_b^{n} - \alpha\left(r_a^{\,t} - \bar{r}_a^{\,n}\right)\pi_t(b)$ for $b \neq a$
   • Bayesian approach: Beta prior $P(X = x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, Binomial likelihood $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, posterior $\mathrm{Beta}_a(\alpha + r_a,\, \beta + N - r_a)$, via Bayes' rule $P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\, P(X \mid Z)}{P(Y \mid Z)}$
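The incremental estimate and the UCB selection rule translate almost line for line into code. A minimal Python sketch, assuming Gaussian rewards and made-up arm means (nothing here is from the talk's own code):

```python
import math
import random

def ucb1(true_means, T=10_000, c=2.0):
    """Run a k-armed bandit with UCB1 selection and incremental mean estimates."""
    k = len(true_means)
    counts = [0] * k    # n_i: number of pulls per arm
    means = [0.0] * k   # incremental estimates of each arm's mean reward
    for t in range(1, T + 1):
        if t <= k:
            a = t - 1   # pull each arm once to initialize the estimates
        else:
            # a_t = argmax_i [ r_i + c * sqrt(log t / n_i) ]
            a = max(range(k),
                    key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]))
        reward = random.gauss(true_means[a], 1.0)  # simulated noisy reward
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # r_a += (r - r_a) / n_a
    return means, counts

_, counts = ucb1([0.1, 0.5, 0.9])
print(counts)  # the best arm (index 2) should receive most of the pulls
```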
12. Thompson Sampling: P(θ | r, a) ∝ P(r | θ, a) · P(θ | a), i.e. posterior ∝ likelihood × prior.
13. Bayesian Bandits – The Model. Model whether a recommendation will result in user engagement. • Bernoulli distribution: p is the likelihood of the event occurring. How do we find p? • Use a conjugate prior: the Beta distribution, where α is the number of hits and β is the number of misses. • We only need to keep track of two numbers per option: the number of hits and the number of misses.
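Conjugacy is what makes this cheap: with a Beta prior and Bernoulli observations, the posterior is again a Beta whose parameters are just the running hit and miss counts. A small illustration using SciPy (the counts are invented for the example):

```python
from scipy.stats import beta

prior_alpha, prior_beta = 1, 1   # uniform Beta(1, 1) prior: no evidence yet
hits, misses = 27, 73            # hypothetical engagement outcomes for one option

posterior = beta(prior_alpha + hits, prior_beta + misses)
print(posterior.mean())          # ≈ 0.27, the estimated engagement rate p
print(posterior.interval(0.95))  # 95% credible interval for p
```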
14. Bayesian Bandits – The Algorithm. (1) Initialize αᵢ = βᵢ = 1 for every option i (uniform prior). (2) For each user request for recommendations t: sample pᵢ ~ Beta(αᵢ, βᵢ) for each option; choose the action with the largest sampled pᵢ; observe reward rₜ; update the chosen action's counts, α += rₜ and β += 1 − rₜ.
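A minimal NumPy sketch of that loop; the engagement rates and the number of simulated requests are assumptions for illustration, not figures from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = [0.04, 0.06, 0.10]       # hypothetical per-option engagement rates
k = len(true_rates)
alpha = np.ones(k)                    # step 1: uniform Beta(1, 1) priors
beta_counts = np.ones(k)

for t in range(10_000):               # step 2: one iteration per user request
    p = rng.beta(alpha, beta_counts)  # sample p_i ~ Beta(alpha_i, beta_i)
    a = int(np.argmax(p))             # choose the action with the largest sample
    r = float(rng.random() < true_rates[a])  # observe simulated 0/1 reward
    alpha[a] += r                     # update only the chosen action's counts
    beta_counts[a] += 1 - r

print(alpha / (alpha + beta_counts))  # posterior means approach the true rates
```

Because each arm's belief is sampled rather than point-estimated, uncertain arms still get tried occasionally, which is how Thompson Sampling trades off exploration against exploitation.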
15.–19. Belief Adaptation (a sequence of figure slides)
20. Bandit Regret
21. But behavior is dependent on context. • Categorical contexts: one bandit model per category, or a one-hot context vector. • Real-valued contexts: can capture interrelatedness of context dimensions, but more difficult to incorporate effectively.
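For the categorical case, "one bandit model per category" can be as simple as keeping an independent set of Beta counts per context value. A sketch under that assumption (the category key and action count are invented for illustration):

```python
import random
from collections import defaultdict

class PerCategoryBandits:
    """One independent Beta-Bernoulli Thompson sampler per categorical context."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        # per-category [alpha, beta] pairs, all starting at Beta(1, 1)
        self.counts = defaultdict(lambda: [[1, 1] for _ in range(n_actions)])

    def choose(self, category):
        samples = [random.betavariate(a, b) for a, b in self.counts[category]]
        return max(range(self.n_actions), key=samples.__getitem__)

    def update(self, category, action, reward):
        self.counts[category][action][0] += reward      # alpha += r
        self.counts[category][action][1] += 1 - reward  # beta += 1 - r

bandits = PerCategoryBandits(n_actions=3)
a = bandits.choose("weekend_user")       # hypothetical context category
bandits.update("weekend_user", a, reward=1)
```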
22. So why would I ever A/B test again? Test intent (optimization vs understanding); difficulty with non-stationarity (Monday vs Friday behavior); deployment (few turnkey options, specialized skill set). https://vwo.com/blog/multi-armed-bandit-algorithm/
23. Bayesian Bandits for the Patient: (1) Thompson Sampling balances exploitation and exploration while minimizing decision regret; (2) no need to pre-specify decision splits or a time horizon for experiments; (3) can model a variety of problems and complex interactions.
24. Resources: https://github.com/bgalbraith/bandits
