ujava.org workshop: Reinforcement Learning with Thompson Sampling

ujava.org forum workshop: Reinforcement Learning with Thompson Sampling

Published in: Data & Analytics

  1. Reinforcement Learning with Thompson Sampling (3rd), ujava.org workshop, 2016-08-28, www.idosi.com, CEO 강신동 Shindong KANG, (주)지능도시
  2. ujava.org
  3. spaceapi.org
  4. Reinforcement Learning for Brick Game
  5. Reinforcement Learning
  6. Forecast
  7. Forecast with probability
  8. Probability (확률)
  9. Conditional Probability (조건부 확률)
  10. Bayesian Probability (베이지안 확률)
  11. Bayes Rule Words
  12. Bayesian Probability (베이지안 확률): P(fair|H) = ? With P(A) = P(fair) = ½, P(B) = P(H) = ¾ and P(B|A) = P(H|fair) = ½: P(fair|H) = P(H|fair) · P(fair) / P(H) = (½ · ½) / ¾ = 1/3
  13. Brownian motion, Gaussian distribution
  14. Markov Process
  15. Stochastic Matrix
  16. Stochastic Matrix: [[0.4, 0.6], [0.7, 0.3]]
  17. Exploitation and Exploration (개발 and 탐험)
  18. State-action exploration vs. Parameter exploration
  19. Multi-armed bandit problem
  20. Simulated Bandit Performance
  21. Multi-armed bandit problem
  22. Multi-Armed Bandit Algorithms
  23. MAB Reward
  24. Gaussian Distribution
  25. Gaussian Distribution
  26. GMM (Gaussian Mixture Model)
  27. Gaussian Mixture Model
  28. Gaussian Mixture Model
  29. Function's Probability Distribution?
  30. Function's Probability Distribution: y = ax^2 + b
  31. Function's Probability Distribution with Gaussian Distribution: y = ax^2 + b
  32. Function's Probability Distribution with Gaussian Distribution
  33. Gaussian Process Regression
  34. Gaussian Process (from C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006)
  35. Bayesian Optimization
  36. Acquisition function
  37. Why Bayesian Optimization works
  38. Bayesian reasoners
  39. Intelligent user interfaces regression
  40. Slot Machine
  41. Multi Armed Bandit
  42. MAB: Regret (후회)
  43. A/B Testing
  44. Greedy Algorithm
  45. Greedy Algorithm (Search Maximum)
  46. Greedy Algorithm (Search Tree)
  47. Epsilon-Greedy (epsilon = exploration)
  48. Softmax
  49. Softmax
  50. UCB
  51. argmax
  52. UCB
  53. UCB1
  54. Log graph
  55. UCB1
  56. Indicator function (표시함수)
  57. Thompson sampling: Probability Matching, Bayesian Bandit
  58. Thompson sampling
  59. Thompson sampling (from SlideShare “Slice Technologies”)
  60. Thompson sampling
  61. Thompson sampling (area = 1)
  62. Thompson sampling
  63. Thompson sampling
  64. Thompson sampling
  65. Thompson sampling: 19 / (19 + 9) = 19 / 28 = 0.679; 59 / (59 + 39) = 59 / 98 = 0.602
  66. Thompson sampling
  67. Thompson sampling
  68. Thompson sampling
  69. Thompson sampling
  70. Thompson sampling Algorithm for Bernoulli bandits
  71. Thompson sampling Algorithm for general stochastic bandits
  72. Thompson sampling
  73. Thompson sampling
  74. Thompson sampling
  75. Thompson sampling
  76. Thompson sampling
  77. Thompson sampling
  78. Thompson sampling
  79. Thompson sampling
  80. Thompson sampling
  81. Thompson sampling
  82. Thompson sampling
  83. Thompson sampling
  84. Thompson sampling
  85. Thompson sampling
  86. Thompson sampling
  87. Thompson sampling
  88. Thompson sampling
  89. Thompson sampling
  90. Multi-play Thompson Sampling (from MS Research)
  91. Multi-play Thompson Sampling: Multi-play Thompson Sampling (MP-TS), Improved Multi-play Thompson Sampling (IMP-TS)
  92. Thank you! (주)지능도시 Intelligent City Ltd. 강신동 Shindong KANG, www.idosi.com, ceo@idosi.com
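
The slide text does not include the algorithm itself, so the following is only a minimal sketch of Thompson sampling for Bernoulli bandits in its common textbook form, not the presenter's code. It assumes a uniform Beta(1, 1) prior on each arm, and the names `thompson_sampling`, `posterior_mean`, and `true_probs` are hypothetical. It also reads the ratios on the slide with the 19 / 28 and 59 / 98 figures as the means of Beta(19, 9) and Beta(59, 39) distributions, which is an assumption about what that slide shows.

```python
import random


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def thompson_sampling(true_probs, rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit.

    true_probs: hypothetical success probability of each arm
                (unknown to the algorithm, used only to draw rewards).
    Returns (successes, failures, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    total_reward = 0

    for _ in range(rounds):
        # Draw one sample per arm from its posterior Beta(s + 1, f + 1);
        # the "+ 1" terms encode the assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled success rate is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Observe a Bernoulli reward and update that arm's counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward

    return successes, failures, total_reward


if __name__ == "__main__":
    s, f, total = thompson_sampling([0.2, 0.8], rounds=2000)
    pulls = [s[i] + f[i] for i in range(len(s))]
    print("pulls per arm:", pulls)
    print("total reward:", total)
    print("posterior means:",
          [round(posterior_mean(s[i] + 1, f[i] + 1), 3) for i in range(len(s))])
```

Because each arm is chosen with the probability that its sampled rate is the largest, arms that look worse are still tried occasionally, while most pulls go to the arm with the higher posterior. This is the probability-matching behaviour named on the slides.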
