ujava.org workshop: Reinforcement Learning with Thompson Sampling

1. Reinforcement Learning with Thompson Sampling (3rd) ujava.org workshop, 2016-08-28, www.idosi.com, CEO Shindong KANG (강신동), Intelligent City Ltd. ((주)지능도시)
2. ujava.org
3. spaceapi.org
4. Reinforcement Learning for Brick Game
5. Reinforcement Learning
6. Forecast
7. Forecast with probability
8. Probability
9. Conditional Probability
10. Bayesian Probability
11. Bayes Rule Words
12. Bayesian Probability: P(fair|H) = ? Given P(A) = P(fair) = ½, P(B) = P(H) = ¾, P(B|A) = P(H|fair) = ½: P(fair|H) = P(H|fair) · P(fair) / P(H) = (½ · ½) / ¾ = 1/3
13. Brownian motion, Gaussian distribution
14. Markov Process
15. Stochastic Matrix
16. Stochastic Matrix: [0.4 0.6; 0.7 0.3] (each row sums to 1)
17. Exploitation and Exploration
18. State-action exploration vs. Parameter exploration
19. Multi-armed bandit problem
20. Simulated Bandit Performance
21. Multi-armed bandit problem
22. Multi-Armed Bandit Algorithms
23. MAB Reward
24. Gaussian Distribution
25. Gaussian Distribution
26. GMM (Gaussian Mixture Model)
27. Gaussian Mixture Model
28. Gaussian Mixture Model
29. Function's Probability Distribution?
30. Function's Probability Distribution: y = ax^2 + b
31. Function's Probability Distribution with Gaussian Distribution: y = ax^2 + b
32. Function's Probability Distribution with Gaussian Distribution
33. Gaussian Process Regression
34. Gaussian Process, from “C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006”
35. Bayesian Optimization
36. Acquisition function
37. Why Bayesian Optimization works
38. Bayesian reasoners
39. Intelligent user interfaces regression
40. Slot Machine
41. Multi Armed Bandit
42. MAB – Regret
43. A/B Testing
44. Greedy Algorithm
45. Greedy Algorithm (Search Maximum)
46. Greedy Algorithm (Search Tree)
47. epsilon Greedy (epsilon = exploration)
48. Softmax
49. Softmax
50. UCB
51. argmax
52. UCB
53. UCB1
54. Log graph
55. UCB1
56. Indicator function
57. Thompson sampling: Probability Matching, Bayesian Bandit
58. Thompson sampling
59. Thompson sampling (from SlideShare “Slice Technologies”)
60. Thompson sampling
61. Thompson sampling (area = 1)
62. Thompson sampling
63. Thompson sampling
64. Thompson sampling
65. Thompson sampling: 19 / (19 + 9) = 19 / 28 = 0.679; 59 / (59 + 39) = 59 / 98 = 0.602
66. Thompson sampling
67. Thompson sampling
68. Thompson sampling
69. Thompson sampling
70. Thompson sampling: Algorithm for Bernoulli bandits
71. Thompson sampling: Algorithm for general stochastic bandits
72. Thompson sampling
73. Thompson sampling
74. Thompson sampling
75. Thompson sampling
76. Thompson sampling
77. Thompson sampling
78. Thompson sampling
79. Thompson sampling
80. Thompson sampling
81. Thompson sampling
82. Thompson sampling
83. Thompson sampling
84. Thompson sampling
85. Thompson sampling
86. Thompson sampling
87. Thompson sampling
88. Thompson sampling
89. Thompson sampling
90. Multi-play Thompson Sampling (from MS Research)
91. Multi-play Thompson sampling: Multi-play Thompson Sampling (MP-TS), Improved Multi-play Thompson Sampling (IMP-TS)
92. Thank you! Intelligent City Ltd. ((주)지능도시), Shindong KANG (강신동), www.idosi.com, ceo@idosi.com
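
Slide 70 names a Thompson sampling algorithm for Bernoulli bandits, but the transcript does not preserve the slide body. The sketch below is one common Beta-Bernoulli formulation, given here as an illustration and not as the workshop's own code. Its assumptions: a uniform Beta(1, 1) prior on each arm, simulated arms whose success probabilities are passed in as `true_probs`, and the hypothetical names `thompson_sampling_bernoulli` and `beta_mean`. The ratios on slide 65 have the form a / (a + b), which is the mean of a Beta(a, b) distribution; reading them that way is also an assumption.

```python
import random


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling against simulated arms.

    true_probs: assumed success probability of each arm (used only to
                simulate rewards; the algorithm never reads it directly).
    Returns per-arm (successes, failures) counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Sample a plausible success rate for each arm from its posterior,
        # Beta(1 + successes, 1 + failures), i.e. a Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate a Bernoulli reward and update that arm's counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


if __name__ == "__main__":
    s, f = thompson_sampling_bernoulli([0.2, 0.5, 0.8], n_rounds=1000)
    for i, (si, fi) in enumerate(zip(s, f)):
        print(f"arm {i}: pulls={si + fi}, posterior mean={beta_mean(1 + si, 1 + fi):.3f}")
```

Because each arm is chosen with the probability that its sampled rate is the largest, arms with uncertain posteriors still get explored early on, while pulls concentrate on the best arm as evidence accumulates.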