An Introduction to Deep Reinforcement Learning
Vishal A. Bhalla
Technical University of Munich (TUM), Germany
Talk @ Big Data & Data Science Meetup | Bogotá, Colombia, 4th Sep ‘17.
1
About Me
● Masters Student in Informatics (CS) at Technical University of Munich (TUM)
○ Major focus in Artificial Intelligence (AI) & Natural Language Understanding (NLU)
○ Applied wide range of Machine Learning (ML) algorithms in Automotive, Robotics,
Medical Imaging & Security domains
● Interested in exploring Deep Reinforcement Learning (RL) methods for NLU & Dialogue
Systems
● Happy to connect for collaborations on novel and challenging projects
An Introduction to Deep Reinforcement Learning “Big Data & Data Science Meetup” 4th
Sep 2017 @ Bogotá, Colombia Vishal Bhalla, Student M Sc. Informatics @ TUM
2
Agenda
● Introduction
● Theory & Concepts
● Approaches
● Key Players & Toolkits
● Research considerations
● Envoi
3
Introduction
4
Motivation
5
● Goes beyond input-output pattern recognition
● Synergy of
Deep Neural Networks + Reinforcement Learning
● ‘Mapping’ sensors to actions
● Build new applications
Image courtesy: OpenAI Blog on Evolution Strategies
Major breakthrough!
● AlphaGo defeating the Go World Champion
6
Image courtesy: The Guardian Image courtesy: Twitter - Deep Mind AI
Applications
● Learning to play
Atari games
from raw pixels
7
Video courtesy: YouTube @DeepMind - DQN Breakout
Applications (2)
● Games
● Robotics
● Energy Conservation
● Healthcare
● Dialogue Systems
● Marketing
8
Video courtesy: Bipedal Walker - Evolution Strategy Variant + OpenAI Gym
Applications (3)
● Producing flexible behaviours in simulated environments
9
GIF courtesy: Deep Mind Blog
Applications (4)
● AI research in the real-time strategy game StarCraft II & DOTA 2
10
Image courtesy: (L) SC2LE - an RL environment based on StarCraft II from DeepMind & Blizzard and (R) A bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules
RL Theory &
Concepts
11
Reinforcement Learning (RL)
● Inspired by research into animal learning
● Correct input/label pairs are never presented
● Focus is on on-line performance
● Used in environments where,
○ No analytic solution is available
○ Only a simulation model is given
○ Information can be collected only by interacting with the environment
● E.g.: Making robots learn how to walk
○ Reward: Head position
12
Typical RL scenario
13
[Diagram: the Agent sends an Action to the Environment; the Environment returns a State and a Reward to the Agent.]
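The interaction loop in this scenario can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the `CoinFlipEnv` class, its `reset`/`step` methods and the random policy are hypothetical names chosen for the example.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip, 10 steps per episode."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # a single dummy state

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= 10
        return 0, reward, done  # next state, reward, episode finished?


def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)                  # the agent acts
        state, reward, done = env.step(action)  # the environment responds
        total += reward
    return total


total_reward = run_episode(CoinFlipEnv(), policy=lambda s: random.randint(0, 1))
```

The agent only ever sees states and rewards; it never receives a "correct" action, which is the difference from supervised learning.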
Markov Decision Processes (MDPs)
14
● State transition model $p(s_{t+1} \mid s_t, a_t)$ where,
s - state & a - action
● Reward $p(r_{t+1} \mid s_t, a_t)$
○ Depends on the current state and the action performed
● Discount factor γ ∈ [0,1]
○ Controls the importance of future rewards
A simple MDP
Image courtesy: Wikipedia
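One possible way to write such an MDP down in code is a nested dictionary of transitions. The two-state "cool/hot" MDP below, its rewards and the helper `sample_step` are all made up for illustration; they are not the MDP shown on the slide.

```python
import random

# Hypothetical two-state MDP: transitions[state][action] is a list of
# (probability, next_state, reward) triples.
transitions = {
    "cool": {"slow": [(1.0, "cool", 1.0)],
             "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)]},
    "hot":  {"slow": [(1.0, "cool", 0.0)],
             "fast": [(1.0, "hot", -1.0)]},
}
gamma = 0.9  # discount factor


def sample_step(state, action, rng):
    """Sample (next_state, reward) given the current state and action."""
    outcomes = transitions[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


rng = random.Random(0)
next_state, reward = sample_step("cool", "fast", rng)
```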
Policy
● Agent - Chooses which action to perform
● Policy - Function of the current environment state
● Action - The policy returns the best one
● Deterministic vs Stochastic environment
15
Rewards
● Agent’s goal: Pick best policy that maximises total reward
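● Naive approach - Sum up rewards at each time step
$R = \sum_{t=0}^{T-1} r_{t+1}$
where T is the horizon (episode length), which can be infinite
● Discount factor importance
○ With 0 ≤ γ < 1 and bounded rewards, the discounted return $\sum_{t=0}^{T-1} \gamma^{t} r_{t+1}$ stays finite even for an infinite horizon
○ Preference for immediate rewards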
16
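A small sketch of how the discount factor changes the total reward. The function name `discounted_return` and the reward list are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k], accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0, 1.0]
undiscounted = discounted_return(rewards, gamma=1.0)  # plain sum of rewards
discounted = discounted_return(rewards, gamma=0.5)    # later rewards count less
```

With gamma = 0.5 the four rewards contribute 1, 0.5, 0.25 and 0.125, so rewards further in the future matter less.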
Brute force
● 2 main steps
○ Sample returns after following each policy
○ Choose the one with the largest expected return
● Issues
○ Large or infinite number of policies
○ Large no. of samples required to handle variance of returns
● Solutions
○ Give some structure
○ Allow samples of one policy to influence estimates of other
17
Types
18
● Model based
1. Agent knows the MDP model
2. Agent uses it to (offline) plan
actions before any interactions
with environment
3. Eg: Value-iteration &
policy-iteration
● Model Free
1. Initial knowledge about possible
state-actions but not MDP model
2. Improves (online) through
learning from the interactions
with the environment
3. Eg: Q-Learning
Value Function
● Goodness of a state
● Expected total reward from start state s
● Depends on the policy
● There exists an optimal value function with the highest value
● Optimal policy π*
19
Value Iteration
● Iteratively compute optimal state value function V(s)
● Guaranteed to converge to optimal values
20
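A minimal value-iteration sketch on a made-up two-state MDP. The table `P`, the discount factor and the stopping tolerance are assumptions for the example, not values from the talk.

```python
# Hypothetical two-state MDP: P[s][a] = list of (prob, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.5


def q_value(V, s, a):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        # Bellman optimality backup for every state
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V


V = value_iteration()
# Read the greedy policy off the converged values
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
```

For this toy MDP the values approach V(0) = 3 and V(1) = 4, and the greedy policy picks action 1 in both states.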
Policy Iteration
● Re-define the policy at each step
● Compute value function for this new policy until the policy converges
● Guaranteed to converge
21
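Policy iteration alternates evaluation and improvement. The sketch below uses a made-up two-state MDP; `P`, `evaluate` and `policy_iteration` are illustrative names, and the evaluation step is done iteratively rather than by solving a linear system.

```python
# Hypothetical two-state MDP: P[s][a] = list of (prob, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.5


def q_value(V, s, a):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: q_value(V, s, policy[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V


def policy_iteration():
    policy = {s: 0 for s in P}  # arbitrary initial policy
    while True:
        V = evaluate(policy)
        # Greedy improvement with respect to the evaluated values
        improved = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
        if improved == policy:  # the policy has converged
            return policy, V
        policy = improved


policy, V = policy_iteration()
```

On this toy problem the policy stops changing after two improvement steps, while each step contains a full evaluation loop.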
Value vs Policy Iteration
● Used for Offline planning
○ Prior knowledge about MDP
● Policy Iteration vs Value Iteration
○ Policy Iteration usually takes fewer iterations to converge
○ However, each of its iterations is computationally more expensive, since it includes a full policy evaluation
22
Q Learning
● Model free
● Quality of certain action in given state
● $Q(s_t, a_t) = \max_{\pi} R_{t+1}$ such that $\pi(s) = \arg\max_{a} Q(s, a)$
● Bellman equation
○ $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$
● Iterative Algorithm
● In the tabular case, with enough exploration and a suitable learning rate, the Q-function converges to the true Q-values
23
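A tabular Q-learning sketch that applies the Bellman update iteratively. The 4-state chain environment, the learning rate `alpha` and the purely random behaviour policy are assumptions made for this example.

```python
import random

N_STATES, GOAL = 4, 3
ACTIONS = (0, 1)  # 0 = left, 1 = right
gamma, alpha = 0.9, 0.5


def step(state, action):
    """Hypothetical deterministic chain: reward 1 on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


rng = random.Random(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s, done = 0, False
    while not done:
        a = rng.choice(ACTIONS)  # random behaviour policy (Q-learning is off-policy)
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        # Move Q(s, a) towards the Bellman target r + gamma * max_a' Q(s', a')
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy policy: pi(s) = argmax_a Q(s, a)
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

No transition model is used anywhere in the update; the agent only needs sampled tuples of state, action, reward and next state.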
Going Deep (RL)!
24
Deep Q-Learning
● Q-Learning uses tables to store data
● Combine function approximation with Neural Networks
● Eg: Deep RL for Atari Games
● $10^{67970}$ rows in our imaginary Q-table, more than the no. of atoms in the known universe!
● Other variants
○ Double DQN to correct over-estimated action values
○ Online version: Delayed Q-Learning with PAC
○ Greedy, Speedy Q-Learning, etc.
25
Deep Q Network
● Only game screens (and action) as input
● Output Q-value for each possible action
● One Forward pass
● CNN - No pooling
26
[Figure. Left: naive formulation of deep Q-network (State and Action go into the Neural Network, which outputs a single Q-Value). Right: optimized architecture of deep Q-network, first used in DeepMind paper (State goes into the Neural Network, which outputs one Q-Value per action).]
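The difference between the two formulations can be sketched with a stand-in "network". The single linear layer below is a deliberate simplification (a real DQN uses a CNN over game screens), and all names and sizes are hypothetical.

```python
import random

rng = random.Random(0)
STATE_DIM, N_ACTIONS = 4, 3

# Hypothetical stand-in for a network: one linear layer, weights[action][feature].
weights = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(N_ACTIONS)]


def q_values(state):
    """Optimized formulation: one forward pass returns Q(s, a) for every action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def q_value_naive(state, action):
    """Naive formulation: (state, action) in, a single Q-value out."""
    return sum(w * x for w, x in zip(weights[action], state))


state = [0.1, -0.2, 0.3, 0.4]
qs = q_values(state)  # one pass instead of N_ACTIONS passes
greedy_action = max(range(N_ACTIONS), key=lambda a: qs[a])
```

With the optimized layout, picking the greedy action needs one forward pass instead of one pass per action.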
Policy Gradients
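● Policy π has a set of ‘n’ real valued parameters $\theta = \{\theta_1, \theta_2, \dots, \theta_n\}$
● Calculate the reward gradient $\nabla_{\theta_i} R$
∀ i: $\theta_i \leftarrow \theta_i + \alpha \nabla_{\theta_i} R$ (α is the step size)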
● Same as Supervised Learning
● Safe exploration and faster than value based methods
● Locally best parameter
● Parameterised policy & high dimensional space
● Advantage - $\sum_i A_i \log p(y_i \mid x_i)$
27
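A small policy-gradient sketch (REINFORCE with a running-average baseline) on a made-up two-armed bandit. The softmax parameterisation, the bandit's reward probabilities and the step size are assumptions for illustration only.

```python
import math
import random

rng = random.Random(0)
theta = [0.0, 0.0]        # policy parameters: one preference per action
mean_reward = [0.2, 0.8]  # hypothetical bandit: action 1 pays off more often
alpha = 0.1               # step size
baseline = 0.0


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


for t in range(1, 3001):
    probs = softmax(theta)
    action = rng.choices([0, 1], weights=probs, k=1)[0]
    reward = 1.0 if rng.random() < mean_reward[action] else 0.0
    baseline += (reward - baseline) / t  # running average of rewards
    advantage = reward - baseline
    for i in range(2):
        # d/d theta_i of log pi(action) for a softmax policy
        grad_log = (1.0 if i == action else 0.0) - probs[i]
        theta[i] += alpha * advantage * grad_log  # gradient ascent step

final_probs = softmax(theta)
```

Each update has the same shape as a supervised log-likelihood step on the sampled action, only scaled by the advantage.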
Actor-Critic Algorithms
● Agent uses the Value estimate (critic) to
update the Policy (actor)
● Value function as a baseline for policy gradients
● Utilise a learned value function.
28
Actor-Critic
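A one-step actor-critic sketch, where the critic's TD error drives the actor's update. The chain environment, the tabular actor and critic, and the step sizes are assumptions for the example; deep variants replace both tables with neural networks.

```python
import math
import random

GOAL, ACTIONS = 3, (0, 1)  # chain of states 0..3; 0 = left, 1 = right
gamma, alpha, beta = 0.9, 0.2, 0.2


def step(state, action):
    """Hypothetical deterministic chain: reward 1 on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def policy_probs(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


rng = random.Random(0)
theta = {s: [0.0, 0.0] for s in range(GOAL)}  # actor: action preferences
V = {s: 0.0 for s in range(GOAL + 1)}         # critic: state-value estimates

for episode in range(500):
    s = 0
    for _ in range(100):  # cap on episode length
        probs = policy_probs(theta[s])
        a = rng.choices(ACTIONS, weights=probs, k=1)[0]
        s2, r, done = step(s, a)
        target = r if done else r + gamma * V[s2]
        delta = target - V[s]    # TD error computed by the critic
        V[s] += beta * delta     # critic update
        for i in ACTIONS:        # actor update, weighted by the TD error
            theta[s][i] += alpha * delta * ((1.0 if i == a else 0.0) - probs[i])
        s = s2
        if done:
            break

right_prob = {s: policy_probs(theta[s])[1] for s in range(GOAL)}
```

The learned value function acts as the baseline: an action is reinforced only when it turns out better than the critic expected.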
Asynchronous Advantage Actor-Critic (A3C)
● A3C utilizes multiple Worker agents
● Speedup & Diverse Experience
● Combines benefits of value-based & policy-based methods
● Continuous & Discrete action spaces
29
Images(L-R): A3C: Training workflow of each worker agent (L) and High-level architecture (R)
Break
30
Examples
31
Dialogue Systems: Interactive RL
32
● Conversational flow.
● Concept of delayed reward fits well to Dialogue
ICLR 2017 by FAIR: Learning Through Dialogue Interactions By Asking Questions
Dialogue Systems: Deep RL
33
● Actor-Critic method
● 2 Stage training → Supervised Learning + RL
○ Supervised → Mimic human behaviour
○ RL → Handle unforeseen situations
● User simulations for training
● Infinite state space of probability distributions
● Dialogue act-slot type combinations
Image courtesy: Maluuba: Applying Deep Reinforcement Learning to Dialogue Management
Key Players &
Toolkits
34
Key Players
35
Labs & Groups
● Berkeley Artificial Intelligence Research (BAIR) Lab
○ UC Berkeley EE Department
● Univ. of Alberta, Edmonton, Canada
○ Deep Mind’s 1st international office
36
Richard Sutton, Michael Bowling and Patrick Pilarski @Univ of Alberta
Image courtesy: Deep Mind Blog
Researchers
● Prof. Pieter Abbeel, Sergey Levine & Chelsea Finn
○ BAIR, UC Berkeley EE Dept.
● Rich Sutton
○ Univ of Alberta
● David Silver, Oriol Vinyals & Vlad Mnih
○ Google DeepMind
● Ilya Sutskever, Rocky Duan & John Schulman
○ Open AI
● Jason Weston
○ Facebook AI Research
(FAIR)
37
Chelsea Finn, Sergey Levine & Pieter Abbeel from UC Berkeley.
Image courtesy: The New York Times
Tools
● High-quality implementations of reinforcement learning algorithms
○ OpenAI Baselines
○ ChainerRL
● Environments with a set of test problems to write & evaluate RL algorithms
○ OpenAI Gym
○ RLLab
38
Research Frontiers
39
Experience Replay
● Problem:
○ Approximate Q-functions using a CNN
○ Training with a non-linear function approximator is unstable and takes time to converge
● Trick:
○ Store all experiences < s, a, r, s’ > in a replay memory
○ Use random mini-batches from it
○ Avoids local minimum by breaking similarity between subsequent training samples
○ Makes it similar to Supervised Learning
40
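A minimal replay-memory sketch. The class name `ReplayMemory` and the bounded capacity are assumptions (the slide stores all experiences; practical implementations usually keep only the most recent ones).

```python
import random
from collections import deque


class ReplayMemory:
    """Stores <s, a, r, s'> tuples and returns random mini-batches."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive experiences.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=100)
for t in range(150):
    memory.push(t, t % 2, 1.0, t + 1)
batch = memory.sample(8)
```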
Exploration vs Exploitation?
● Should the agent,
○ Trust the learnt Q values for every action? Or
○ Try other actions which might give a better reward
● Q-learning algorithm incorporates a greedy exploration
● Fix: ε-greedy approach!
○ Pick a random action (explore) with probability ε, or
○ Select an action according to current Q-values with probability (1 - ε)
○ Decrease ε over time as agent becomes confident
41
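A sketch of ε-greedy action selection with a linearly decaying ε. The function names and the decay schedule (from 1.0 to 0.1 over 1000 steps) are illustrative choices, not values from the talk.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the current Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from start to end, then hold it at end."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


rng = random.Random(0)
q = [0.1, 0.5, 0.2]
greedy_choice = epsilon_greedy(q, epsilon=0.0, rng=rng)   # always exploits
random_choice = epsilon_greedy(q, epsilon=1.0, rng=rng)   # always explores
```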
Genetic Algorithm
● Evolutionary Computations family of AI
● Meta-heuristic optimization method
● Requirements
○ Represent as string of chromosomes (array of bits)
○ Fitness function to evaluate solutions
● Steps
○ Generation - Pool of candidate solutions
○ Next generation - Candidate solutions with higher fitness values
■ Selection
■ Crossover
■ Mutation
○ Iterate until a solution reaches the goal fitness value
42
Image courtesy: The Genetic Algorithm - Explained
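The steps above can be sketched on a toy problem. The "OneMax" fitness (count the 1 bits), tournament selection, single-point crossover, the mutation rate and keeping the best individual (elitism) are all choices made for this example.

```python
import random

rng = random.Random(0)
N_BITS, POP_SIZE = 20, 30


def fitness(chrom):
    return sum(chrom)  # hypothetical "OneMax" fitness: number of 1 bits


def select(pop):
    a, b = rng.sample(pop, 2)  # tournament selection of size 2
    return a if fitness(a) >= fitness(b) else b


def crossover(p1, p2):
    cut = rng.randrange(1, N_BITS)  # single-point crossover
    return p1[:cut] + p2[cut:]


def mutate(chrom, rate=0.02):
    return [1 - bit if rng.random() < rate else bit for bit in chrom]


# Generation 0: pool of random candidate solutions (arrays of bits)
pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
initial_best = max(map(fitness, pop))

for generation in range(200):
    best = max(pop, key=fitness)
    if fitness(best) == N_BITS:  # goal fitness value reached
        break
    children = [mutate(crossover(select(pop), select(pop)))
                for _ in range(POP_SIZE - 1)]
    pop = [best] + children      # keep the best individual (elitism)

best = max(pop, key=fitness)
```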
Evolution Strategies
● Black-box stochastic optimization
● Fit ‘n’ no. of parameters to a single reward function
● Tweak and guess iteratively
● Tradeoff vs RL
○ No need for backpropagation
○ Highly parallelizable
○ Higher robustness.
○ Structured exploration.
○ Credit assignment over long time scales
● https://blog.openai.com/evolution-strategies/
43
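A pure-Python sketch in the spirit of the "tweak and guess" loop: perturb the parameters with Gaussian noise, score each guess, and move towards the better ones. The quadratic reward, the `target` vector and the hyperparameters are assumptions for illustration.

```python
import random

rng = random.Random(0)
target = [0.5, 0.1, -0.3]  # hypothetical optimum, unknown to the optimiser


def reward(w):
    """Black-box reward: only its value is used, never its gradient."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


w = [0.0, 0.0, 0.0]
npop, sigma, alpha = 50, 0.1, 0.001

for iteration in range(300):
    # Guess: sample a population of noise vectors around the current w
    noise = [[rng.gauss(0, 1) for _ in w] for _ in range(npop)]
    rewards = [reward([wi + sigma * ni for wi, ni in zip(w, n)]) for n in noise]
    mean = sum(rewards) / npop
    std = (sum((r - mean) ** 2 for r in rewards) / npop) ** 0.5 or 1.0
    adv = [(r - mean) / std for r in rewards]  # standardised rewards
    # Tweak: move w along the reward-weighted average of the noise directions
    for i in range(len(w)):
        w[i] += alpha / (npop * sigma) * sum(a * n[i] for a, n in zip(adv, noise))

final_reward = reward(w)
```

Each of the `npop` evaluations is independent of the others, which is why the method parallelises well and needs no backpropagation.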
Exploration with Parameter noise
● Traditional RL uses action space noise
● Parameter space noise injects randomness
directly into the parameters of the agent
● A middle ground between
Evolution Strategies & Deep RL
44
Image courtesy: Better Exploration with Parameter Noise
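The contrast between the two kinds of noise can be sketched with a tiny linear policy. The policy, its weights and the noise scale are hypothetical; the point is only where the randomness is injected.

```python
import random

rng = random.Random(0)
weights = [0.5, -0.2]  # parameters of a hypothetical linear policy


def policy(params, state):
    return sum(p * x for p, x in zip(params, state))  # deterministic


def act_with_action_noise(state, sigma=0.1):
    """Action space noise: fresh noise is added to the action at every step."""
    return policy(weights, state) + rng.gauss(0, sigma)


def perturb(params, sigma=0.1):
    """Parameter space noise: perturb the policy's parameters themselves."""
    return [p + rng.gauss(0, sigma) for p in params]


noisy_weights = perturb(weights)  # sampled once, e.g. at the start of an episode


def act_with_parameter_noise(state):
    return policy(noisy_weights, state)


state = [1.0, 2.0]
a1, a2 = act_with_action_noise(state), act_with_action_noise(state)
b1, b2 = act_with_parameter_noise(state), act_with_parameter_noise(state)
```

With parameter noise the perturbed agent behaves consistently: the same state gives the same action for as long as the perturbed parameters are kept.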
Current Research & Other Challenges
● Model-based RL
● Inverse RL & Imitation Learning - Makes use of GANs
● Hierarchical (of policies) RL
● Multi-agent RL (MARL)
● Memory & Attention
● Transfer Learning
● Benchmarks
45
Envoi
46
Summary
● Stable and scalable RL is possible
● Deep networks represent value, policy and model
● Applications - Games, Robotics, Dialogue Systems, etc.
● Many hacks and advanced Deep RL paradigms are still required
● Observing the agent is a rewarding experience!
47
References
● Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al.
Human-level control through deep reinforcement learning. [MnihDQN16]
In Nature 518, no. 7540 (2015): 529-533.
● Mnih, Volodymyr, Adria P. Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, & Koray Kavukcuoglu.
Asynchronous methods for deep reinforcement learning. [MnihA3C16]
In International Conference on Machine Learning, pp. 1928-1937. 2016.
● Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage and Anil Anthony Bharath.
A Brief Survey of Deep Reinforcement Learning. [KaiDeepRLSurvey17]
In IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding.
● Wang, Ziyu, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas.
Sample efficient actor-critic with experience replay. [WangACExpReplay17]
In arXiv preprint arXiv:1611.01224 (2016).
48
Additional Links
● Blogs
○ Deep RL (Episode 0-2) blog series by Moustafa Alzantot
○ Demystifying Deep RL guest post by Tambet Matiisen at Intel-Nervana Systems
○ Maluuba’s blog on Deep RL for Dialogue Systems
○ Simple Reinforcement Learning with Tensorflow 8 Part Series by Arthur Juliani
○ Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
● Tutorials
○ David Silver's Deep RL video-lectures
○ Tutorial on Deep RL by Sergey Levine & Chelsea Finn at ICML 2017
○ Deep RL Bootcamp in Berkeley, California USA
49
Questions?
Image courtesy: travelblogadvice
50
Image courtesy: bethratzlaff
51
Backup Slides
52
The End
53

Contenu connexe

Tendances

Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313Slideshare
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsBill Liu
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIMikko Mäkipää
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingSeung Jae Lee
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratchJie-Han Chen
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsSeung Jae Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
 
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanMIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanPeerasak C.
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesThomas da Silva Paula
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsSangwoo Mo
 
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...SlideTeam
 

Tendances (20)

Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Policy gradient
Policy gradientPolicy gradient
Policy gradient
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic Programming
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed Bandits
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanMIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and Applications
 
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
 

Similaire à An introduction to deep reinforcement learning

Reinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular MethodsReinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular MethodsSeung Jae Lee
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017MLconf
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learningguruprasad110
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithmJie-Han Chen
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictionsAnton Kulesh
 
Memory-based Reinforcement Learning
Memory-based Reinforcement LearningMemory-based Reinforcement Learning
Memory-based Reinforcement LearningHung Le
 
Introduction to reinforcement learning
Introduction to reinforcement learningIntroduction to reinforcement learning
Introduction to reinforcement learningMarsan Ma
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningVan Huy
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine LearningSARCCOM
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Universitat Politècnica de Catalunya
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIAI Frontiers
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConfXavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Universitat Politècnica de Catalunya
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...GeeksLab Odessa
 
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...郁凱 黃
 

Similaire à An introduction to deep reinforcement learning (20)

Reinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular MethodsReinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular Methods
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Memory-based Reinforcement Learning
Memory-based Reinforcement LearningMemory-based Reinforcement Learning
Memory-based Reinforcement Learning
 
Introduction to reinforcement learning
Introduction to reinforcement learningIntroduction to reinforcement learning
Introduction to reinforcement learning
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep Learning
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine Learning
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AI
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
 
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Se...
 

Plus de Big Data Colombia

Machine learning applied in health
Machine learning applied in healthMachine learning applied in health
Machine learning applied in healthBig Data Colombia
 
Whose Balance Sheet is this? Neural Networks for Banks’ Pattern Recognition
Whose Balance Sheet is this? Neural Networks for Banks’ Pattern RecognitionWhose Balance Sheet is this? Neural Networks for Banks’ Pattern Recognition
Whose Balance Sheet is this? Neural Networks for Banks’ Pattern RecognitionBig Data Colombia
 
Analysis of your own Facebook friends’ data structure through graphs
Analysis of your own Facebook friends’ data structure through graphsAnalysis of your own Facebook friends’ data structure through graphs
Analysis of your own Facebook friends’ data structure through graphsBig Data Colombia
 
Lo datos cuentan su historia
Lo datos cuentan su historiaLo datos cuentan su historia
Lo datos cuentan su historiaBig Data Colombia
 
Entornos Naturalmente Inteligentes
Entornos Naturalmente InteligentesEntornos Naturalmente Inteligentes
Entornos Naturalmente InteligentesBig Data Colombia
 
Modelamiento predictivo y medicina
Modelamiento predictivo y medicinaModelamiento predictivo y medicina
Modelamiento predictivo y medicinaBig Data Colombia
 
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al MesAyudando a los Viajeros usando 500 millones de Reseñas Hoteleras al Mes
Ayudando a los Viajeros usando 500 millones de Reseñas Hoteleras al MesBig Data Colombia
 
Deep learning: el renacimiento de las redes neuronales
Deep learning: el renacimiento de las redes neuronalesDeep learning: el renacimiento de las redes neuronales
Deep learning: el renacimiento de las redes neuronalesBig Data Colombia
 
Cloud computing: Trends and Challenges
Cloud computing: Trends and ChallengesCloud computing: Trends and Challenges
Cloud computing: Trends and ChallengesBig Data Colombia
 
Kaggle: Coupon Purchase Prediction
Kaggle: Coupon Purchase PredictionKaggle: Coupon Purchase Prediction
Kaggle: Coupon Purchase PredictionBig Data Colombia
 
Introducción al Datawarehousing
Introducción al DatawarehousingIntroducción al Datawarehousing
Introducción al DatawarehousingBig Data Colombia
 
Análisis Explotatorio de Datos: Dejad que la data hable.
Análisis Explotatorio de Datos: Dejad que la data hable.Análisis Explotatorio de Datos: Dejad que la data hable.
Análisis Explotatorio de Datos: Dejad que la data hable.Big Data Colombia
 
Salud, dinero, amor y big data
Salud, dinero, amor y big dataSalud, dinero, amor y big data
Salud, dinero, amor y big dataBig Data Colombia
 
Business Analytics: ¡La culpa es del BIG data!
Business Analytics: ¡La culpa es del BIG data!Business Analytics: ¡La culpa es del BIG data!
Business Analytics: ¡La culpa es del BIG data!Big Data Colombia
 

An introduction to deep reinforcement learning

  • 1. An Introduction to Deep Reinforcement Learning. Vishal A. Bhalla, Technical University of Munich (TUM), Germany. Talk @ Big Data & Data Science Meetup | Bogotá, Colombia, 4th Sep ‘17.
  • 2. About Me ● Master's student in Informatics (CS) at Technical University of Munich (TUM) ○ Major focus in Artificial Intelligence (AI) & Natural Language Understanding (NLU) ○ Applied a wide range of Machine Learning (ML) algorithms in Automotive, Robotics, Medical Imaging & Security domains ● Interested in exploring Deep Reinforcement Learning (RL) methods for NLU & Dialogue Systems ● Happy to connect for collaborations on novel and challenging projects. Footer: An Introduction to Deep Reinforcement Learning, “Big Data & Data Science Meetup”, 4th Sep 2017 @ Bogotá, Colombia. Vishal Bhalla, Student M.Sc. Informatics @ TUM
  • 3. Agenda ● Introduction ● Theory & Concepts ● Approaches ● Key Players & Toolkits ● Research considerations ● Envoi
  • 5. Motivation ● Goes beyond input-output pattern recognition ● Synergy of Deep Neural Networks + Reinforcement Learning ● ‘Mapping’ sensors to actions ● Build new applications. Image courtesy: OpenAI Blog on Evolution Strategies
  • 6. Major breakthrough! ● AlphaGo defeating the Go World Champion. Images courtesy: The Guardian; Twitter - DeepMind AI
  • 7. Applications ● Learning to play Atari games from raw pixels. Video courtesy: YouTube @DeepMind - DQN Breakout
  • 8. Applications (2) ● Games ● Robotics ● Energy Conservation ● Healthcare ● Dialogue Systems ● Marketing. Video courtesy: Bipedal Walker - Evolution Strategy Variant + OpenAI Gym
  • 9. Applications (3) ● Producing flexible behaviours in simulated environments. GIF courtesy: DeepMind Blog
  • 10. Applications (4) ● AI research in the real-time strategy game StarCraft II & Dota 2. Image courtesy: (L) SC2LE, an RL environment based on StarCraft II from DeepMind & Blizzard, and (R) a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules
  • 12. Reinforcement Learning (RL) ● Inspired by research into animal learning ● Correct input/label pairs are never presented ● Focus is on online performance ● Used in environments where ○ there is no analytic solution ○ only a simulation model is available ○ information can be gathered only through interaction ● E.g. making a robot learn how to walk ○ Reward: head position
  • 14. Markov Decision Processes (MDPs) ● State transition model $p(s_{t+1} \mid s_t, a_t)$, where $s$ is a state and $a$ an action ● Reward $p(r_{t+1} \mid s_t, a_t)$ ○ Depends on the current state and the action performed ● Discount factor $\gamma \in [0, 1]$ ○ Controls the importance of future rewards. Figure: a simple MDP (image courtesy: Wikipedia)
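One way the three MDP ingredients could be held in code is sketched below. This is only an illustration: the two states, the two actions and all probabilities and rewards are invented here and are not the MDP from the slide's figure.

```python
import random

# Hypothetical two-state MDP; every name and number is an illustrative assumption.
# P[(s, a)] maps each next state to its probability p(s' | s, a).
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
# R[(s, a)] is the reward, which depends on the current state and action only.
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}
GAMMA = 0.9  # discount factor in [0, 1]


def step(state, action, rng):
    """Sample the next state from p(s' | s, a) and return it with the reward."""
    u = rng.random()
    cumulative = 0.0
    next_state = None
    for candidate, prob in P[(state, action)].items():
        next_state = candidate
        cumulative += prob
        if u < cumulative:
            break
    # If round-off leaves u above the total, the last candidate is used.
    return next_state, R[(state, action)]


rng = random.Random(0)
print(step("s0", "go", rng))
```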
  • 15. Policy ● Agent: chooses which action to perform ● Policy: a function of the current environment state ● Action: the policy returns the best one for that state ● Deterministic vs stochastic environment
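  • 16. Rewards ● Agent’s goal: pick the best policy, the one that maximises total reward ● Naive approach: sum up the rewards at each time step, $R = \sum_{t=0}^{T-1} r_{t+1}$, where $T$ is the horizon (episode length), which can be infinite ● Importance of the discount factor $\gamma$: $R = \sum_{t=0}^{T-1} \gamma^{t} r_{t+1}$ ○ The total reward doesn’t go to infinity as $0 \le \gamma \le 1$ (strictly, $\gamma < 1$ is needed when $T$ is infinite) ○ Preference for immediate rewards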
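The effect of the discount factor can be checked with a few lines of code. The helper below is a sketch written for this transcript (the function name is not from the talk); it evaluates the discounted sum for a finite list of rewards.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for a finite episode."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from next step).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], 1.0))  # 3.0 (plain sum)
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75 (later rewards count less)
```

With a constant reward of 1 and a discount factor of 0.9, the return stays below 1 / (1 - 0.9) = 10 however long the episode runs, which is the boundedness the slide refers to.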
  • 17. Brute force ● 2 main steps ○ Sample returns after following each policy ○ Choose the one with the largest expected return ● Issues ○ The number of policies can be large or infinite ○ A large number of samples is required to handle the variance of returns ● Solutions ○ Give the problem some structure ○ Allow samples from one policy to influence the estimates for others
  • 18. Types ● Model-based 1. Agent knows the MDP model 2. Agent uses it to plan actions (offline) before any interaction with the environment 3. E.g. value iteration & policy iteration ● Model-free 1. Initial knowledge about the possible state-actions but not the MDP model 2. Improves (online) by learning from interactions with the environment 3. E.g. Q-learning
  • 19. Value Function ● Goodness of a state ● Expected total reward from start state $s$ ● Depends on the policy ● There exists an optimal value function with the highest value ● Optimal policy $\pi^*$
  • 20. Value Iteration ● Iteratively compute the optimal state value function $V(s)$ ● Guaranteed to converge to the optimal values
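A minimal value-iteration sketch follows, assuming a hypothetical two-state MDP whose states, actions, probabilities and rewards are invented for illustration. Each sweep replaces V(s) by the best one-step look-ahead value until the values stop changing.

```python
# Hypothetical two-state MDP (all numbers are illustrative assumptions).
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]
P = {  # P[(s, a)] maps next state -> probability
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}


def q_value(V, s, a, gamma):
    """One-step look-ahead: reward plus discounted expected next-state value."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in ACTIONS) for s in STATES}
        # Stop once no state value changed by more than the tolerance.
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V


print(value_iteration())
```

In this toy problem the best plan is to move to `s1` and stay there, so V(s1) approaches 2 / (1 - 0.9) = 20.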
  • 21. Policy Iteration ● Re-define the policy at each step ● Compute the value function for this new policy, and repeat until the policy converges ● Guaranteed to converge
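The two alternating steps of policy iteration can be sketched as below, again on a hypothetical two-state MDP invented for illustration. Policy evaluation is done here by repeated sweeps, which is one common choice and an assumption of this sketch rather than something the slide specifies.

```python
# Hypothetical two-state MDP (all numbers are illustrative assumptions).
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}


def q_value(V, s, a, gamma):
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def evaluate(policy, gamma, tol=1e-10):
    """Policy evaluation: compute the value function of a fixed policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v = q_value(V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(gamma=0.9):
    policy = {s: ACTIONS[0] for s in STATES}  # arbitrary starting policy
    while True:
        V = evaluate(policy, gamma)
        # Policy improvement: act greedily with respect to the current values.
        improved = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma))
                    for s in STATES}
        if improved == policy:
            return policy, V
        policy = improved


print(policy_iteration())
```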
  • 22. Value vs Policy Iteration ● Used for offline planning ○ Prior knowledge about the MDP ● Policy iteration is computationally efficient compared to value iteration ○ Takes fewer iterations to converge ○ However, each iteration is more computationally expensive
  • 23. Q-Learning ● Model-free ● Quality of a certain action in a given state ● $Q(s_t, a_t) = \max_{\pi} R_{t+1}$ such that $\pi(s) = \arg\max_{a} Q(s, a)$ ● Bellman equation ○ $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$ ● Iterative algorithm ● The Q-function will converge and represent the true Q-value
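A tabular version of the iterative algorithm might look like the sketch below. The corridor task, the learning rate `alpha` and the exploration rate `epsilon` are assumptions made for this example. The Bellman right-hand side from the slide appears as the update target.

```python
import random

N_STATES = 5        # hypothetical corridor with cells 0..4; cell 4 is terminal
ACTIONS = [-1, +1]  # move left / move right


def env_step(state, action):
    """Deterministic toy environment: reward 1 on reaching the last cell."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = env_step(s, ACTIONS[a])
            # Bellman target: r + gamma * max_a' Q(s', a'), or just r at the end.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


for row in q_learning():
    print(row)
```

After training, the "move right" value in each non-terminal cell should exceed the "move left" value, so the greedy policy walks straight to the goal.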
  • 25. Deep Q-Learning ● Q-learning uses tables to store data ● Combine function approximation with neural networks ● E.g. deep RL for Atari games ● $10^{67970}$ rows in our imaginary Q-table, more than the number of atoms in the known universe! ● Other variants ○ Double DQN to correct over-estimated action values ○ Online version: Delayed Q-Learning with PAC ○ Greedy, Speedy Q-Learning, etc.
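The size of the imaginary Q-table can be reproduced with a short calculation. The slide does not spell out the input format, so this sketch assumes the preprocessing commonly described for DQN on Atari: four stacked frames of 84 × 84 pixels with 256 grey levels each.

```python
import math

# Assumed state representation (not stated on the slide): 4 stacked frames,
# each 84 x 84 pixels, with 256 possible grey levels per pixel.
pixels = 84 * 84 * 4
# Number of distinct screens is 256 ** pixels; take log10 to get the exponent.
log10_states = pixels * math.log10(256)

print(pixels)                    # 28224
print(math.floor(log10_states))  # 67970
```

Under those assumptions the number of distinct states is about 10 to the power 67970, which is why a lookup table is replaced by a neural network.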
  • 26. Deep Q Network ● Only game screens (and action) as input ● Outputs a Q-value for each possible action ● One forward pass ● CNN with no pooling. Figure (left): naive formulation of a deep Q-network, State + Action → Neural Network → Q-value. Figure (right): optimized architecture of a deep Q-network (first used in the DeepMind paper), State → Neural Network → one Q-value per action
  • 27. Policy Gradients ● Policy $\pi$ has a set of $n$ real-valued parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ ● Calculate the reward gradient $\partial R / \partial \theta_i$ for all $i$ and update $\theta_i \leftarrow \theta_i + \alpha \, \partial R / \partial \theta_i$ (with a step size $\alpha$) ● Similar to supervised learning ● Safe exploration and faster than value-based methods ● Finds locally best parameters ● Parameterised policy & high-dimensional space ● Advantage: $\sum_i A_i \log p(y_i \mid x_i)$
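The parameter update can be illustrated on the smallest possible problem. The sketch below is a REINFORCE-style update on a hypothetical two-armed bandit with a softmax policy. The arm reward probabilities, the learning rate and the running-mean baseline used to form the advantage are all assumptions of this example, not details from the talk.

```python
import math
import random


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    arm_means = [0.2, 0.8]  # hypothetical success probability of each action
    theta = [0.0, 0.0]      # policy parameters, one per action
    baseline = 0.0          # running mean reward, used to form the advantage
    for t in range(1, steps + 1):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < arm_means[a] else 0.0
        advantage = reward - baseline
        # Gradient of log pi(a) w.r.t. theta_i for a softmax policy is
        # 1[i == a] - pi(i); scale it by the advantage and step uphill.
        for i in range(2):
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * advantage * grad_log
        baseline += (reward - baseline) / t
    return softmax(theta)


print(reinforce_bandit())
```

The returned probabilities should favour the second action, the one with the higher expected reward.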
  • 28. Actor-Critic Algorithms ● Agent uses the value estimate (critic) to update the policy (actor) ● Value function as a baseline for policy gradients ● Utilise a learned value function. Figure: Actor-Critic
  • 29. Asynchronous Advantage Actor-Critic (A3C) ● A3C utilizes multiple worker agents ● Speedup & diverse experience ● Combines benefits of value & policy iteration ● Continuous & discrete action spaces. Images: A3C training workflow of each worker agent (L) and high-level architecture (R)
  • 32. Dialogue Systems: Interactive RL ● Conversational flow ● The concept of delayed reward fits dialogue well. Source: FAIR, Learning Through Dialogue Interactions By Asking Questions (ICLR 2017)
  • 33. Dialogue Systems: Deep RL ● Actor-Critic method ● 2-stage training → Supervised Learning + RL ○ Supervised → Mimic human behaviour ○ RL → Handle unforeseen situations ● User simulations for training ● Infinite state space of probability distributions ● Dialogue act-slot type combinations. Image courtesy: Maluuba: Applying Deep Reinforcement Learning to Dialogue Management
  • 36. Labs & Groups ● Berkeley Artificial Intelligence Research (BAIR) Lab ○ UC Berkeley EE Department ● Univ. of Alberta, Edmonton, Canada ○ DeepMind’s first international office. Image: Richard Sutton, Michael Bowling and Patrick Pilarski @ Univ. of Alberta (courtesy: DeepMind Blog)
  • 37. Researchers ● Prof. Pieter Abbeel, Sergey Levine & Chelsea Finn ○ BAIR, UC Berkeley EE Dept. ● Rich Sutton ○ Univ. of Alberta ● David Silver, Oriol Vinyals & Vlad Mnih ○ Google DeepMind ● Ilya Sutskever, Rocky Duan & John Schulman ○ OpenAI ● Jason Weston ○ Facebook AI Research (FAIR). Image: Chelsea Finn, Sergey Levine & Pieter Abbeel from UC Berkeley (courtesy: The New York Times)
  • 38. Tools ● High-quality implementations of reinforcement learning algorithms ○ OpenAI Baselines ○ ChainerRL ● Environments with a set of test problems to write & evaluate RL algorithms ○ OpenAI Gym ○ RLLab
  • 40. Experience Replay ● Problem: ○ Approximate Q-functions using a CNN ○ Non-linear function approximation is not stable and takes time to converge ● Trick: ○ Store all experiences $\langle s, a, r, s' \rangle$ in a replay memory ○ Use random mini-batches from it ○ Avoids local minima by breaking the similarity between subsequent training samples ○ Makes it similar to supervised learning
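A replay memory can be as small as the sketch below. The class name and the fixed `capacity` are assumptions of this example: the slide speaks of storing all experiences, while practical implementations usually cap the buffer and discard the oldest entries.

```python
import random
from collections import deque


class ReplayMemory:
    """Stores (s, a, r, s') tuples and hands back random mini-batches."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=5, seed=0)
for t in range(10):
    memory.push(t, 0, 0.0, t + 1)
print(len(memory), memory.sample(3))
```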
  • 41. Exploration vs Exploitation? ● Should the agent ○ trust the learnt Q-values for every action? Or ○ try other actions which might give a better reward? ● The Q-learning algorithm incorporates a greedy exploration ● Fix: ε-greedy approach! ○ Pick a random action (explore) with probability ε, or ○ select an action according to the current Q-values with probability (1 - ε) ○ Decrease ε over time as the agent becomes confident
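The ε-greedy rule and a decaying ε fit in a few lines. The linear decay schedule and its default values below are assumptions chosen for illustration. The slide only says that ε should decrease over time.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


rng = random.Random(0)
for step in (0, 500, 1000, 5000):
    eps = decayed_epsilon(step)
    print(step, round(eps, 2), epsilon_greedy([0.1, 0.9, 0.3], eps, rng))
```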
  • 42. Genetic Algorithm ● Part of the evolutionary computation family of AI ● Meta-heuristic optimization method ● Requirements ○ Represent solutions as strings of chromosomes (arrays of bits) ○ Fitness function to evaluate solutions ● Steps ○ Generation: pool of candidate solutions ○ Next generation: candidate solutions with higher fitness values ■ Selection ■ Crossover ■ Mutation ○ Iterate until a solution reaches the goal fitness value. Image courtesy: The Genetic Algorithm - Explained
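The selection, crossover and mutation steps can be sketched on a toy problem. The task used here (maximise the number of 1 bits), the tournament selection, the single-point crossover and the elitism step are all choices made for this example rather than details from the talk.

```python
import random


def fitness(bits):
    """Toy fitness: the number of 1 bits (the goal is the all-ones string)."""
    return sum(bits)


def tournament(pop, rng):
    """Selection: pick two candidates at random and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_bits=20, pop_size=30, generations=200, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        if fitness(best) == n_bits:  # goal fitness reached
            break
        next_pop = [best[:]]  # elitism: carry the best candidate over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve()
print(fitness(best), best)
```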
  • 43. Evolution Strategies ● Black-box stochastic optimization ● Fit $n$ parameters to a single reward function ● Tweak and guess iteratively ● Tradeoff vs RL ○ No need for backpropagation ○ Highly parallelizable ○ Higher robustness ○ Structured exploration ○ Credit assignment over long time scales ● https://blog.openai.com/evolution-strategies/
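The "tweak and guess" loop can be sketched as follows, in the spirit of the simple example in the linked OpenAI post. The quadratic reward, the target vector and the hyperparameters are assumptions of this sketch, written in plain Python so that it runs without extra libraries.

```python
import random


def reward(w, target):
    """Black-box reward: negative squared distance to a hidden target."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def evolution_strategy(target, iterations=300, pop_size=50,
                       sigma=0.1, lr=0.001, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(target)
    for _ in range(iterations):
        # Tweak: sample Gaussian perturbations of the current parameters.
        noise = [[rng.gauss(0.0, 1.0) for _ in w] for _ in range(pop_size)]
        rewards = [reward([wi + sigma * ni for wi, ni in zip(w, eps)], target)
                   for eps in noise]
        mean = sum(rewards) / pop_size
        std = (sum((r - mean) ** 2 for r in rewards) / pop_size) ** 0.5
        std = std if std > 0.0 else 1.0
        scores = [(r - mean) / std for r in rewards]
        # Guess: move towards perturbations that scored above average.
        for j in range(len(w)):
            direction = sum(s * eps[j] for s, eps in zip(scores, noise))
            w[j] += lr / (pop_size * sigma) * direction
    return w


solution = evolution_strategy([0.5, 0.1, -0.3])
print([round(x, 3) for x in solution])
```

No gradient of the reward is ever computed, and each perturbation can be evaluated independently, which is what makes the method easy to parallelise.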
  • 44. Exploration with Parameter Noise ● Traditional RL uses action-space noise ● Parameter-space noise injects randomness directly into the parameters of the agent ● A middle ground between Evolution Strategies & Deep RL. Image courtesy: Better Exploration with Parameter Noise
  • 45. Current Research & Other Challenges ● Model-based RL ● Inverse RL & Imitation Learning (makes use of GANs) ● Hierarchical RL (hierarchy of policies) ● Multi-agent RL (MARL) ● Memory & Attention ● Transfer Learning ● Benchmarks
  • 47. Summary ● Stable and scalable RL is possible ● Deep networks represent value, policy and model ● Applications: Games, Robotics, Dialogue Systems, etc. ● A lot of hacks and advanced deep RL paradigms are still required ● Observing the agent is a rewarding experience!
  • 48. References ● [MnihDQN16] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. Human-level control through deep reinforcement learning. Nature 518, no. 7540 (2015): 529-533. ● [MnihA3C16] Mnih, Volodymyr, Adria P. Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928-1937. 2016. ● [KaiDeepRLSurvey17] Arulkumaran, Kai, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A Brief Survey of Deep Reinforcement Learning. In IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding. ● [WangACExpReplay17] Wang, Ziyu, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224 (2016).
  • 49. Additional Links ● Blogs ○ Deep RL (Episode 0-2) blog series by Moustafa Alzantot ○ Demystifying Deep RL guest post by Tambet Matiisen at Intel-Nervana Systems ○ Maluuba’s blog on Deep RL for Dialogue Systems ○ Simple Reinforcement Learning with Tensorflow 8-part series by Arthur Juliani ○ Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy ● Tutorials ○ David Silver's Deep RL video lectures ○ Tutorial on Deep RL by Sergey Levine & Chelsea Finn at ICML 2017 ○ Deep RL Bootcamp in Berkeley, California, USA