2. Value Iteration and Q-learning
• Model-free control: iteratively optimise value function and policy
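The tabular Q-learning control loop behind this slide can be sketched as follows. The two-state chain environment, its reward of 1 for reaching the goal, and all hyperparameters are illustrative assumptions, not from the slides:

```python
import random

# Illustrative tabular Q-learning on a hypothetical 2-state chain:
# action 1 moves to the goal state 1 (reward 1, episode ends),
# action 0 stays in state 0 (reward 0).
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}

def step(s, a):
    s_next = 1 if a == 1 else 0
    r = 1.0 if s_next == 1 else 0.0
    return s_next, r, s_next == 1   # (next state, reward, done)

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda x: Q[(s, x)])
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy next action
        target = r + (0.0 if done else GAMMA * max(Q[(s_next, x)] for x in range(2)))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next
```

With enough episodes, Q[(0, 1)] approaches the true value 1.0, and the greedy policy recovers the optimal action.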
3. Value Function Approximation
• “Lookup table” is not practical
• Generalize to unobserved states
• Handle large state/action spaces (including continuous states/actions)
• Transform into a supervised learning problem
• Model (hypothesis space)
• Loss/cost function
• Optimization
• i.i.d. assumption
• RL is unstable/divergent when the action-value function Q is approximated
with a nonlinear function such as a neural network
• States are correlated & the data distribution changes, combined with a complex model
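The supervised-learning framing above (model, loss, optimization) can be sketched with a linear Q-function trained by semi-gradient steps toward TD targets; the one-hot features, the toy transition, and all constants are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch: Q(s, a) approximated by a linear model over
# features phi(s, a), trained like regression on TD targets.
n_features = 4
w = np.zeros(n_features)

def phi(s, a):
    # illustrative one-hot features for 2 states x 2 actions
    v = np.zeros(n_features)
    v[s * 2 + a] = 1.0
    return v

def q(s, a):
    return w @ phi(s, a)

ALPHA, GAMMA = 0.1, 0.9

def td_step(s, a, r, s_next, done):
    """One semi-gradient step on the squared TD error."""
    global w
    target = r + (0.0 if done else GAMMA * max(q(s_next, b) for b in range(2)))
    # gradient flows only through the prediction, not the target
    w += ALPHA * (target - q(s, a)) * phi(s, a)

# repeatedly fit one terminal transition: (s=0, a=1) -> reward 1, done
for _ in range(100):
    td_step(0, 1, 1.0, 1, True)
```

Note the semi-gradient: the TD target is treated as a fixed label, which is exactly what makes the problem look supervised but also why the i.i.d. assumption is violated in practice.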
4. Deep Q-Network
• First step towards “General Artificial Intelligence”
• DQN = Q-learning + Function Approximation + Deep Network
• Stabilize training with experience replay and target network
• End-to-end RL approach, and quite flexible
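The two stabilizers named above can be sketched conceptually; the buffer capacity, sync interval, and the stand-in "weights" below are illustrative assumptions, with a real gradient step elided:

```python
import random
from collections import deque

# Conceptual sketch of DQN's two stabilizers:
# 1) a replay buffer whose uniform sampling breaks temporal correlation,
# 2) a target network that is only synchronised periodically.
class ReplayBuffer:
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):          # (s, a, r, s_next, done)
        self.buf.append(transition)

    def sample(self, batch_size):
        # uniform sampling de-correlates consecutive transitions
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

online_params = {"w": 0.0}              # stand-in for network weights
target_params = dict(online_params)     # frozen copy used for TD targets
SYNC_EVERY = 100

random.seed(0)
buffer = ReplayBuffer(capacity=10_000)
for step in range(500):
    buffer.add((step % 4, step % 2, 1.0, (step + 1) % 4, False))
    if len(buffer) >= 32:
        batch = buffer.sample(32)       # train on the de-correlated batch
        online_params["w"] += 0.01      # placeholder for a gradient step
    if step % SYNC_EVERY == 0:
        target_params = dict(online_params)  # periodic hard update
```

Between syncs the target network lags the online network, which keeps the TD targets fixed for a while and damps the instability noted on the previous slide.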
10. Prioritized Experience Replay
• Motivation: replay transitions with high information content more frequently
• Key components
• Criterion of importance: TD error
• Stochastic prioritization instead of greedy prioritization
• Importance sampling to correct the bias introduced by prioritization
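The three components above can be sketched for the proportional variant; the priority values and the exponents α and β below are illustrative assumptions:

```python
import random

# Sketch of stochastic prioritization with importance-sampling correction.
# Priorities stand in for |TD error| (plus a small epsilon in practice).
ALPHA, BETA = 0.6, 0.4
priorities = [2.0, 0.5, 0.1, 1.0]

# stochastic prioritization: P(i) proportional to p_i^alpha
# (alpha < 1 softens the greedy "always replay the largest error" rule)
scaled = [p ** ALPHA for p in priorities]
total = sum(scaled)
probs = [s / total for s in scaled]

random.seed(0)
i = random.choices(range(len(probs)), weights=probs)[0]

# importance-sampling weight corrects the bias of non-uniform sampling:
# w_i = (N * P(i))^(-beta), normalised by the maximum weight
N = len(priorities)
weights = [(N, p) and (N * p) ** -BETA for p in probs]
w_i = weights[i] / max(weights)
```

The sampled transition's TD update is then multiplied by w_i, so frequently replayed transitions contribute proportionally less per update.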
13. Dueling Architecture - Motivation
• Motivation: for many states, estimating the state value is more important
than estimating each state-action value
• Better approximates the state value, and leverages the power of the advantage function
14. Dueling Architecture - Details
• Can be plugged into existing DQN algorithms (the output of the dueling
network is still a Q function)
• Estimates the value function and the advantage function
separately, and combines them to estimate the action-value
function
• In back-propagation, the estimated value function and
advantage function are computed automatically
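The combination step can be sketched as follows; the scalar V(s) and the advantage vector below are illustrative stand-ins for the two network streams:

```python
import numpy as np

# Sketch of the dueling aggregation: the network outputs a scalar V(s)
# and a vector A(s, .), combined with a mean-advantage baseline so the
# V/A decomposition is identifiable.
def dueling_q(v, advantages):
    a = np.asarray(advantages, dtype=float)
    # Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a'))
    return v + (a - a.mean())

q = dueling_q(v=3.0, advantages=[1.0, -1.0, 0.0])
```

Subtracting the mean advantage forces the advantages to sum to zero, so the average Q-value over actions equals V(s) and gradients to both streams follow from ordinary back-propagation.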
15. Dueling Architecture - Performance
• Converge faster
• More robust (differences between Q-values for a given state are small,
so noise could make the nearly greedy policy switch abruptly)
• Achieve better performance on Atari games (the advantage grows when the
number of actions is large)
16. More variants
• Continuous action control + DQN
• NAF: continuous variant of the Q-learning algorithm
• DDPG: Deep Deterministic Policy Gradient
• Asynchronous methods + DQN
• Multiple agents in parallel + parameter server
17. Reference
• Playing Atari with Deep Reinforcement Learning
• Human-level Control through Deep Reinforcement Learning
• Deep Reinforcement Learning with Double Q-learning
• Prioritized Experience Replay
• Dueling Network Architectures for Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning
• Continuous Control with Deep Reinforcement Learning
• Continuous Deep Q-Learning with Model-based Acceleration
• Double Q-learning
• Deep Reinforcement Learning: An Overview