Deep Q-Network
guodong
Value Iteration and Q-learning
• Model-free control: iteratively optimise value function and policy
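Below is a minimal tabular sketch of Q-learning used for model-free control. It assumes a toy 5-state chain environment; `chain_step`, `q_learning_step` and `epsilon_greedy` are hypothetical names and do not come from the slides. Each update moves Q(s, a) towards r + γ max_a' Q(s', a'). Acting ε-greedily on the current Q then improves the policy as the value estimates improve.

```python
import numpy as np


def q_learning_step(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update: move Q[s, a] towards the bootstrapped target."""
    # No future value is bootstrapped after a terminal transition.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon, else act greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.
    Reaching state 4 yields reward 1 and ends the episode."""
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == 4 else 0.0
    return s_next, reward, s_next == 4


rng = np.random.default_rng(0)
Q = np.zeros((5, 2))
for episode in range(200):
    s, done = 0, False
    while not done:
        a = epsilon_greedy(Q, s, epsilon=0.1, rng=rng)
        s_next, r, done = chain_step(s, a)
        Q = q_learning_step(Q, s, a, r, s_next, done, gamma=0.9)
        s = s_next

greedy_policy = np.argmax(Q[:4], axis=1)  # greedy action in each non-terminal state
```

Q-learning is off-policy, so the exploratory ε-greedy behaviour still drives Q towards the values of the greedy policy.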
Value Function Approximation
• A lookup table is not practical; an approximate value function is needed to
  • generalize to unobserved states
  • handle large state/action spaces (including continuous states/actions)
• Transform the problem into supervised learning:
  • model (hypothesis space)
  • loss/cost function
  • optimization
  • i.i.d. assumption
• RL is known to be unstable or even to diverge when the action-value function Q is approximated
  with a nonlinear function such as a neural network, because
  • consecutive states are correlated, the data distribution changes as the policy changes, and the model is complex
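The sketch below shows the supervised-learning view. It assumes a linear hypothesis space over a hand-made feature map; `features`, `q_value` and `sgd_step` are illustrative names, not from the slides. The three ingredients are:

- the model Q(s, a; w) = w·φ(s, a);
- the loss, which is the squared TD error;
- the optimizer, which is plain gradient descent.

The label y itself depends on w, and consecutive transitions are correlated. That is why this regression is not a standard i.i.d. supervised problem.

```python
import numpy as np


def features(state, action, n_actions):
    """Hypothetical feature map phi(s, a): copy the state vector into the block for `action`."""
    state = np.asarray(state, dtype=float)
    phi = np.zeros(state.size * n_actions)
    phi[action * state.size:(action + 1) * state.size] = state
    return phi


def q_value(w, state, action, n_actions):
    """Linear model (hypothesis space): Q(s, a; w) = w . phi(s, a)."""
    return float(w @ features(state, action, n_actions))


def sgd_step(w, transition, n_actions, lr=0.01, gamma=0.99):
    """One gradient step on the squared TD error, treating the target as a fixed label."""
    s, a, r, s_next, done = transition
    if done:
        y = r
    else:
        y = r + gamma * max(q_value(w, s_next, b, n_actions) for b in range(n_actions))
    phi = features(s, a, n_actions)
    td_error = y - w @ phi
    loss = 0.5 * td_error ** 2
    # d(loss)/dw = -td_error * phi (the target y is not differentiated).
    return w + lr * td_error * phi, float(loss)


w = np.zeros(4)  # 2 state features x 2 actions
transition = (np.array([1.0, 0.0]), 1, 1.0, np.array([0.0, 1.0]), True)
w, loss = sgd_step(w, transition, n_actions=2, lr=0.5)
```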
Deep Q-Network
• Often described as a first step towards "general artificial intelligence"
• DQN = Q-learning + function approximation + deep network
• Training is stabilized with experience replay and a target network
• End-to-end RL approach (learns from raw pixels and the game score), and quite flexible
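For reference, the loss minimized at iteration i in "Human-level control through deep reinforcement learning" can be written as below. The symbols are defined as follows:

- D is the replay memory, sampled uniformly.
- γ is the discount factor.
- θ⁻ are the target-network parameters. They are copied from the online parameters θ every C steps and held fixed in between.

This rendering of the notation is ours rather than a formula shown on the slides. For a terminal s', the target reduces to r.

```latex
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)^2\right]
```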
DQN Algorithm
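Below is a minimal sketch of the core DQN training loop: experience replay with uniform mini-batches, plus a periodically synced target network. It replaces the convolutional network with a linear Q-function and the Atari emulator with random placeholder transitions. `ReplayBuffer`, `dqn_targets`, `sync_every` and all sizes are illustrative choices, not taken from the slides or the papers.

```python
import random

import numpy as np


class ReplayBuffer:
    """Ring-buffer experience replay; the oldest transition is overwritten when full."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, s, a, r, s_next, done):
        transition = (s, a, r, s_next, done)
        if len(self.memory) < self.capacity:
            self.memory.append(transition)
        else:
            self.memory[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive transitions.
        batch = random.sample(self.memory, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r.astype(float), s_next, done.astype(float)

    def __len__(self):
        return len(self.memory)


def dqn_targets(q_next_target, r, done, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max_a' Q(s', a'; theta^-)."""
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)


# Hypothetical setup: a linear Q "network" Q(s, .) = s @ W on random transitions,
# used only to show where the replay buffer and the target network enter.
random.seed(0)
rng = np.random.default_rng(0)
n_features, n_actions, batch_size, sync_every, lr = 4, 2, 32, 100, 0.01
W = rng.normal(scale=0.1, size=(n_features, n_actions))  # online parameters theta
W_target = W.copy()                                       # target parameters theta^-
buffer = ReplayBuffer(capacity=10_000)

for t in range(500):
    s, s_next = rng.normal(size=n_features), rng.normal(size=n_features)
    a, r, done = int(rng.integers(n_actions)), float(rng.normal()), bool(rng.random() < 0.1)
    buffer.push(s, a, r, s_next, done)

    if len(buffer) >= batch_size:
        bs, ba, br, bs_next, bdone = buffer.sample(batch_size)
        y = dqn_targets(bs_next @ W_target, br, bdone)   # targets from the frozen network
        td_error = y - (bs @ W)[np.arange(batch_size), ba]
        for i in range(batch_size):                      # gradient of the mean squared TD error
            W[:, ba[i]] += lr * td_error[i] * bs[i] / batch_size

    if (t + 1) % sync_every == 0:
        W_target = W.copy()                              # periodic target-network update
```

Because the targets come from `W_target`, they stay fixed for `sync_every` steps instead of chasing every update of `W`.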
Practical Tips
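• Stable training: experience replay (memory size 1M) + fixed target network
• Mini-batch updates on transitions sampled from the replay memory
• Exploration vs. exploitation: ε-greedy with ε annealed from 1.0 to 0.1
• The Q-network input stacks the 4 most recent frames
• Frame skipping: the agent selects an action only every few frames and repeats it in between
• Discount factor γ = 0.99
• Use RMSProp instead of plain SGD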
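The following sketch covers three of these tips: a linear ε schedule, a 4-frame stack and frame skipping. All names are hypothetical and not taken from the slides: `linear_epsilon`, `FrameStack`, `step_with_frame_skip` and the stand-in `fake_env_step`. The 1M-step annealing horizon and the 84×84 frame size are also assumed values.

```python
from collections import deque

import numpy as np


def linear_epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Anneal epsilon linearly from `start` to `end` over `anneal_steps`, then hold it."""
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)


class FrameStack:
    """Keep the k most recent preprocessed frames; their stack is the Q-network input."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At the start of an episode, repeat the first frame k times.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.state()

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame falls out automatically
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=0)  # shape (k, height, width)


def step_with_frame_skip(env_step, action, skip=4):
    """Repeat `action` for `skip` frames, summing rewards; `env_step` is a hypothetical
    callable returning (frame, reward, done)."""
    frame, total_reward, done = None, 0.0, False
    for _ in range(skip):
        frame, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
    return frame, total_reward, done


stack = FrameStack(k=4)
state = stack.reset(np.zeros((84, 84)))
state = stack.push(np.ones((84, 84)))

# Usage with a stand-in environment (hypothetical): every frame gives reward 1.
calls = []


def fake_env_step(action):
    calls.append(action)
    return np.zeros((84, 84)), 1.0, False


frame, total_reward, done = step_with_frame_skip(fake_env_step, action=2)
```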
DQN variants
• Double DQN
• Prioritized Experience Replay
• Dueling Architecture
• Asynchronous Methods
• Continuous DQN
Double Q-learning
• Motivation: reduce overestimation by decomposing the
max operation in the target into action selection and
action evaluation
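Both targets below are written with r the reward, s' the next state and γ the discount factor. The standard Q-learning target uses the same parameters θ to select and to evaluate the next action. Double Q-learning selects with θ and evaluates with a second set θ'; the roles of the two sets can be swapped symmetrically. This is a sketch of the notation in "Deep Reinforcement Learning with Double Q-learning", not a formula taken from the slides.

```latex
\begin{aligned}
Y^{\mathrm{Q}} &= r + \gamma \, Q\big(s', \operatorname*{arg\,max}_{a} Q(s', a; \theta); \theta\big) \\
Y^{\mathrm{DoubleQ}} &= r + \gamma \, Q\big(s', \operatorname*{arg\,max}_{a} Q(s', a; \theta); \theta'\big)
\end{aligned}
```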
Double DQN
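• From Double Q-learning to DDQN: reuse DQN's target network as the second value function, so the online network selects the greedy action and the target network evaluates it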
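The sketch below contrasts the two targets on batched NumPy arrays of next-state Q-values. The function names and the numbers are made up for illustration. With the same inputs, the Double DQN target can never exceed the DQN target, which is the overestimation reduction in miniature.

```python
import numpy as np


def dqn_target(r, q_target_next, done, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    return r + gamma * (1.0 - done) * q_target_next.max(axis=1)


def double_dqn_target(r, q_online_next, q_target_next, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return r + gamma * (1.0 - done) * evaluated


# Q(s', .) for one transition from each network (made-up numbers).
r, done = np.array([0.0]), np.array([0.0])
q_online_next = np.array([[1.0, 2.0]])
q_target_next = np.array([[3.0, 0.5]])
y_dqn = dqn_target(r, q_target_next, done, gamma=1.0)                         # 3.0
y_ddqn = double_dqn_target(r, q_online_next, q_target_next, done, gamma=1.0)  # 0.5
```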
Prioritized Experience Replay
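• Motivation: replay transitions with high information content (from which the agent can learn the most) more frequently
• Key components
  • criterion of importance: magnitude of the TD error
  • stochastic prioritization instead of greedy prioritization
  • importance sampling to correct the bias introduced by non-uniform sampling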
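Below is a minimal sketch of the proportional variant described in "Prioritized Experience Replay". Priorities are p_i = |δ_i| + ε, sampling probabilities are P(i) = p_i^α / Σ_k p_k^α, and the importance-sampling weights are w_i = (N·P(i))^(-β), normalized by their maximum.

`prioritized_sample` is a hypothetical helper and uses O(N) sampling rather than a sum-tree. The values α = 0.6 and β = 0.4 are assumed defaults; the paper anneals β towards 1 during training, while it is fixed here.

```python
import numpy as np


def prioritized_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6, rng=None):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha, p_i = |delta_i| + eps.

    Returns sampled indices, normalized importance-sampling weights and the full
    distribution. O(N) per call; a sum-tree would make sampling O(log N).
    """
    rng = np.random.default_rng() if rng is None else rng
    priorities = np.abs(td_errors) + eps          # eps keeps every transition sampleable
    probs = priorities ** alpha
    probs = probs / probs.sum()
    # Stochastic prioritization: sample in proportion to priority, not greedily.
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance sampling corrects the bias of non-uniform replay.
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights = weights / weights.max()             # only ever scale updates down
    return idx, weights, probs


td_errors = np.array([0.0, 0.5, 2.0, 0.1])
idx, weights, probs = prioritized_sample(td_errors, batch_size=8, rng=np.random.default_rng(0))
```

Setting α = 0 recovers uniform replay, where every weight is 1.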
Algorithm
Performance comparison
Dueling Architecture - Motivation
• Motivation: in many states, estimating the state value matters more than estimating the value of each action
• Better approximate the state value, and leverage the advantage function A(s, a) = Q(s, a) - V(s)
Dueling Architecture - Details
• Can be adopted by existing DQN algorithms (the output of the dueling network is still a Q function)
• Estimate the value function and the advantage function separately, then combine them to estimate the action-value function
• In back-propagation, the estimates of the value function and the advantage function are learned automatically, without extra supervision
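Here is a minimal NumPy sketch of a two-stream dueling head on top of shared features. It uses the mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), one of the two aggregations discussed in "Dueling Network Architectures for Deep Reinforcement Learning". The random weights stand in for learned layers and are purely hypothetical.

Subtracting the mean advantage makes V and A identifiable. It also leaves the greedy action unchanged, since argmax Q equals argmax A.

```python
import numpy as np


def dueling_head(features, w_value, w_advantage):
    """Two-stream head on shared features (shape (batch, d)); weights are hypothetical.

    V(s) has shape (batch,), A(s, a) has shape (batch, n_actions). They are combined as
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), so V and A are identifiable.
    """
    value = features @ w_value                    # state-value stream
    advantages = features @ w_advantage           # advantage stream
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8))                # stand-in for shared conv features
w_value = rng.normal(size=8)
w_advantage = rng.normal(size=(8, 3))
q = dueling_head(features, w_value, w_advantage)  # shape (2, 3)
```

Because the output is still a Q function, the usual DQN loss applies unchanged. Gradients flow into both streams through this single combining step.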
Dueling Architecture - Performance
• Converges faster
• More robust: the differences between Q-values for a given state are often small, so noise in the estimates could make the nearly greedy policy switch abruptly
• Achieves better performance on Atari games (the gain over single-stream DQN grows when the number of actions is large)
More variants
• Continuous action control + DQN
  • NAF (normalized advantage functions): a continuous variant of the Q-learning algorithm
  • DDPG: deep deterministic policy gradient (Deep DPG)
• Asynchronous methods + DQN
  • multiple agents run in parallel + parameter server
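The sketch below shows the quadratic advantage used in "Continuous Deep Q-Learning with Model-based Acceleration". Q(x, u) = V(x) - ½ (u - μ(x))ᵀ P(x) (u - μ(x)), with P = L Lᵀ positive definite. The greedy continuous action is therefore μ(x), found analytically, and the max in the Q-learning target stays tractable.

`naf_q` is a hypothetical helper. `value`, `mu` and `l_entries` stand in for network outputs at a single state.

```python
import numpy as np


def naf_q(value, mu, l_entries, u):
    """Normalized advantage function: Q(x, u) = V(x) + A(x, u) with
    A(x, u) = -0.5 (u - mu(x))^T P(x) (u - mu(x)) and P(x) = L L^T.

    L is lower triangular with an exponentiated (positive) diagonal.
    """
    d = mu.shape[0]
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = l_entries
    L[np.diag_indices(d)] = np.exp(np.diag(L))
    P = L @ L.T                                    # positive definite
    diff = u - mu
    return value - 0.5 * diff @ P @ diff


value, mu = 1.0, np.array([0.5, -0.2])
l_entries = np.array([0.0, 0.3, 0.0])              # hypothetical output, d(d+1)/2 = 3 entries
q_at_mu = naf_q(value, mu, l_entries, mu)          # equals V(x): mu is the greedy action
```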
Reference
• Playing Atari with Deep Reinforcement Learning
• Human-Level Control through Deep Reinforcement Learning
• Deep Reinforcement Learning with Double Q-Learning
• Prioritized Experience Replay
• Dueling Network Architectures for Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning
• Continuous Control with Deep Reinforcement Learning
• Continuous Deep Q-Learning with Model-Based Acceleration
• Double Q-Learning
• Deep Reinforcement Learning: An Overview

