[DL Reading Group] Deep Reinforcement Learning that Matters

Deep Learning JP

2017/12/8 Deep Learning JP: http://deeplearning.jp/seminar-2/

DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Deep Reinforcement Learning that Matters
Reiji Hatsugai

$$a_t \sim \pi(\cdot \mid s_t)$$
$$s_{t+1} \sim P(\cdot \mid s_t, a_t)$$
$$r_{t+1} = r(s_t, a_t, s_{t+1})$$

$$\pi^{*} = \operatorname*{argmax}_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right]$$
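The quantity inside the expectation is the discounted return of one trajectory. A minimal sketch of computing it for a finite episode follows; the function name `discounted_return` and the sample rewards are illustrative assumptions, not taken from the slides.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**tau * rewards[tau] over a finite episode."""
    total = 0.0
    # Accumulate backwards: G_tau = r_tau + gamma * G_{tau+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical three-step episode.
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

The objective above is the expectation of this value over trajectories sampled by following the policy, so in practice it is estimated by averaging over many episodes.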

TRPO, DQN, DDQN, A3C, UNREAL, PCL, ACER, PPO, Q-Prop, IPG, ACKTR, DDPG, D4PG, SAC, Soft Q
Many methods have been developed since reinforcement learning became "deep".

Deep Reinforcement Learning that Matters
• ICML 2017 Reproducibility Workshop: Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
• Accepted at AAAI 2018

Deep Reinforcement Learning that Matters
• Algorithms examined:
– ACKTR (Wu et al. 2017)
– PPO (Schulman et al. 2017)
– DDPG (Lillicrap et al. 2015)
– TRPO (Schulman et al. 2015)
• ACKTR, PPO
• DDPG, TRPO baseline

Deep Reinforcement Learning that Matters
• Network Architecture
• Reward Scale
• Random Seeds and Trials
• Environments
• Codebases
• Reporting Evaluation Metrics

Network Architecture
• Hidden layer sizes compared:
– (64, 64) (rllab)
– (100, 50, 25) (Q-Prop)
– (400, 300) (DDPG)
• Activation Function
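To give a feel for how different these three hidden-layer configurations are, here is a rough sketch that counts the weights and biases of a fully connected network for each. The observation and action dimensions (17 and 6) are illustrative assumptions, and `mlp_param_count` is a hypothetical helper, not part of any of the codebases named above.

```python
def mlp_param_count(obs_dim, hidden_sizes, act_dim):
    """Number of weights plus biases in a fully connected network."""
    sizes = [obs_dim, *hidden_sizes, act_dim]
    # Each layer has n_in * n_out weights and n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Assumed dimensions for illustration only.
OBS_DIM, ACT_DIM = 17, 6

for name, hidden in [("rllab", (64, 64)),
                     ("Q-Prop", (100, 50, 25)),
                     ("DDPG", (400, 300))]:
    print(name, hidden, mlp_param_count(OBS_DIM, hidden, ACT_DIM))
```

Under these assumed dimensions the (400, 300) network has over twenty times as many parameters as the (64, 64) one, so a comparison between algorithms that silently differ in architecture is not like for like.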

Network Architecture
• PPO
• Tanh
• PPO
• “This also suggests a possible need for hyperparameter agnostic algorithms”

Reward Scale
• Q-function learning: DQN uses reward clipping
• Reward rescaling: $\hat{r} = \sigma r$
• σ = 0.1
• LeCun et al. 2012; Glorot and Bengio 2010; Vincent, de Brébisson, and Bouthillier 2015
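The two reward transformations mentioned on this slide can be sketched as follows. This is only an illustration: the function names are hypothetical, σ = 0.1 is the value shown on the slide, and the clipping range [-1, 1] is an assumption based on the usual DQN setup.

```python
def scale_reward(reward: float, sigma: float = 0.1) -> float:
    """Rescale a raw environment reward by a constant factor sigma."""
    return sigma * reward


def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clip a raw reward into [low, high] (range assumed here)."""
    return max(low, min(high, reward))


# Hypothetical raw rewards from one episode.
raw_rewards = [20.0, -3.0, 0.3]
print([scale_reward(r) for r in raw_rewards])
print([clip_reward(r) for r in raw_rewards])
```

Scaling preserves the relative size of rewards, while clipping discards it beyond the chosen range; either one changes the magnitude of the value targets the network has to fit.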

Reward Scale
• Reward Scale
• Reward Scale
• Layer norm
• Learning values across many orders of magnitude (van Hasselt et al. 2016)
– adaptive
• HumanoidStandup-v1 100
– Reward Scale

Random Seeds and Trials
• 10 runs that differ only in the random seed
• The 10 runs are split into two groups of 5 and 5

Random Seeds and Trials
• 2
• seed
• power analysis
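One way to check whether two groups of seeds really differ is a two-sample test. The sketch below computes Welch's t statistic with the standard library only; the returns are made-up numbers for illustration, and obtaining a p-value would additionally require the t distribution (for example from a statistics package).

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_err


# Hypothetical final returns of 10 runs that differ only in the seed.
final_returns = [3100.0, 2950.0, 3300.0, 1200.0, 3050.0,
                 900.0, 1500.0, 3200.0, 1100.0, 1000.0]
group_a, group_b = final_returns[:5], final_returns[5:]
print(welch_t(group_a, group_b))
```

With only 5 runs per group, such a statistic can be large purely by chance, which is why the number of seeds matters when comparing algorithms.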

Environment
• Hopper, HalfCheetah, Swimmer, Walker2D

HalfCheetah
• HalfCheetah DDPG
• Hopper DDPG
• Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
• DDPG Q
• HalfCheetah DDPG DDPG base HalfCheetah unfair

Swimmer
• TRPO
• The policy falls into a local optimum

Code base
• TRPO DDPG rllab, baseline

Code base
• dramatic impacts on performance

Reporting Evaluation Metrics

Deep Reinforcement Learning that Matters
• hyperparameter-agnostic algorithms
• “There is often no clear winner among all benchmark environments.”

• HalfCheetah Hopper DDPG stable, unstable
• task difficulty algorithm
• Simple Nearest Neighbor Policy Method for Continuous Control Tasks
– Nearest Neighbor Policy
– task difficulty task
– NN task

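• Two variants: NN-1, NN-2
• NN-1
1. Retrieve the stored trajectory whose initial state is nearest to the current initial state
2. Execute that trajectory's actions in order
• NN-2
1. Retrieve the stored state nearest to the current state
2. Execute its action for 1 step, then return to 1
• Sparse reward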
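As a rough sketch of a nearest-neighbor policy with per-step lookup (the reading of NN-2 assumed here), the function below returns the action stored with the closest state in a buffer. The buffer layout, the Euclidean distance, and the name `nearest_action` are all illustrative assumptions rather than details taken from the slides.

```python
import math


def nearest_action(state, buffer):
    """Return the action paired with the stored state closest to `state`.

    `buffer` is a list of (stored_state, stored_action) pairs; closeness is
    Euclidean distance, which is an assumption of this sketch.
    """
    _, action = min(buffer, key=lambda pair: math.dist(pair[0], state))
    return action


# Hypothetical buffer of state-action pairs from previously collected rollouts.
buffer = [((0.0, 0.0), "left"), ((1.0, 1.0), "right")]
print(nearest_action((0.9, 0.8), buffer))  # closest stored state is (1.0, 1.0)
```

A policy like this has no trained parameters, which is what makes it a useful probe of task difficulty: if plain lookup solves a benchmark, the benchmark says little about the learning algorithm.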

Simple Nearest Neighbor
• Sparse Mountain Car
• HalfCheetah
• HalfCheetah
• task difficulty
• ICLR review scores: 3, 4, 4
• NN Policy

• HalfCheetah
– sensor
• 3-layer MLP
• Towards Generalization and Simplicity in Continuous Control
– Policy parameterized with RBF features
– Natural Gradient
– Neural Net humanoid
– Authors include Todorov (MuJoCo) and Kakade (Natural Gradient)
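One common way to build an RBF-style policy input is with random Fourier features. The sketch below assumes features of the form sin(P·s / ν + φ), with a Gaussian random matrix P and uniformly random phases φ; this form, the bandwidth name ν (`bandwidth`), and the helper `make_rbf_features` are assumptions for illustration, since the slide only states that the policy is parameterized with RBF.

```python
import math
import random


def make_rbf_features(obs_dim, num_features, bandwidth, seed=0):
    """Build a random Fourier feature map: state -> list of num_features values."""
    rng = random.Random(seed)
    # Random projection matrix with standard normal entries.
    proj = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
            for _ in range(num_features)]
    # Random phase shift for each feature.
    phase = [rng.uniform(-math.pi, math.pi) for _ in range(num_features)]

    def features(state):
        return [math.sin(sum(p * s for p, s in zip(row, state)) / bandwidth + ph)
                for row, ph in zip(proj, phase)]

    return features


# Hypothetical 3-dimensional state mapped to 8 features.
featurize = make_rbf_features(obs_dim=3, num_features=8, bandwidth=1.0)
print(featurize([0.1, 0.2, 0.3]))
```

A policy that is linear in these features has far fewer design choices than a deep network, which is the sense in which such a baseline is simpler.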

Towards Generalization and Simplicity in Continuous Control

• sensor DeepLearning
– sparse reward
• IL, IRL??
– normalize

[DL Reading Group] Deep Reinforcement Learning that Matters

5. HalfCheetah [figure]
6. Hopper [figure]
19. Deep Reinforcement Learning that Matters (factor list; label: Extrinsic factors)
20. Deep Reinforcement Learning that Matters (factor list; label: Intrinsic factors)
22. Policy Architecture [figure]
23. Activation Function [figure]
26. Reward Scale [figure]
28. Deep Reinforcement Learning that Matters (factor list; label: Intrinsic factors)
30. Random Seeds and Trials [figure]
31. Random Seeds and Trials [figure]
32. Random Seeds and Trials [figure; annotation: <0.05]
35. HalfCheetah [figure]
36. Hopper [figure]
38. Swimmer [figure]
41. Code base [figure]
47. NN [figure]
