[DL Reading Group] Why Is Deep Reinforcement Learning Hard? Why Deep RL fails? A brief survey of recent works.

Deep Learning JP

Published on Jan 15, 2021. Deep Learning JP: http://deeplearning.jp/seminar-2/

DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Why Deep RL fails? A brief survey of recent works.
Presenter: Kei Ota (@ohtake_i).

Deep Reinforcement Learning that Matters
Deep RL

Deadly Triad
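•
– Two states with features $\phi(s_1) = 1$ and $\phi(s_2) = 2$
– $V(s) = w \times \phi(s)$, so $V(s_1) = w$ and $V(s_2) = 2w$
– Updating $V(s_1)$ on the transition $s_1 \to s_2$ (reward 0): $w \leftarrow w + \alpha(\gamma \cdot 2w - w) = (1 + \alpha(2\gamma - 1))\,w$
• If $\gamma > 1/2$, $w$ diverges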
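To see the divergence numerically, here is a minimal Python sketch of the two-state example, assuming a step size $\alpha = 0.1$, an initial weight $w = 1$, and that only the transition $s_1 \to s_2$ with reward 0 is ever updated (the off-policy ingredient of the triad). Each semi-gradient TD(0) step multiplies $w$ by $1 + \alpha(2\gamma - 1)$, which exceeds 1 exactly when $\gamma > 1/2$ (for any $\alpha > 0$). The function name `td_weight_after` is ours.

```python
def td_weight_after(gamma: float, alpha: float = 0.1, w0: float = 1.0, steps: int = 100) -> float:
    """Repeat the semi-gradient TD(0) update of V(s1) on the transition s1 -> s2 (reward 0)."""
    phi_s1, phi_s2 = 1.0, 2.0  # features of the two states
    reward = 0.0
    w = w0
    for _ in range(steps):
        v_s1 = w * phi_s1                        # V(s1) = w
        v_s2 = w * phi_s2                        # V(s2) = 2w
        td_error = reward + gamma * v_s2 - v_s1  # equals (2 * gamma - 1) * w
        w += alpha * td_error * phi_s1           # gradient of V(s1) w.r.t. w is phi(s1)
    return w


for gamma in (0.4, 0.5, 0.9):
    print(f"gamma={gamma}: w after 100 updates = {td_weight_after(gamma):.4g}")
```

With these settings the weight shrinks by a factor 0.98 per update for $\gamma = 0.4$, stays at 1 for $\gamma = 0.5$, and grows by a factor 1.08 per update for $\gamma = 0.9$.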

Deep Reinforcement Learning and the Deadly Triad
• With $\gamma = 0.99$, $1/(1 - \gamma) = 100$
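The value 100 is the largest discounted return achievable when every reward satisfies $|r_t| \le 1$ (for example, with rewards clipped to $[-1, 1]$): $\sum_t \gamma^t r_t \le \sum_t \gamma^t = 1/(1 - \gamma)$. As we read van Hasselt et al., value estimates beyond such a bound are what they call soft divergence. A minimal Python sketch of the bound and the check; the helper names `max_abs_return` and `exceeds_return_bound` are hypothetical.

```python
def max_abs_return(gamma: float, r_max: float = 1.0) -> float:
    """Upper bound on |sum_t gamma^t r_t| when every |r_t| <= r_max."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return r_max / (1.0 - gamma)


def exceeds_return_bound(q_values, gamma: float, r_max: float = 1.0) -> bool:
    """True if any value estimate is larger in magnitude than any achievable return."""
    bound = max_abs_return(gamma, r_max)
    return any(abs(q) > bound for q in q_values)


# The bound is the limit of the geometric series 1 + gamma + gamma^2 + ...
gamma = 0.99
partial = sum(gamma ** t for t in range(5000))
print(max_abs_return(gamma), partial)
print(exceeds_return_bound([12.3, -87.0], gamma), exceeds_return_bound([150.0], gamma))
```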

Diagnosing Bottlenecks in Deep Q-learning Algorithms
– Swept over $\{0.5, 1.0, 2.0, 4.0, 8.0\}$
– Early stopping; Bellman error
– Early stopping (Fedus et al.; Kumar et al.)
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney. Revisiting Fundamentals of Experience Replay.
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine. Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning.

Revisiting Fundamentals of Experience Replay
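Fedus et al. analyze replay in terms of a few buffer quantities. As we understand their terminology (treat these definitions as assumptions), the replay ratio is the number of gradient updates per environment transition, and the age of the oldest policy is the number of gradient updates performed since the oldest transition in the buffer was collected. For a full FIFO buffer of capacity $C$, these satisfy age $= C \times$ replay ratio. The hypothetical helper below simulates a buffer to check that identity; it tracks only collection times, not actual transitions.

```python
from collections import deque


def oldest_policy_age(capacity: int, replay_ratio: float, env_steps: int) -> int:
    """Simulate a FIFO replay buffer and return how many gradient updates
    have happened since the oldest stored transition was collected."""
    # Each entry stores the learner's gradient-step count at collection time;
    # the transition itself (s, a, r, s') is omitted from this sketch.
    buffer = deque(maxlen=capacity)
    grad_steps = 0
    update_credit = 0.0
    for _ in range(env_steps):
        buffer.append(grad_steps)       # collect one transition with the current policy
        update_credit += replay_ratio   # replay ratio = updates per transition
        while update_credit >= 1.0:     # run the gradient updates now owed
            grad_steps += 1             # (minibatch sampling and learning omitted)
            update_credit -= 1.0
    return grad_steps - buffer[0]


# DQN-style setting: one update every 4 transitions, i.e. replay ratio 0.25.
print(oldest_policy_age(capacity=1_000, replay_ratio=0.25, env_steps=10_000))  # 250
```

Holding capacity fixed while changing the replay ratio therefore also changes how stale the oldest data is, which is why the two knobs are easy to confound.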

Implicit under-parameterization inhibits data-efficient deep reinforcement learning
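Kumar et al. quantify implicit under-parameterization as a drop in the effective rank of the learned features. As we recall their definition (an assumption worth checking against the paper), for a feature matrix $\Phi$ with singular values $\sigma_1 \ge \sigma_2 \ge \dots$, $\mathrm{srank}_\delta(\Phi) = \min\{k : \sum_{i \le k} \sigma_i / \sum_i \sigma_i \ge 1 - \delta\}$ with $\delta = 0.01$. A minimal NumPy sketch; the function name `srank` and the random test matrices are ours.

```python
import numpy as np


def srank(features: np.ndarray, delta: float = 0.01) -> int:
    """Effective rank: smallest k whose top-k singular values hold at least
    1 - delta of the total singular-value mass of the feature matrix."""
    sigma = np.linalg.svd(features, compute_uv=False)  # sorted in descending order
    total = sigma.sum()
    if total == 0.0:
        return 0
    cumulative = np.cumsum(sigma) / total
    # searchsorted returns the first index where cumulative >= 1 - delta
    return int(np.searchsorted(cumulative, 1.0 - delta)) + 1


rng = np.random.default_rng(0)
full = rng.normal(size=(256, 32))                                 # generic features
collapsed = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 32))  # rank-2 features
print(srank(full), srank(collapsed))
```

The generic random matrix should come out at or near full rank, while the rank-2 product gives at most 2. In the paper's setting, $\Phi$ would presumably be the penultimate-layer activations of the Q-network on a batch of states.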

D2RL: Deep Dense Architectures in Reinforcement Learning
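D2RL's core change is a dense connection: each hidden layer after the first receives the previous layer's output concatenated with the network input (the state for a policy, or the state-action pair for a Q-function). A minimal NumPy forward pass under that reading; the layer count, widths, ReLU activations, initialization, and the names `init_d2rl_mlp` and `d2rl_forward` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def init_d2rl_mlp(input_dim, hidden_dim, num_hidden, output_dim, seed=0):
    """Hypothetical initializer: hidden layer i > 0 takes [h_{i-1}, x] as input."""
    rng = np.random.default_rng(seed)
    layers = []
    in_dim = input_dim
    for _ in range(num_hidden):
        W = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden_dim))
        layers.append((W, np.zeros(hidden_dim)))
        in_dim = hidden_dim + input_dim  # the next layer also sees the raw input
    W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, output_dim))
    return layers, (W_out, np.zeros(output_dim))


def d2rl_forward(layers, head, x):
    """x: (batch, input_dim), e.g. a state or a concatenated state-action pair."""
    h = x
    for i, (W, b) in enumerate(layers):
        inp = h if i == 0 else np.concatenate([h, x], axis=-1)  # dense connection
        h = np.maximum(inp @ W + b, 0.0)                          # ReLU
    W_out, b_out = head
    return h @ W_out + b_out


layers, head = init_d2rl_mlp(input_dim=10, hidden_dim=256, num_hidden=4, output_dim=1)
q = d2rl_forward(layers, head, np.ones((8, 10)))
print(q.shape)  # (8, 1)
```

Compared with a plain MLP, only the input width of hidden layers 2 onward changes (hidden_dim + input_dim), so the extra parameter cost is small.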
