
[DL Reading Group] Why is deep reinforcement learning difficult? Why Deep RL fails? A brief survey of recent works.

Published on Jan 15, 2021

2021/01/15
Deep Learning JP:
http://deeplearning.jp/seminar-2/

  1. DEEP LEARNING JP [DL Papers] http://deeplearning.jp/ Why Deep RL fails? A brief survey of recent works. Presenter: Kei Ota (@ohtake_i).
  2. [no extractable text]
  3. Deep Reinforcement Learning that Matters
  4. Deadly Triad
  5. Deadly Triad • Two-state example with features $\phi(s_1) = 1$, $\phi(s_2) = 2$ and linear value $V(s) = w \times \phi(s)$, so $V(s_1) = w$, $V(s_2) = 2w$ • For $\gamma > 1/2$, $w$ diverges
  6. Deadly Triad
  7. Deep Reinforcement Learning and the Deadly Triad
  8. Deep Reinforcement Learning and the Deadly Triad
  9. Deep Reinforcement Learning and the Deadly Triad • $\gamma = 0.99$, so $1/(1 - \gamma) = 100$
  10. Deep Reinforcement Learning and the Deadly Triad
  11. Deep Reinforcement Learning and the Deadly Triad
  12. Deep Reinforcement Learning and the Deadly Triad
  13. Deep Reinforcement Learning and the Deadly Triad
  14. Deep Reinforcement Learning and the Deadly Triad • parameter $\in \{0, 1, 2\}$
  15. Deep Reinforcement Learning and the Deadly Triad
  16. Diagnosing Bottlenecks in Deep Q-learning Algorithms
  17. Diagnosing Bottlenecks in Deep Q-learning Algorithms
  18. Diagnosing Bottlenecks in Deep Q-learning Algorithms
  19. Diagnosing Bottlenecks in Deep Q-learning Algorithms • $n \in \{0.5, 1.0, 2.0, 4.0, 8.0\}$ • References: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney, "Revisiting Fundamentals of Experience Replay"; Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine, "Implicit under-parameterization inhibits data-efficient deep reinforcement learning"
  20. Diagnosing Bottlenecks in Deep Q-learning Algorithms
  21. Diagnosing Bottlenecks in Deep Q-learning Algorithms
  22. Revisiting Fundamentals of Experience Replay
  23. Revisiting Fundamentals of Experience Replay
  24. Implicit under-parameterization inhibits data-efficient deep reinforcement learning
  25. Implicit under-parameterization inhibits data-efficient deep reinforcement learning
  26. Implicit under-parameterization inhibits data-efficient deep reinforcement learning
  27. Implicit under-parameterization inhibits data-efficient deep reinforcement learning
  28. Implicit under-parameterization inhibits data-efficient deep reinforcement learning
  29. D2RL: Deep Dense Architectures in Reinforcement Learning
  30. [no extractable text]
  31. DEEP LEARNING JP [DL Papers] http://deeplearning.jp/ Why Deep RL fails? A brief survey of recent works. Presenter: Kei Ota.
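
The Deadly Triad slides contain a two-state example in which the two state values are w and 2w and the weight blows up once the discount factor exceeds 1/2. The Python sketch below illustrates that behaviour, assuming the slide refers to the standard "w to 2w" example: semi-gradient TD(0) applied repeatedly to the single transition s1 -> s2 with reward 0. The function name, the step size `alpha`, the initial weight `w0`, and the number of steps are illustrative assumptions and do not come from the slides.

```python
def semi_gradient_td_w2w(gamma, alpha=0.1, w0=1.0, steps=50):
    """Apply semi-gradient TD(0) repeatedly to the transition s1 -> s2.

    Assumed setup (hypothetical reconstruction of the slide's example):
    features phi(s1) = 1 and phi(s2) = 2, linear value V(s) = w * phi(s),
    reward 0 on the transition. Only s1 -> s2 is ever updated, which is
    the off-policy ingredient of the triad.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error: r + gamma * V(s2) - V(s1), with r = 0
        td_error = gamma * (2.0 * w) - w
        # Semi-gradient step: the gradient of V(s1) w.r.t. w is phi(s1) = 1
        w += alpha * td_error * 1.0
        history.append(w)
    return history


if __name__ == "__main__":
    for gamma in (0.4, 0.5, 0.9):
        final_w = semi_gradient_td_w2w(gamma)[-1]
        print(f"gamma={gamma}: w after 50 updates = {final_w:.4f}")
```

Each update multiplies w by 1 + alpha * (2 * gamma - 1), so the weight shrinks for gamma < 1/2, stays fixed at gamma = 1/2, and grows without bound for gamma > 1/2, regardless of how small a positive step size is chosen.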