4. Deadly Triad
•
– 1G BA A8 D B 1
– g P it RW Pehg
– y R hg
– s w uM ..Ni P L s ]iw Vg
– g df c hg
– x ao gMv N [ pi P r Vg
– mkl f d ar Vg RnS g R g
2B B8K KD A BD K G GB G 8 1 D D B AA A BAB BG A 3 D D D A 0 8 D K A D I 0 A BD A , DA A . 1
5. Deadly Triad
•
– -O 2
i l
– x(s₁) = 1, x(s₂) = 2
– v̂(s) = w × x(s), so v̂(s₁) = w and v̂(s₂) = 2w
– D )("#) c
• If γ > 1/2, w diverges
•
– T l c > l c
– > l f >
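The two-state example can be simulated directly. The slide's wording is garbled, so this sketch assumes the usual textbook setup: the only update applied is the semi-gradient TD(0) update on the transition s₁ → s₂, with reward 0, and s₂ is never updated itself. The function names are hypothetical. Each update gives w ← w + α(0 + γ·2w − w) = (1 + α(2γ − 1))w, so w grows without bound exactly when γ > 1/2.

```python
def growth_factor(alpha: float, gamma: float) -> float:
    """Factor by which one TD(0) update on s1 -> s2 multiplies w."""
    return 1.0 + alpha * (2.0 * gamma - 1.0)


def td_w_to_2w(w0: float, alpha: float, gamma: float, steps: int) -> float:
    """Apply `steps` semi-gradient TD(0) updates on the transition s1 -> s2.

    Linear features x(s1) = 1 and x(s2) = 2 give v(s1) = w and v(s2) = 2w.
    Assumption: the transition has reward 0 and s2 is never updated itself.
    """
    x1, x2 = 1.0, 2.0
    reward = 0.0
    w = w0
    for _ in range(steps):
        # TD error: bootstrapped target r + gamma * v(s2) minus estimate v(s1)
        td_error = reward + gamma * (w * x2) - w * x1
        # Semi-gradient step: the gradient of v(s1) = w * x1 with respect to w is x1
        w += alpha * td_error * x1
    return w


for gamma in (0.4, 0.5, 0.9):
    print(gamma, td_w_to_2w(1.0, alpha=0.1, gamma=gamma, steps=100))
```

With α = 0.1, w shrinks toward 0 for γ = 0.4, stays at 1 for γ = 0.5, and grows rapidly for γ = 0.9. The step size only changes how fast this happens, not whether it happens.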
6. Deadly Triad
•
– t D Qs
– O N p
•
– T 22 -
• e d Qo s
– n g iB
– e d Mr MC B
•
– 2 Bl r Q
• y N giac B
•
– B QO D
8. Deep Reinforcement Learning and the Deadly Triad
•
. -1 b D 4 D 6 6 1 Q g b
, , B A3 : 23A A i
, 1 a i g
, y osrxi g e lko T Q
( . urt mnlp Q c
) Q rw a i Q g e
L- 36 2A:36a a i Oi Ofe g d
a a . .D 5 : A F: 3 : , , B A3 :
:5 a i g
9. Deep Reinforcement Learning and the Deadly Triad
– g s : d c b
– no Qi 0 e
– 0 1 - e
• 0 1 - 0 f a sm
• With γ = 0.99: 1/(1 − γ) = 100
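The quantity 1/(1 − γ) is the sum of the geometric series Σₜ γᵗ. If per-step rewards are bounded in absolute value by r_max (an assumption here; r_max = 1 corresponds to rewards clipped to [−1, 1]), no discounted return can exceed r_max/(1 − γ) in magnitude. A minimal sketch comparing the closed form with a truncated sum; the function names are hypothetical:

```python
def horizon_bound(gamma: float, r_max: float = 1.0) -> float:
    """Closed form r_max / (1 - gamma): bound on |discounted return| if |r_t| <= r_max."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return r_max / (1.0 - gamma)


def truncated_discounted_sum(gamma: float, steps: int, r_max: float = 1.0) -> float:
    """Discounted return when reward r_max is received on each of `steps` steps."""
    return sum(r_max * gamma ** t for t in range(steps))


print(horizon_bound(0.99))                     # closed form, approximately 100
print(truncated_discounted_sum(0.99, 100))     # only about 63 after 100 steps
print(truncated_discounted_sum(0.99, 10_000))  # converges to the closed form
```

The truncated sums show why 1/(1 − γ) is often read as an effective horizon: with γ = 0.99, the first 100 steps contribute only about 63% of the bound.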
12. Deep Reinforcement Learning and the Deadly Triad
– pMgbac s LyYl [Yok R W Y x
– Mijdcfeh ! t EB F H D A You
– rmMijdcfeh ! P EB F H D A You s
• ] vWn 2 E w
0 E .D F A D A E DI BE A B B , DB - D BI A 0 A H E F A A AF E B C D A C -,
18. Diagnosing Bottlenecks in Deep Q-learning Algorithms
– >R i nD la r
– >R o D u T - p
• 4 6532 f OR ,063 110 e c u
– >
• T oB - i n
• nD O c la r
19. Diagnosing Bottlenecks in Deep Q-learning Algorithms
– W cv z u ti sm [
fn ( kwyxz u pem [khl
– W n ∈ {0.5, 1.0, 2.0, 4.0, 8.0} j bg u
[ f k h A=MH OKLLE C drY M= HA AOPM AHHI= -MMKM
– W n l ad go ]d go i[ .A P 2PI=M o
A=MH OKLLE Cl u
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney. Revisiting Fundamentals of Experience Replay.
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine. Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning.
22. Revisiting Fundamentals of Experience Replay
•
– 04 1 ,1 12
– 4 :5 /: 2 K Rg fe nc a
– 04 1 01 : l pd d
• 1 A 4 1 1 : * : 4= : 2 * . * -
• 1 o RiK : 4= : 2 M - P : : 2 KC
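These slides concern experience replay. As a reference point, the basic mechanism can be sketched as a fixed-capacity FIFO buffer with uniform sampling. This is a generic minimal sketch, not the configuration studied in the cited paper; the class name, method names, and transition layout are hypothetical.

```python
import random
from collections import deque
from typing import Any, Deque, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done)
Transition = Tuple[Any, int, float, Any, bool]


class ReplayBuffer:
    """FIFO experience replay with uniform sampling (illustrative sketch)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # Fixed capacity: once full, appending evicts the oldest transition.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling with replacement over the stored transitions.
        return [self.rng.choice(self.buffer) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add((i, 0, 0.0, i + 1, False))
print(len(buf), [t[0] for t in buf.buffer])  # only the 3 most recent remain
```

The capacity determines how old the oldest stored data can be: a larger buffer keeps transitions generated by older versions of the policy, which makes the learning problem more off-policy.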
23. Revisiting Fundamentals of Experience Replay
•
– 4.no
• p t s daf cb
–
21 32 . 2 32 . m
– m l w i R
2 32 . A
3 4 .2 daf A
– R A bd r y gae bd f
233 4 .2 .1 2 3 4 24 1 21 4. -