Model-free Continuous Control
Reinforcement Learning
初谷怜慈
2
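About me
• American football -> University of Tokyo Warriors
• 2nd-year master's student, Graduate School of Information Science and Technology, The University of Tokyo
• DeepX, senior engineer
• Research -> reinforcement learning
– especially toward real-world environments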
• twitter -> @Reiji_Hatsu
• github -> rarilurelo
3
What is reinforcement learning?
Environment Agent
4
What is reinforcement learning?
Environment Agent
state, reward
5
What is reinforcement learning?
Environment Agent
action
6
What is reinforcement learning?
Environment Agent
state, reward
action
Described as an MDP or a POMDP
7
Formulation
Agent: $A_t \sim \pi(a \mid S_t)$
Environment: $S_{t+1} \sim P(s' \mid S_t, A_t)$, $r_{t+1} = r(S_t, A_t, S_{t+1})$
8
Formulation
Model-free: model π only, and get
$\pi^* = \arg\max_\pi \mathbb{E}_\pi\left[\sum_{\tau=0}^{\infty} \gamma^\tau r_\tau\right]$
9
Formulation
Model-based: model both π and P, and get
$\pi^* = \arg\max_\pi \mathbb{E}_\pi\left[\sum_{\tau=0}^{\infty} \gamma^\tau r_\tau\right]$
10
Example
• DQN
– Learn Q function
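– ε-greedy is the policy π:
$a = \begin{cases} \arg\max_{a'} Q(s, a') & \text{if } \varepsilon < u \\ \text{a random action} & \text{if } u < \varepsilon \end{cases} \qquad (u \sim \mathrm{Uniform}(0, 1))$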
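The ε-greedy rule above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the Q-values of the current state are already available as an array; the function name `epsilon_greedy` is hypothetical and not taken from any particular DQN codebase.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Hypothetical helper: pick an action for one state from its Q-values.

    Draw u ~ Uniform(0, 1): if u < epsilon, return a uniformly random action,
    otherwise return argmax_a Q(s, a).
    """
    u = rng.uniform(0.0, 1.0)
    if u < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))  # exploit


rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])
print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # → 1 (always greedy when epsilon = 0)
```

With ε = 0.1 and three actions, the greedy action is chosen with probability 0.9 + 0.1/3, since the random branch can also pick it.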
11
What is Continuous Control?
• Atari Invaders: stick directions and buttons (discrete actions)
• Robot arm: torques (continuous actions)
12
What is Continuous Control?
Robot arm
Assume π is Gaussian (a Gaussian policy) with mean μ(s) and standard deviation σ(s).
The action is sampled from this distribution.
μ(s) and σ(s) are represented by a neural network.
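A minimal NumPy sketch of such a Gaussian policy follows. A single linear layer stands in for the neural network, and σ(s) is produced through an exponential so that it stays positive; both choices, and the class name `GaussianPolicy`, are illustrative assumptions rather than the architecture used in the talk.

```python
import numpy as np


class GaussianPolicy:
    """Hypothetical sketch: pi(a|s) = N(mu(s), diag(sigma(s)^2)).

    Linear maps stand in for the neural network (an assumption for brevity).
    """

    def __init__(self, state_dim, action_dim, rng):
        self.rng = rng
        self.W_mu = rng.normal(scale=0.1, size=(action_dim, state_dim))
        self.b_mu = np.zeros(action_dim)
        # log-std head initialized to zero, i.e. sigma(s) = 1 at the start
        self.W_log_std = np.zeros((action_dim, state_dim))
        self.b_log_std = np.zeros(action_dim)

    def mu_sigma(self, s):
        mu = self.W_mu @ s + self.b_mu
        sigma = np.exp(self.W_log_std @ s + self.b_log_std)  # exp keeps sigma > 0
        return mu, sigma

    def sample(self, s):
        """Sample an action from N(mu(s), sigma(s)^2)."""
        mu, sigma = self.mu_sigma(s)
        return mu + sigma * self.rng.standard_normal(mu.shape)

    def log_prob(self, s, a):
        """Log-density of action a, summed over independent action dimensions."""
        mu, sigma = self.mu_sigma(s)
        z = (a - mu) / sigma
        return float(np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))


rng = np.random.default_rng(0)
policy = GaussianPolicy(state_dim=3, action_dim=2, rng=rng)
state = np.array([0.5, -0.2, 1.0])
action = policy.sample(state)  # e.g. two joint torques
```

`log_prob` is the quantity whose gradient the policy gradient method later needs.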
13
Overview of reinforcement learning's complexity
[Figure: state-space complexity versus action-space complexity (discrete vs. continuous)]
14
Learning methods of continuous policy
• NAF
• Policy gradient
• Value gradient
15
Definition
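$Q^\pi(s, a) = \mathbb{E}_{P,\pi}\left[\sum_t \gamma^t r_t \,\middle|\, s, a\right] = \mathbb{E}_P\left[r + \gamma\, \mathbb{E}_\pi\left[Q^\pi(s', a')\right]\right]$ (start in s, take a, then follow π)
$V^\pi(s) = \mathbb{E}_{P,\pi}\left[\sum_t \gamma^t r_t \,\middle|\, s\right] = \mathbb{E}_{P,\pi}\left[r + \gamma V^\pi(s')\right]$ (start in s, then follow π)
$A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$ (advantage function: the true influence of a)
The right-hand equalities for $Q^\pi$ and $V^\pi$ are the Bellman equations.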
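For a small tabular MDP, the Bellman equation for V^π is a linear system, so Q^π, V^π and A^π can be computed exactly. The sketch below assumes the transition probabilities `P[s, a, s']` and the expected reward `R[s, a]` = E[r | s, a] are known arrays; that setting and the function name `evaluate_policy` are illustrative assumptions (model-free methods do not have access to P).

```python
import numpy as np


def evaluate_policy(P, R, pi, gamma):
    """Hypothetical helper: exact policy evaluation for a tabular MDP.

    P[s, a, s2]: transition probabilities, R[s, a]: expected reward,
    pi[s, a]: action probabilities. Returns (Q, V, A).
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)  # expected one-step reward under pi
    # Bellman equation V = r_pi + gamma * P_pi V, i.e. (I - gamma P_pi) V = r_pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Q(s, a) = E_P[r + gamma V(s')]
    Q = R + gamma * P @ V
    A = Q - V[:, None]  # advantage: how much better a is than following pi
    return Q, V, A


# Illustrative toy MDP: two states, action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0], [2.0, 0.0]])
pi = np.full((2, 2), 0.5)  # uniform random policy
Q, V, A = evaluate_policy(P, R, pi, gamma=0.9)
```

A useful sanity check is that the advantage averages to zero under π, since V^π(s) = E_π[Q^π(s, a)].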
16
Q-learning
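Consider the optimal policy $\pi^*$:
$Q^{\pi^*}(s, a) = \mathbb{E}_P\left[r + \gamma\, \mathbb{E}_{\pi^*}\left[Q^{\pi^*}(s', a')\right]\right]$
$\pi^*(a \mid s) = \begin{cases} 1 & (a = \arg\max_{a} Q^{\pi^*}(s, a)) \\ 0 & (\text{otherwise}) \end{cases}$
Substituting $\pi^*$ gives
$Q^{\pi^*}(s, a) = \mathbb{E}_P\left[r + \gamma \max_{a'} Q^{\pi^*}(s', a')\right]$
With function approximation, minimize (lhs − rhs)² of this equation.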
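In the tabular case, the optimality equation can be solved by applying its right-hand side repeatedly, and the squared difference between the two sides on sampled transitions is the loss that Q-learning with function approximation minimizes. The toy MDP and the helper names `q_iteration` and `td_loss` below are illustrative assumptions; as in DQN, the right-hand side is treated as a constant target.

```python
import numpy as np


def q_iteration(P, R, gamma, iters=500):
    """Hypothetical helper: repeat Q <- E_P[r + gamma * max_a' Q(s', a')]."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def td_loss(Q, s, a, r, s_next, gamma):
    """Hypothetical helper: mean of (lhs - rhs)**2 over sampled transitions."""
    target = r + gamma * Q[s_next].max(axis=1)  # rhs, treated as a constant target
    return float(np.mean((Q[s, a] - target) ** 2))


# Illustrative deterministic MDP: action 0 moves to state 0, action 1 moves to
# state 1, and taking action 1 in state 1 gives reward 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 0.0], [0.0, 1.0]])
Q_star = q_iteration(P, R, gamma=0.9)
greedy = Q_star.argmax(axis=1)  # the action with pi*(a|s) = 1
```

Here the transitions are deterministic, so the sampled loss is exactly zero at Q*; with stochastic transitions a single sample of the right-hand side is only an estimate of the expectation.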
17
NAF (Normalized Advantage Function)
In DQN, we can get max Q by comparing the finitely many discrete actions.
How can we get max Q with continuous actions?
18
NAF (Normalized Advantage Function)
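$Q(s, a) = A(s, a) + V(s)$
$A(s, a) = -\frac{1}{2}\,(a - \mu(s))^\top P(s)\,(a - \mu(s))$
P(s) is a positive definite matrix, so A(s, ·) is a concave quadratic in a whose maximum, 0, is at a = μ(s).
We can therefore get max Q as V:
$\max_a Q(s, a) = 0 + V(s)$
Minimize $\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)^2 = \left(r + \gamma V(s') - Q(s, a)\right)^2$ with respect to all parameters.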
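The NAF construction can be checked numerically. The sketch below assumes the network has already produced μ(s), V(s) and a lower-triangular matrix L(s) for one state, and builds P(s) = L(s)L(s)ᵀ, which is positive definite when L has a nonzero diagonal; this parameterization of P and the helper names are assumptions of the sketch.

```python
import numpy as np


def naf_q(a, mu, L, v):
    """Hypothetical helper: Q(s, a) = A(s, a) + V(s), A = -1/2 (a - mu)^T P (a - mu)."""
    P = L @ L.T  # assumed parameterization; positive definite for nonzero diagonal of L
    d = a - mu
    return -0.5 * d @ P @ d + v


def naf_target(r, gamma, v_next):
    """Hypothetical helper: r + gamma * max_a' Q(s', a') = r + gamma * V(s')."""
    return r + gamma * v_next


# Illustrative network outputs for a single state
mu = np.array([0.5, -1.0])
L = np.array([[1.0, 0.0], [0.3, 2.0]])
v = 3.0
q_at_mu = naf_q(mu, mu, L, v)  # equals V(s): the maximum over a
loss = (naf_target(r=1.0, gamma=0.9, v_next=2.0) - naf_q(np.zeros(2), mu, L, v)) ** 2
```

The maximization over a never has to be solved numerically: the greedy action is μ(s), and the target only needs V(s').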
19
Learning methods of continuous policy
• NAF
• Policy gradient
• Value gradient
20
Formulation
Model-free: model π only, and get
$\pi^* = \arg\max_\pi \mathbb{E}_\pi\left[\sum_{\tau=0}^{\infty} \gamma^\tau r_\tau\right]$
21
Policy gradient
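A more direct approach than Q-learning: optimize the policy itself.
$J = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$
$\pi_\theta(a \mid s) = \mathcal{N}(\mu_\theta(s), \sigma_\theta(s))$
$\nabla_\theta J$ is what we want.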
22
Policy gradient
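$$
\begin{aligned}
\nabla_\theta J &= \nabla_\theta\, \mathbb{E}_{\pi_\theta}\Big[\sum_{\tau=0}^{\infty}\gamma^\tau r_\tau\Big] \\
&= \nabla_\theta\, \mathbb{E}_{s_0\sim\rho,\, s'\sim P}\Big[\sum_{a_0, a_1, \dots}\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)\sum_{\tau=0}^{\infty}\gamma^\tau r_\tau\Big] && \text{(expectation to summation)} \\
&= \mathbb{E}_{s_0\sim\rho,\, s'\sim P}\Big[\sum_{a_0, a_1, \dots}\nabla_\theta\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)\sum_{\tau=0}^{\infty}\gamma^\tau r_\tau\Big] && \text{(differentiate w.r.t. } \theta\text{)} \\
&= \mathbb{E}_{s_0\sim\rho,\, s'\sim P}\Big[\sum_{a_0, a_1, \dots}\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)\,\frac{\nabla_\theta\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)}{\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)}\sum_{\tau=0}^{\infty}\gamma^\tau r_\tau\Big] && \text{(multiply numerator and denominator by } \textstyle\prod_t \pi_\theta(a_t \mid s_t)\text{)} \\
&= \mathbb{E}_{s_0\sim\rho,\, s'\sim P}\Big[\sum_{a_0, a_1, \dots}\prod_{t=0}^{\infty}\pi_\theta(a_t \mid s_t)\sum_{t=0}^{\infty}\nabla_\theta\log\pi_\theta(a_t \mid s_t)\sum_{\tau=0}^{\infty}\gamma^\tau r_\tau\Big] && \text{(logarithmic differentiation)} \\
&= \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty}\nabla_\theta\log\pi_\theta(a_t \mid s_t)\sum_{\tau=t}^{\infty}\gamma^\tau r_\tau\Big] && \text{(causality: } r_\tau \text{ with } \tau < t \text{ does not depend on } a_t\text{; approximated by Monte Carlo)}
\end{aligned}
$$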
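The last line of the derivation translates directly into a Monte Carlo estimator. The sketch below assumes a Gaussian policy with a state-independent mean θ = μ and fixed σ, for which ∇_μ log π(a) = (a − μ)/σ², and keeps the slide's discounting γ^τ counted from τ = 0. The toy reward r = −(a − 2)² and all function names are illustrative assumptions.

```python
import numpy as np


def discounted_reward_to_go(rewards, gamma):
    """G_t = sum_{tau=t}^{T-1} gamma**tau * r_tau for every t (discount counted from tau = 0)."""
    rewards = np.asarray(rewards, dtype=float)
    weighted = gamma ** np.arange(len(rewards)) * rewards
    return np.cumsum(weighted[::-1])[::-1]  # suffix sums via a reversed cumulative sum


def grad_log_gaussian_mu(a, mu, sigma):
    """d/d mu of log N(a; mu, sigma^2)."""
    return (a - mu) / sigma**2


def reinforce_grad_mu(actions, rewards, mu, sigma, gamma):
    """One-trajectory estimate of dJ/d mu: sum_t grad log pi(a_t) * G_t."""
    g = discounted_reward_to_go(rewards, gamma)
    return float(np.sum(grad_log_gaussian_mu(np.asarray(actions), mu, sigma) * g))


# Illustrative one-step episodes with reward -(a - 2)^2.
# Exact gradient: J(mu) = -((mu - 2)^2 + sigma^2), so dJ/dmu = 4 at mu = 0.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
estimates = []
for _ in range(20000):
    a = rng.normal(mu, sigma)
    estimates.append(reinforce_grad_mu([a], [-(a - 2.0) ** 2], mu, sigma, gamma=0.99))
grad_estimate = np.mean(estimates)
```

Individual estimates scatter widely around the true value, which is the high-variance issue discussed in the following slides.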
23
Intuition
24
Intuition
25
Property of Policy gradient
• Unbiased estimate
– stable
• On-policy and high-variance estimate
– needs a large batch size (or A3C-like asynchronous training)
– less sample-efficient
• On-policy vs. off-policy
– on-policy: the current policy can be updated only with samples from the current policy
– off-policy: the current policy can be updated with samples from any policy
26
High variance
• In policy gradient, we have to estimate $\sum_{\tau=0}^{\infty} \gamma^\tau r_\tau$
• This estimate has high variance because of
– the long time sequence
– the environment's stochastic state transitions
• There are several methods to reduce the variance
27
Actor-critic method
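The policy gradient term at step t evaluates how good the action sampled from π(a_t|s_t) was.
That depends only on the rewards at τ ≥ t (causality), whose expectation is given by Q^π:
$\nabla_\theta J \approx \mathbb{E}_{\pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^\pi(s_t, a_t)\right]$
This reduces variance, but the estimate is biased when Q^π is replaced by a learned approximation (the critic).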
28
Bias-variance
29
Bias-variance
30
Bias-variance
31
Baseline
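$\nabla_\theta J \approx \mathbb{E}_{\pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(Q^\pi(s_t, a_t) - b(s_t)\big)\right]$
Subtracting a state-dependent baseline b does not change the expectation:
$\mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)\right] = b(s_t)\,\nabla_\theta \int \pi_\theta(a \mid s_t)\, da = b(s_t)\,\nabla_\theta 1 = 0$
b = V^π is a good choice because Q^π and V^π are correlated, which gives the advantage form:
$\nabla_\theta J \approx \mathbb{E}_{\pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A^\pi(s_t, a_t)\right]$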
32
Learning methods of continuous policy
• NAF
• Policy gradient
• Value gradient
33
Value gradient
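ρ: state distribution
$J = \mathbb{E}_{\rho}\left[Q^\pi(s, a)\right]$
SVG (reparameterize $a = \mu(s) + \varepsilon\sigma(s)$, $\varepsilon \sim \mathcal{N}(0, 1)$):
$\nabla_\theta J = \mathbb{E}_{\rho, \varepsilon}\left[\nabla_a Q(s, a)\big|_{a = \mu(s) + \varepsilon\sigma(s)}\, \nabla_\theta\big(\mu(s) + \varepsilon\sigma(s)\big)\right]$
DPG (deterministic policy $a = \mu(s)$):
$\nabla_\theta J = \mathbb{E}_{\rho}\left[\nabla_\theta Q^\pi(s, a)\right] = \mathbb{E}_{\rho}\left[\nabla_a Q(s, a)\big|_{a = \mu(s)}\, \nabla_\theta \mu(s)\right]$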
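With a known, differentiable Q, the SVG and DPG gradients reduce to the chain rule. The sketch below uses a hypothetical critic Q(s, a) = −(a − 2)² (state ignored) and a scalar policy mean θ = μ, so the exact gradient of E[Q] with respect to μ is −2(μ − 2) and with respect to σ is −2σ. It also computes the likelihood-ratio (policy gradient) estimate on the same samples for comparison. All names and numbers are illustrative assumptions.

```python
import numpy as np


def dq_da(a, target=2.0):
    """Gradient of the hypothetical critic Q(s, a) = -(a - target)^2 with respect to a."""
    return -2.0 * (a - target)


def svg_grads(mu, sigma, eps):
    """Reparameterized gradients with a = mu + eps * sigma: dQ/da * da/dmu, dQ/da * da/dsigma."""
    a = mu + eps * sigma
    return dq_da(a) * 1.0, dq_da(a) * eps


def dpg_grad(mu):
    """Deterministic policy a = mu: dQ/da at a = mu, times da/dmu = 1."""
    return dq_da(mu)


def likelihood_ratio_grad(mu, sigma, eps):
    """Policy-gradient estimate of the same quantity: grad_mu log pi(a) * Q(s, a)."""
    a = mu + eps * sigma
    return (a - mu) / sigma**2 * (-((a - 2.0) ** 2))


rng = np.random.default_rng(0)
eps = rng.standard_normal(100_000)
g_mu, g_sigma = svg_grads(0.0, 1.0, eps)
g_lr = likelihood_ratio_grad(0.0, 1.0, eps)
```

The reparameterized estimate uses ∇_a Q directly, which is why it has far lower variance here; in practice Q is a learned critic, so this gradient inherits the critic's bias.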
34
Similarity to GANs
The policy (generator) is updated by the gradient of the Q function (discriminator).
35
Property of Value gradient
• Biased estimate
– depends on the function approximation of Q
– less stable
• Off-policy and low-variance estimate
– high sample efficiency
36
Recent approaches
• TRPO
• A3C
• Q-Prop
37
TRPO (Trust Region Policy Optimization)
• Problem of policy gradient (on-policy) methods
– a large step size may make the policy diverge
– once the policy becomes bad, it is updated with bad samples collected by that policy
• Choose the step size carefully
– an update should not change the policy too much
– KL constraint
38
TRPO
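$L_{\pi_{\text{old}}}(\pi) = \mathbb{E}_{\pi_{\text{old}}}\left[\frac{\pi(a \mid s)}{\pi_{\text{old}}(a \mid s)}\, A^{\pi_{\text{old}}}(s, a)\right]$ (a variant of PG)
Constraint: $\mathrm{KL}(\pi_{\text{old}} \,\|\, \pi) < C$
With a Lagrange multiplier λ: $\max_\pi\; L_{\pi_{\text{old}}}(\pi) - \lambda\, \mathrm{KL}(\pi_{\text{old}} \,\|\, \pi)$
Make a linear approximation to L and a quadratic approximation to KL around $\theta_{\text{old}}$:
$\max_\theta\; g^\top(\theta - \theta_{\text{old}}) - \frac{\lambda}{2}(\theta - \theta_{\text{old}})^\top F\,(\theta - \theta_{\text{old}}), \quad g = \nabla_\theta L\big|_{\theta_{\text{old}}}, \quad F = \frac{\partial^2}{\partial \theta^2}\mathrm{KL}$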
39
TRPO
$\theta - \theta_{\text{old}} = \frac{1}{\lambda} F^{-1} g$
Finally, the natural gradient is obtained.
$F^{-1} g$ is computed with the conjugate gradient method, followed by a line search.
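The conjugate gradient step can be sketched as follows. It solves F x = g using only products v ↦ Fv, which is what makes the method practical when F is too large to form or invert. Here `fvp` is a hypothetical callable, demonstrated with an explicit small stand-in matrix, and the line search is omitted.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g for symmetric positive definite F, given fvp(v) = F @ v."""
    x = np.zeros_like(g)
    r = g.copy()  # residual g - F x (x starts at 0)
    p = r.copy()  # search direction
    rs_old = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def natural_gradient_step(fvp, g, lam, iters=10):
    """theta - theta_old = (1 / lambda) * F^{-1} g, with F^{-1} g from conjugate gradient."""
    return conjugate_gradient(fvp, g, iters) / lam


rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
F = M @ M.T + 5.0 * np.eye(5)  # hypothetical stand-in for the Fisher matrix (SPD)
g = rng.normal(size=5)
step = natural_gradient_step(lambda v: F @ v, g, lam=2.0)
```

In exact arithmetic, conjugate gradient converges in at most n iterations for an n-dimensional problem, so a handful of iterations is typically used even for large networks, trading accuracy for cost.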
40
A3C
• Asynchronous Advantage Actor-Critic
• Advantage Actor-Critic (A2C) is a variant of policy gradient
• Asynchronous updates
– no need for large batches
– no need for experience replay
41
A3C
42
Q-Prop
• On-policy + off-policy
• Policy gradient + value gradient
→ stability and sample efficiency
43
Two main ideas
• First-order Taylor expansion
• Control variate
44
First-order Taylor expansion
45
Value gradient appears
46
Can we compute these?
47
Control variate
48
Adaptive Q-Prop
49
More details about Q-Prop
• https://www.slideshare.net/ReijiHatsugai/q-prop
