
Deep IRL by C language

These slides present an Inverse Reinforcement Learning experiment on a 16-cell gridworld, implemented in the C language.

  1. Maximum Entropy Deep Inverse Reinforcement Learning. mabonki0725, January 3, 2018.
  2. IRL (Inverse Reinforcement Learning) network. Figure: IRL network image.
  3. The IRL trajectory model. A trajectory $\tau = (s_1, s_2, \cdots, s_N;\; a_1, a_2, \cdots, a_N)$, where $s_t$ is the state and $a_t$ the action at step $t$, has cost $c_\tau(\theta)$ and probability $p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\tau(\theta))$. The reward is the negative cost, $r_\tau(\theta) = -c_\tau(\theta)$ (equivalently $c_\tau(\theta) = -r_\tau(\theta)$), and the trajectory reward is the average per-step reward $r_\tau(\theta) = \frac{1}{N} \sum_{t=0}^{N} r_{s_t, a_t}(\theta)$.
  4. Substituting the reward for the cost: $p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\tau(\theta)) = \frac{1}{Z(\theta)} \exp(r_\tau(\theta))$, so higher-reward trajectories are exponentially more probable (a C sketch of this model follows the list). Figure: the curve of $\exp(-x)$, equal to 1 at $x = 0$ and tending to 0 as $x$ grows.
  5. The log-likelihood of a demonstrated trajectory is $L(\theta) = \log p(\tau(s,a) \mid c) = \log \frac{1}{Z(\theta)} \exp\Big( \frac{1}{N} \sum_{s,a}^{N} r_{s,a}(\theta) \Big) = \frac{1}{N} \sum_{s,a}^{N} r_{s,a}(\theta) - \log Z(\theta)$. Differentiating with respect to $\theta$: $\frac{\partial L(\theta)}{\partial \theta} = \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta) - \frac{\partial}{\partial \theta} \log Z(\theta)$. The partition term reduces to an expectation under the current model: $\frac{\partial}{\partial \theta} \log Z(\theta) = \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta)\, p_\theta(s,a) = E_\theta \Big[ \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta) \Big]$.
  6. With a linear reward $r_{s,a}(\theta) = \theta^T f(s,a)$, where $\theta$ is the weight vector and $f(s,a)$ the feature vector, $\frac{\partial}{\partial \theta} r_{s,a}(\theta) = f(s,a)$, and the gradient becomes $\frac{\partial L(\theta)}{\partial \theta} = \frac{1}{N} \sum_{s,a} f(s,a) - \frac{1}{N} \sum_{s,a} E_\theta f(s,a) = E_{s,a}[f(s,a)] - E_{s,a}[\hat{f}(s,a)]$, with $\hat{f}(s,a) = E_\theta f(s,a)$: the demonstration feature expectation minus the model feature expectation, which is used to update $\theta$ (see the gradient sketch after the list).
  7. Experiment: maximum entropy IRL on a 16-cell gridworld built from OpenAI's Gridword.py. Figure: 16-cell GridWorld pseudo-rewards recovered by SGD.
  8. The reward parameters are trained by stochastic gradient steps, $\text{reward} \leftarrow \text{reward} + \alpha \cdot \frac{\partial L}{\partial \theta}$ (see the SGD sketch after the list). Figure: 16-cell GridWorld pseudo-rewards by SGD.
  9. In the deep variant, the reward is produced by a deep neural net and the IRL gradient is supplied as the network's gradient, Network.grad $= \frac{\partial L}{\partial \theta}$ (see the network sketch after the list). Figure: 16-cell GridWorld pseudo-rewards by DNN.
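
The C sketches below are not from the slides or the author's code; they are minimal illustrations of each step under stated assumptions. First, the trajectory model of slides 3 and 4: the average per-step reward and the unnormalized weight exp(r_tau(theta)). N_STEPS, step_reward, and the one-weight-per-state reward are all assumed names; normalizing by Z(theta) would require summing over all trajectories and is omitted.

#include <math.h>

#define N_STEPS 8            /* trajectory length N (assumed value) */

/* per-step reward r_{s_t,a_t}(theta): a stand-in lookup that assumes
   one weight per state and ignores the action */
static double step_reward(const double *theta, int s, int a) {
    (void)a;
    return theta[s];
}

/* slide 3: r_tau(theta) = (1/N) * sum_t r_{s_t,a_t}(theta) */
double trajectory_reward(const double *theta,
                         const int *states, const int *actions) {
    double r = 0.0;
    for (int t = 0; t < N_STEPS; t++)
        r += step_reward(theta, states[t], actions[t]);
    return r / N_STEPS;
}

/* slide 4: p_theta(tau) is proportional to exp(r_tau(theta)) =
   exp(-c_tau(theta)); the 1/Z(theta) factor is omitted here */
double trajectory_weight(const double *theta,
                         const int *states, const int *actions) {
    return exp(trajectory_reward(theta, states, actions));
}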
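
Next, the gradient of slide 6 as a feature-expectation difference. N_FEATURES (one indicator feature per grid cell) is an assumption, and the model feature expectation E_theta[f(s,a)] is taken as precomputed elsewhere (for example by soft value iteration over the gridworld), which this sketch does not show.

#define N_FEATURES 16   /* one feature per cell of the 16-cell grid (assumed) */

/* slide 6: dL/dtheta = E_demo[f(s,a)] - E_theta[f(s,a)], the
   demonstration feature counts minus the feature counts expected
   under the current reward */
void irl_gradient(const double demo_feat[N_FEATURES],   /* (1/N) sum of f over demos */
                  const double model_feat[N_FEATURES],  /* E_theta[f(s,a)], precomputed */
                  double grad[N_FEATURES]) {
    for (int k = 0; k < N_FEATURES; k++)
        grad[k] = demo_feat[k] - model_feat[k];
}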
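
The slide 8 update is then a plain gradient step. Because the reward is linear, r = theta^T f, stepping theta is exactly the "reward += alpha * dL/dtheta" update on the slide; alpha is a hypothetical learning rate, and N_FEATURES is reused from the sketch above.

/* slide 8: gradient ascent on the log-likelihood,
   theta += alpha * dL/dtheta, which moves the pseudo-reward directly */
void sgd_step(double theta[N_FEATURES],
              const double grad[N_FEATURES], double alpha) {
    for (int k = 0; k < N_FEATURES; k++)
        theta[k] += alpha * grad[k];
}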
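
Finally, slide 9 swaps the linear reward for a deep neural net and pushes the same IRL gradient in as the network's output gradient (Network.grad = dL/dtheta on the slide). As a stand-in, here is a single linear layer under assumed names (Net, net_forward, net_backward); in the deep case the same output gradient would be backpropagated through every layer.

/* a single linear layer standing in for the deep reward network */
typedef struct {
    double w[N_FEATURES];     /* weights: reward = w . f(s,a) */
    double grad[N_FEATURES];  /* accumulated dL/dw */
} Net;

/* forward pass: reward for one state-action feature vector f */
double net_forward(const Net *net, const double f[N_FEATURES]) {
    double r = 0.0;
    for (int k = 0; k < N_FEATURES; k++)
        r += net->w[k] * f[k];
    return r;
}

/* backward pass: dL/dw = (dL/dr) * f, where dL_dr is the IRL
   gradient signal fed into the network (slide 9) */
void net_backward(Net *net, const double f[N_FEATURES], double dL_dr) {
    for (int k = 0; k < N_FEATURES; k++)
        net->grad[k] += dL_dr * f[k];
}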
