Learning To Run
Deep Learning Course
Emanuele Ghelfi Leonardo Arcari Emiliano Gagliardi
https://github.com/MultiBeerBandits/learning-to-run
March 31, 2019
Politecnico di Milano
Our Goal
The goal of this project is to replicate the results of the Reason8 team
in the NIPS 2017 Learning To Run competition [1].
• Given a human musculoskeletal model and a physics-based
simulation environment
• Develop a controller that runs as fast as possible
[1] https://www.crowdai.org/challenges/nips-2017-learning-to-run
Background
Reinforcement Learning
Reinforcement Learning (RL) deals with sequential decision-making
problems. At each timestep, the agent observes the world state,
selects an action, and receives a reward.
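[Figure: agent-environment loop. The agent, following policy π, observes state s and selects action a; the environment returns reward r and the next state s′ ∼ P(⋅ ∣ s, a).]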
Goal: maximize the expected discounted sum of rewards:
$$J_\pi = \mathbb{E}\left[\sum_{t=0}^{H} \gamma^{t}\, r(s_t, a_t)\right].$$
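As a minimal sketch of the quantity inside the expectation, the discounted return of a single finite episode can be computed from its reward sequence. The function name `discounted_return` is an illustrative assumption, not part of the project code.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma^t * rewards[t] for one finite episode."""
    total = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps of reward 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # → 1.75
```

Averaging this value over many episodes sampled with the same policy gives a Monte Carlo estimate of the objective.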
Deep Reinforcement Learning
The policy πθ is encoded in a neural network with weights θ.
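[Figure: agent-environment loop with a neural-network policy. The agent selects action a according to π_θ(a ∣ s); the environment returns reward r and the next state s′ ∼ P(⋅ ∣ s, a).]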
How? Gradient ascent over the policy parameters: $\theta' = \theta + \eta \nabla_\theta J_\pi$
(policy gradient theorem).
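The update rule can be illustrated on a toy problem where the gradient is available in closed form. The sketch below assumes a one-step problem (a bandit) with a softmax policy over three actions, which is not the setting of the slides; for that case the policy gradient reduces to dJ/dθ_a = π_a (r_a − J). All names are illustrative.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - np.max(theta))  # shift for numerical stability
    return z / z.sum()


def expected_reward(theta, rewards):
    """J(theta) = sum_a pi_theta(a) * r(a) for a one-step problem."""
    return float(softmax(theta) @ rewards)


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy: pi_a * (r_a - J)."""
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)


def gradient_ascent(theta, rewards, eta=0.5, steps=200):
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(steps):
        theta = theta + eta * policy_gradient(theta, rewards)  # theta' = theta + eta * grad J
    return theta


rewards = np.array([0.0, 1.0, 0.2])
theta = gradient_ascent(np.zeros(3), rewards)
print(softmax(theta))  # probability mass concentrates on the best action
```

In the full sequential setting the gradient is not available exactly and has to be estimated from sampled trajectories.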
Learning To Run
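[Figure: control loop for Learning To Run. The policy π_θ(s) maps a state s ∈ ℝ^34 to an action a ∈ [0, 1]^18; the simulator returns the next state s′ ∼ P(⋅ ∣ s, a).]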
• The state space represents kinematic quantities of joints and links.
• Actions represent muscle activations.
• The reward is proportional to the speed of the body. A penalty is applied
when the pelvis height falls below a threshold, and the episode restarts.
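The reward rule in the last bullet could be sketched as follows. The slides give no numbers, so the threshold, the penalty, and the timestep below are placeholder assumptions, and `step_reward` is a hypothetical helper rather than the simulator's API.

```python
def step_reward(forward_velocity, pelvis_height,
                dt=0.01,                 # assumed simulation timestep (s)
                min_pelvis_height=0.65,  # assumed fall threshold (m)
                fall_penalty=1.0):       # assumed penalty value
    """Return (reward, done) for one simulation step."""
    # Reward proportional to the speed of the body
    reward = forward_velocity * dt
    # A fall ends the episode and is penalized
    done = pelvis_height < min_pelvis_height
    if done:
        reward -= fall_penalty
    return reward, done
```

A training loop would reset the environment whenever `done` is true.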
Deep Deterministic Policy Gradient - DDPG
• State-of-the-art algorithm in Deep Reinforcement Learning.
• Off-policy.
• Actor-critic method.
• Effectively combines Deterministic Policy Gradient
(DPG) and Deep Q-Network (DQN).
Main characteristics of DDPG:
• Deterministic actor π(s) : S → A.
• Replay buffer to break the correlation between consecutive samples
during training.
• Separate target networks with soft updates to improve
convergence stability.
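The two mechanisms listed above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the class and function names, the default `tau`, and the use of plain NumPy arrays as network parameters are all assumptions.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.storage = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive transitions
        batch = self.rng.sample(list(self.storage), batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau=0.001):
    """Move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * p
            for t, p in zip(target_params, online_params)]
```

With a small `tau`, the target networks change slowly, so the targets used in the critic loss stay nearly fixed between updates.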
DDPG Improvements
We implemented several improvements over vanilla DDPG:
• Parameter noise (with layer normalization) and action noise to
improve exploration.
• State and action flip (data augmentation).
• Relative Positions (feature engineering).
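The state and action flip can be sketched as a mirroring of left and right body sides, which yields a second valid transition for every sampled one. The index layout below is an assumption for illustration only: the action is assumed to hold one leg's muscles in its first half and the other leg's in its second half, and the left/right state indices are passed in by the caller because the slides do not specify them.

```python
import numpy as np


def flip_action(action):
    """Swap the two halves of the action vector (assumed: one leg per half)."""
    a = np.asarray(action)
    half = a.shape[-1] // 2
    return np.concatenate([a[..., half:], a[..., :half]], axis=-1)


def flip_state(state, left_idx, right_idx):
    """Swap left-side and right-side entries of the state (indices are hypothetical)."""
    state = np.asarray(state)
    s = state.copy()
    s[..., left_idx] = state[..., right_idx]
    s[..., right_idx] = state[..., left_idx]
    return s


def mirror_transition(s, a, r, s_next, left_idx, right_idx):
    """Return the mirrored transition; the reward is unchanged by the flip."""
    return (flip_state(s, left_idx, right_idx), flip_action(a), r,
            flip_state(s_next, left_idx, right_idx))
```

Storing both the original and the mirrored transition in the replay buffer doubles the data obtained from each simulation step.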
[Figure: flowchart of the parallel training loop. Nodes: dispatch sampling jobs; samples ready? (yes/no); store in replay buffer; train; dispatch evaluation job; evaluation ready? (yes/no); display statistics; time expired? (yes/no). Sampling workers, testing workers, and the replay buffer are each reached through a dispatch step.]
[Figure: the same flowchart with a detail of sampling worker i, which runs its own actor with policy π_θi mapping state s to action a.]
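The sampling side of the flowchart can be sketched with a thread pool: jobs are dispatched to workers, and whenever a job finishes its samples are stored, a training step runs, and a new job is dispatched. This is a simplified sketch under stated assumptions: `sample_job` returns dummy transitions instead of running the simulator, the training step is only counted, and evaluation jobs and the time limit are omitted.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def sample_job(job_id, n_steps=5):
    """Hypothetical rollout: returns n_steps dummy transitions."""
    rng = random.Random(job_id)
    return [(job_id, t, rng.random()) for t in range(n_steps)]


def run(n_workers=4, n_jobs=12):
    replay_buffer = []
    train_steps = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Dispatch the first batch of sampling jobs
        dispatched = min(n_workers, n_jobs)
        pending = {pool.submit(sample_job, i) for i in range(dispatched)}
        while pending:
            # Block until at least one job has samples ready
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                replay_buffer.extend(future.result())  # store in replay buffer
                train_steps += 1                       # placeholder for a training step
                if dispatched < n_jobs:
                    pending.add(pool.submit(sample_job, dispatched))
                    dispatched += 1
    return len(replay_buffer), train_steps


print(run())  # → (60, 12)
```

Because the simulator dominates the cost of each step, keeping several rollouts in flight lets training proceed while other workers are still sampling.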
Results
Results - Thread number impact
[Figure: distance (m) versus training step (×10^5), comparing 10 threads and 20 threads.]
Results - Ablation study
[Figure: distance (m) versus training step (×10^5), with a second axis for training time (h), for four configurations: Flip - PN, Flip - No PN, No Flip - PN, No Flip - No PN. PN stands for parameter noise.]
Thank you all!
Backup slides
Results - Full state vs Reduced State
[Figure: distance (m) versus training step (×10^5), comparing the reduced state and the full state.]
Actor-Critic networks
Actor: s ∈ ℝ^34 → 64 (ELU) → 64 (ELU) → 18 (σ) → a ∈ [0, 1]^18
Critic: s ∈ ℝ^34, a ∈ [0, 1]^18 → 64 (Tanh) → 32 (Tanh) → 1 (Linear)
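A forward pass through networks with these layer sizes can be sketched in NumPy. The weights are random placeholders, and feeding the critic the concatenation of state and action at its first layer is an assumption, since the diagram does not show where the action enters.

```python
import numpy as np


def elu(x):
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def identity(x):
    return x


def init_mlp(sizes, rng):
    """One (weights, bias) pair per layer, with small random weights."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x, activations):
    for (W, b), act in zip(params, activations):
        x = act(x @ W + b)
    return x


rng = np.random.default_rng(0)
actor_params = init_mlp([34, 64, 64, 18], rng)       # ELU, ELU, sigmoid
critic_params = init_mlp([34 + 18, 64, 32, 1], rng)  # Tanh, Tanh, linear


def actor(state):
    """Map a state in R^34 to muscle activations in [0, 1]^18."""
    return forward(actor_params, state, [elu, elu, sigmoid])


def critic(state, action):
    """Estimate Q(s, a) from the concatenated state and action."""
    x = np.concatenate([state, action], axis=-1)
    return forward(critic_params, x, [np.tanh, np.tanh, identity])
```

The sigmoid output layer keeps every activation inside [0, 1], so no clipping is needed before the action is passed to the simulator.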
