Model-based Reinforcement Learning
with Neural Networks
on Hierarchical Dynamic System
Akihiko Yamaguchi and Christopher G. Atkeson
Robotics Institute, Carnegie Mellon University http://akihikoy.net/
http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
http://www.nescafe.com/upload/golden_roast_f_711.png
My pizza demonstration https://youtu.be/Wgj32blPGiE
https://youtu.be/GjwfbOur3CQ
Pouring: Manipulation of a Deformable Object
Planning actions
Planning parameters of actions
= Dynamic Programming (Opt ctrl, MPC, …)
Dynamics are partially unknown
→ Reinforcement Learning Problem
RL in pouring
Adaptation: not very hard
Generalization: hard
Are deep NNs useful for this problem? (How can they be used in an RL framework?)
Remarks on Reinforcement Learning
Useful to compare model-free RL vs. model-based RL
Successful RL in robot learning is model-free
(direct policy search) [cf. Kober et al. 2013]
Good at fine-tuning; lower computation cost (at execution)
Robust to POMDPs
Model-based weakness: simulation biases
Model-based strengths:
1. Generalization ability
2. Sharable / reusable
3. Can handle reward changes
2 and 3: thanks to symbolic (hierarchical) representation
[Figure: FK ANN with input, hidden, and output layers and an update step (Magtanong et al. 2012)]
How to deal with simulation biases?
Do not learn dx/dt = F(x,u) (dt: small, e.g. xx ms)
Learn (sub)task-level dynamics
Parameters → F_grasp → Grasp result
Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models
Gaussian → F → Gaussian
Stochastic Neural Networks [Yamaguchi, Atkeson, ICRA 2016]
Use stochastic dynamic programming
Stochastic Differential Dynamic Programming
[Yamaguchi, Atkeson, Humanoids 2015]
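The "Gaussian → F → Gaussian" idea can be illustrated with a minimal sketch. It assumes a scalar task-level model and a first-order (linearized) propagation, which is a standard approximation and not necessarily the one used in the cited papers. The function `toy_flow` and its constants are hypothetical stand-ins for a learned model such as F_flow_ctrl.

```python
import math

def propagate_gaussian(f, mu, var, model_var=0.0, eps=1e-5):
    """Propagate N(mu, var) through a scalar function f by linearizing f at mu.

    model_var is an assumed additive variance for the model's own
    prediction error. Returns (output mean, output variance, slope at mu).
    """
    # Central finite difference for df/dx at the input mean.
    slope = (f(mu + eps) - f(mu - eps)) / (2.0 * eps)
    out_mu = f(mu)
    out_var = slope * slope * var + model_var
    return out_mu, out_var, slope

def toy_flow(theta):
    """Hypothetical task-level model: pouring angle -> flow amount."""
    return 0.3 * math.tanh(2.0 * theta)

if __name__ == "__main__":
    mean, variance, slope = propagate_gaussian(toy_flow, 0.4, 0.01, model_var=0.001)
    print(mean, variance, slope)
```

Because each stage returns a Gaussian, the output of one (sub)task model can be fed directly into the next one.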
Model-based RL with Neural Networks for Hierarchical Dynamic System
Stochastic Neural Networks
Propagation of probability distribution from input to output
Gradients of output expectation w.r.t. an input
Difficulty: Nonlinear activation functions
ReLU (f(x)=max(0,x))
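To make the ReLU difficulty concrete, here is a small sketch for a single unit with a scalar Gaussian input. It uses the textbook closed-form moments of a rectified Gaussian; this is an assumption for illustration, and the cited ICRA 2016 paper may use a different approximation. The function names are hypothetical.

```python
import math

def relu_gaussian_moments(mu, sigma):
    """Mean and variance of max(0, x) for x ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # Deterministic input: ReLU applies directly, no variance.
        return max(0.0, mu), 0.0
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    # Clamp tiny negative values caused by floating-point cancellation.
    var = max(0.0, second_moment - mean * mean)
    return mean, var

def relu_mean_grad(mu, sigma):
    """Gradient of E[max(0, x)] w.r.t. the input mean mu (the Gaussian CDF)."""
    if sigma <= 0.0:
        return 1.0 if mu > 0.0 else 0.0
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

if __name__ == "__main__":
    print(relu_gaussian_moments(0.5, 1.0))
    print(relu_mean_grad(0.5, 1.0))
```

Note that the gradient of the output expectation is smooth in the input mean even though ReLU itself has a kink at zero.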
[Figure: mean model and error model networks sharing the same input]
Use Case
Independent neural networks for each (sub)dynamical system
Stochastic Differential Dynamic Programming
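As a rough illustration of planning through chained stochastic models, the sketch below optimizes a single action parameter by plain gradient ascent on the expected reward. This is a deliberately simplified stand-in, not stochastic DDP itself: the two linear stage models, their noise variances, the quadratic reward, and all function names are hypothetical.

```python
def propagate_stage(f, mu, var, noise_var, eps=1e-5):
    """Linearized Gaussian propagation through one (sub)task model f."""
    slope = (f(mu + eps) - f(mu - eps)) / (2.0 * eps)
    return f(mu), slope * slope * var + noise_var

def expected_reward(action, stages, target):
    """E[-(y - target)^2] for the Gaussian output y of the chained stages."""
    mu, var = action, 0.0  # the action itself is known exactly
    for f, noise_var in stages:
        mu, var = propagate_stage(f, mu, var, noise_var)
    # For Gaussian y: E[(y - t)^2] = (mean - t)^2 + variance.
    return -((mu - target) ** 2 + var)

def optimize_action(stages, target, action=0.0, step=0.1, iters=200, eps=1e-4):
    """Gradient ascent on the expected reward using finite differences."""
    for _ in range(iters):
        grad = (expected_reward(action + eps, stages, target)
                - expected_reward(action - eps, stages, target)) / (2.0 * eps)
        action += step * grad
    return action

# Hypothetical stages: (model, model-error variance) for grasp and flow control.
TOY_STAGES = [
    (lambda a: 0.8 * a, 0.01),
    (lambda g: 2.0 * g + 0.1, 0.02),
]

if __name__ == "__main__":
    best = optimize_action(TOY_STAGES, target=0.9)
    print(best, expected_reward(best, TOY_STAGES, 0.9))
```

The variance term in the expected reward is what penalizes actions that lead through uncertain parts of the models; in this linear toy case it happens to be constant.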
Results of Experiments
DNN+DDP performed better than LWR+DDP
Using redundant features did not affect the learning performance
Worked for pouring with a PR2 robot
Video: https://youtu.be/aM3hE1J5W98
More Information
http://akihikoy.net/
https://www.youtube.com/AkihikoYamaguchi
Akihiko Yamaguchi and Christopher G. Atkeson:
Neural Networks and Differential Dynamic Programming for Reinforcement
Learning Problems, in Proceedings of the 2016 IEEE International Conference on
Robotics and Automation (ICRA2016), Stockholm, Sweden, May, 2016.
https://www.researchgate.net/publication/294729454
Akihiko Yamaguchi and Christopher G. Atkeson:
Differential Dynamic Programming with Temporally Decomposed Dynamics, in
Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots
(Humanoids2015), pp. 696-703, Seoul, 2015.
https://www.researchgate.net/publication/282157952
Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara:
Pouring Skills with Planning and Learning Modeled from Human Demonstrations,
International Journal of Humanoid Robotics, Vol.12, No.3, pp.1550030, July, 2015.
https://www.researchgate.net/publication/280733055