This document discusses model-based reinforcement learning with neural networks for hierarchical dynamic systems. It proposes stochastic neural networks to model subtask-level dynamics and handle uncertainty, and uses stochastic differential dynamic programming to cope with simulation biases from learned models. In experiments on a robot pouring task, deep neural networks with differential dynamic programming (DNN+DDP) performed better than LWR+DDP.
Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System
1. Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System
Akihiko Yamaguchi and Christopher G. Atkeson
Robotics Institute, Carnegie Mellon University http://akihikoy.net/
4. Pouring: Manipulation of a Deformable Object
Planning actions
Planning parameters of actions
= Dynamic Programming (Opt ctrl, MPC, …)
Dynamics are partially unknown
Reinforcement Learning Problem
RL in pouring
Adaptation: not very hard
Generalization: hard
Are deep NNs useful for this problem? (How can they be used in an RL framework?)
5. Remarks on Reinforcement Learning
Useful to compare model-free RL vs. model-based RL
Successful robot-learning RL is model-free (direct policy search) [cf. Kober et al. 2013]
Good at fine-tuning, lower computation cost (at execution)
Robust to POMDPs
Model-based: Simulation biases
Model-based:
1. Generalization ability
2. Sharable / Reusable
3. Can handle reward changes
2 and 3: thanks to the symbolic (hierarchical) representation
[Figure: FK ANN with input, hidden, and output layers and an update step [Magtanong et al. 2012]]
6. How to deal with simulation biases?
Do not learn dx/dt = F(x,u) (dt: small, e.g. xx ms)
Learn (sub)task-level dynamics
Parameters -> F_grasp -> Grasp result
Parameters -> F_flow_ctrl -> Flow ctrl result
Use stochastic models
Gaussian -> F -> Gaussian
Stochastic Neural Networks [Yamaguchi, Atkeson, ICRA 2016]
Use stochastic dynamic programming
Stochastic Differential Dynamic Programming
[Yamaguchi, Atkeson, Humanoids 2015]
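The ideas on this slide (subtask-level models, Gaussian in and Gaussian out, planning on expected reward) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scalar models `f_grasp` and `f_flow_ctrl` are hypothetical stand-ins for learned models, the noise variances and target are made up, the Gaussian is propagated by first-order linearization, and a plain grid search stands in for stochastic differential dynamic programming. It is not the method of the cited papers.

```python
def propagate(model, mean, var, model_var=0.0, eps=1e-5):
    """Push N(mean, var) through a scalar model by first-order linearization.

    model_var is an assumed variance of the model's own prediction error.
    """
    # Central finite difference approximates the model's slope at the mean.
    slope = (model(mean + eps) - model(mean - eps)) / (2.0 * eps)
    return model(mean), slope * slope * var + model_var


# Hypothetical stand-ins for learned subtask-level dynamics models.
def f_grasp(param):
    return 0.8 * param + 0.1


def f_flow_ctrl(grasp_result):
    return grasp_result ** 2


def expected_reward(param, target=0.5):
    """Expected value of -(outcome - target)^2 after chaining both subtasks."""
    mean, var = propagate(f_grasp, param, 0.0, model_var=0.01)
    mean, var = propagate(f_flow_ctrl, mean, var, model_var=0.02)
    # For any outcome y with this mean and variance:
    # E[-(y - target)^2] = -((mean - target)^2 + var)
    return -((mean - target) ** 2 + var)


# Grid search over the action parameter (a stand-in for stochastic DDP).
candidates = [i / 100.0 for i in range(101)]
best_param = max(candidates, key=expected_reward)
```

Because the variance enters the expected reward, the selected parameter trades off hitting the target against the uncertainty that the chained models accumulate.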
7. Stochastic Neural Networks
Propagation of probability distribution from input to output
Gradients of output expectation w.r.t. an input
Difficulty: Nonlinear activation functions
ReLU (f(x)=max(0,x))
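To make the difficulty with nonlinear activations concrete, here is a minimal Python sketch of pushing a scalar Gaussian through a single ReLU. It uses the standard closed-form moments of a rectified Gaussian and the gradient of the output mean with respect to the input mean. The function names are hypothetical, and this is only one possible way to handle ReLU; it is not necessarily the approximation used in the cited ICRA 2016 paper.

```python
import math


def relu_gaussian_moments(mean, var):
    """Mean and variance of max(0, x) for x ~ N(mean, var)."""
    if var <= 0.0:
        # Deterministic input: ReLU is applied to a single value.
        return max(0.0, mean), 0.0
    std = math.sqrt(var)
    z = mean / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf at z
    out_mean = mean * cdf + std * pdf
    second_moment = (mean * mean + var) * cdf + mean * std * pdf
    # Clamp to guard against tiny negative values from rounding.
    out_var = max(0.0, second_moment - out_mean * out_mean)
    return out_mean, out_var


def relu_mean_gradient(mean, var):
    """Derivative of E[max(0, x)] with respect to the input mean."""
    if var <= 0.0:
        return 1.0 if mean > 0.0 else 0.0
    # The derivative equals the standard normal cdf evaluated at mean / std.
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

For an input far above zero the ReLU acts as the identity, so the output mean and variance approach the input values and the gradient approaches 1.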
[Figure: a shared input feeding a mean model and an error model]
10. Results of Experiments
DNN+DDP was better than LWR+DDP
Using redundant features did not affect the learning performance
Worked in pouring with a PR2 robot
Video: https://youtu.be/aM3hE1J5W98
11. More Information
http://akihikoy.net/
https://www.youtube.com/AkihikoYamaguchi
Akihiko Yamaguchi and Christopher G. Atkeson: Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems, in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA2016), Stockholm, Sweden, May, 2016.
https://www.researchgate.net/publication/294729454
Akihiko Yamaguchi and Christopher G. Atkeson: Differential Dynamic Programming with Temporally Decomposed Dynamics, in Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids2015), pp. 696-703, Seoul, 2015.
https://www.researchgate.net/publication/282157952
Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol.12, No.3, pp.1550030, July, 2015.
https://www.researchgate.net/publication/280733055