9. Paper 1: SimPLe
• Bibliographic information:
• Title: Model-Based Reinforcement Learning for Atari
• Authors: Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Ryan Sepassi, George Tucker, Henryk Michalewski
• Group: Google Brain et al.
• Venue: arXiv 2019, under review at ICML 2019
• Summary:
• Proposes a model-based RL method that outperforms Rainbow on many Atari games while achieving 2-10x better sample efficiency (a rough sketch of the overall training loop follows below)
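The method alternates between collecting a small amount of real experience, fitting a learned environment model on it, and optimizing the policy purely inside that model. Below is a minimal Python sketch of such a SimPLe-style loop; the callable names, iteration count, and per-iteration budgets are illustrative assumptions, not the paper's exact settings.

```python
from typing import Callable, List, Tuple

# One transition: (observation, action, reward, next_observation).
Transition = Tuple[object, int, float, object]


def simple_style_training(
    collect_real: Callable[[object], List[Transition]],           # roll out the current policy in the real env
    fit_world_model: Callable[[List[Transition]], object],        # fit an action-conditional video/reward predictor
    improve_policy_in_model: Callable[[object, object], object],  # e.g. a policy-gradient method on imagined rollouts
    policy: object,
    n_iterations: int = 15,
) -> object:
    """Alternate between a small amount of real interaction, refitting the
    learned environment model, and training the policy entirely inside it."""
    real_data: List[Transition] = []
    for _ in range(n_iterations):
        real_data += collect_real(policy)                       # limited real-env sample budget per iteration
        world_model = fit_world_model(real_data)                 # supervised learning on all data collected so far
        policy = improve_policy_in_model(world_model, policy)    # many cheap imagined rollouts
    return policy
```

The real environment is queried only in the first step of each iteration, which is where the sample-efficiency gain over purely model-free training comes from.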
36. Improved Dynamics Model: Overview
• Bibliographic information:
• Title: Learning Improved Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
• Authors: Ke, N. R., Singh, A., Touati, A., Goyal, A., Bengio, Y., Parikh, D., and Batra, D.
• Group: University of Montreal, Facebook, et al.
• Venue: ICLR 2019
• Summary:
• Incorporates stochastic latent variables into an RAR-type environment model
• Adds an auxiliary task that encourages the latent variables to retain information about the future, enabling long-horizon prediction (see the sketch after this list)
• Effective for both imitation learning and RL
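A minimal PyTorch sketch of the idea: one step of a recurrent dynamics model with a stochastic latent z_t, plus an auxiliary head that forces z_t to predict a summary of the future (e.g. the state of a backward RNN run over the remaining trajectory). The layer sizes, the Gaussian parameterization, and the squared-error auxiliary loss are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class LatentDynamicsStep(nn.Module):
    """One step of a recurrent dynamics model with a stochastic latent z_t.
    An auxiliary head makes z_t predict a summary of the future, which
    pressures the latent to carry long-horizon information."""

    def __init__(self, obs_dim: int, act_dim: int, z_dim: int, h_dim: int):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + act_dim + z_dim, h_dim)
        self.prior = nn.Linear(h_dim, 2 * z_dim)                 # p(z_t | h_{t-1})
        self.posterior = nn.Linear(h_dim + obs_dim, 2 * z_dim)   # q(z_t | h_{t-1}, o_t)
        self.decoder = nn.Linear(h_dim + z_dim, obs_dim)         # reconstruct o_t
        self.aux_head = nn.Linear(z_dim, h_dim)                  # predict a future summary from z_t

    def forward(self, h, obs, act, future_summary):
        # Approximate posterior over z_t given the current observation.
        mu_q, logvar_q = self.posterior(torch.cat([h, obs], dim=-1)).chunk(2, dim=-1)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        # Learned prior, used for the KL term and for generation without o_t.
        mu_p, logvar_p = self.prior(h).chunk(2, dim=-1)
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp() - 1).mean()
        # Reconstruction of the current observation from the state and latent.
        recon = (self.decoder(torch.cat([h, z], dim=-1)) - obs).pow(2).mean()
        # Auxiliary loss: z_t must predict the given summary of the future trajectory.
        aux = (self.aux_head(z) - future_summary).pow(2).mean()
        # Advance the deterministic recurrent state.
        h_next = self.rnn(torch.cat([obs, act, z], dim=-1), h)
        return h_next, recon + kl + aux
```

At training time the future summary can be taken from a backward RNN over the rest of the trajectory; at rollout time z_t is sampled from the prior, so the model can be unrolled for many steps without observations.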
73. References
• Marc Deisenroth and Carl E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. ICML 2011.
• Levine, Sergey and Abbeel, Pieter. Learning neural network policies with guided policy search under unknown dynamics. NIPS 2014.
• Justin Bayer and Christian Osendorfer. Learning stochastic recurrent networks. arXiv 2014.
• Rahul G. Krishnan, Uri Shalit, and David Sontag. Deep Kalman filters. arXiv 2015.
• Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. Embed to control: A locally linear latent dynamics model for control from raw images. NIPS 2015.
• Chung, Junyoung, Kastner, Kyle, Dinh, Laurent, Goel, Kratarth, Courville, Aaron C., and Bengio, Yoshua. A recurrent latent variable model for sequential data. NIPS 2015.
• Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, and Satinder Singh. Action-conditional video prediction using deep networks in Atari games. NIPS 2015.
• Finn, C., Tan, X. Y., Duan, Y., Darrell, T., Levine, S., and Abbeel, P. Deep spatial autoencoders for visuomotor learning. ICRA 2016.
• Fraccaro, M., Sønderby, S. K., Paquet, U., and Winther, O. Sequential neural models with stochastic layers. NIPS 2016.
• Chelsea Finn, Ian Goodfellow, and Sergey Levine. Unsupervised learning for physical interaction through video prediction. NIPS 2016.
• Oh, J., Singh, S., and Lee, H. Value prediction network. NIPS 2017.
• Hsu, W.-N., Zhang, Y., and Glass, J. Unsupervised learning of disentangled and interpretable representations from sequential data. NIPS 2017.
• Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Ke, and Yoshua Bengio. Z-forcing: Training stochastic recurrent networks. NIPS 2017.
74. References
• Weber, T., Racanière, S., Reichert, D. P., Buesing, L., Guez, A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., Li, Y., et al. Imagination-augmented agents for deep reinforcement learning. NIPS 2017.
• van den Oord, A., Vinyals, O., and Kavukcuoglu, K. Neural discrete representation learning. NIPS 2017.
• Silvia Chiappa, Sébastien Racanière, Daan Wierstra, and Shakir Mohamed. Recurrent environment simulators. ICLR 2017.
• Karl, M., Soelch, M., Bayer, J., and van der Smagt, P. Deep variational Bayes filters: Unsupervised learning of state space models from raw data. ICLR 2017.
• Babaeizadeh, M., Finn, C., Erhan, D., Campbell, R. H., and Levine, S. Stochastic variational video prediction. ICLR 2018.
• David Ha and Jurgen Schmidhuber. Recurrent world models facilitate policy evolution. NIPS 2018.
• Yingzhen Li and Stephan Mandt. Disentangled sequential autoencoder. ICML 2018.
• Denton, E. and Fergus, R. Stochastic video generation with a learned prior. ICML 2018.
• Kaiser, L. and Bengio, S. Discrete autoencoders for sequence models. arXiv 2018.
• Lars Buesing, Theophane Weber, Sébastien Racanière, SM Eslami, Danilo Rezende, David P. Reichert, Fabio Viola, Frederic Besse, Karol Gregor, Demis Hassabis, et al. Learning and querying fast generative models for reinforcement learning. arXiv 2018.
• Gregor, K. and Besse, F. Temporal difference variational auto-encoder. ICLR 2019.
• Ke, N. R., Singh, A., Touati, A., Goyal, A., Bengio, Y., Parikh, D., and Batra, D. Learning improved dynamics model in reinforcement learning by incorporating the long term future. ICLR 2019.
• Ebert, F., Finn, C., Dasari, S., Xie, A., Lee, A., and Levine, S. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. arXiv 2018.
• Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. arXiv 2019.
• Kaiser, L., Babaeizadeh, M., Milos, P., Osinski, B., Campbell, R. H., Czechowski, K., Erhan, D., Finn, C., Kozakowski, P., Levine, S., et al. Model-based reinforcement learning for Atari. arXiv 2019.
• Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, and Sergey Levine. SOLAR: Deep structured representations for model-based reinforcement learning. arXiv 2019.