42. Main references
[1] Recurrent Experience Replay in Distributed Reinforcement Learning. ICLR 2019 submission.
https://openreview.net/forum?id=r1lyTjAqYX
[2] Volodymyr Mnih, et al. Asynchronous methods for deep reinforcement learning. In International
Conference on Machine Learning, pp. 1928–1937, 2016.
https://arxiv.org/abs/1602.01783
[3] Matthew Hausknecht and Peter Stone. Deep recurrent Q-learning for partially observable MDPs.
arXiv preprint arXiv:1507.06527, 2015.
https://arxiv.org/abs/1507.06527
[4] Dan Horgan, et al. Distributed prioritized experience replay. ICLR 2018.
https://arxiv.org/abs/1803.00933
[5] Lasse Espeholt, et al. IMPALA: Scalable distributed Deep-RL with importance weighted actor-learner
architectures. arXiv preprint arXiv:1802.01561, 2018.
https://arxiv.org/abs/1802.01561
43. Other references
[6] Tom Schaul, et al. Prioritized experience replay. In International Conference on Learning
Representations, 2016.
https://arxiv.org/abs/1511.05952
[7] Hado van Hasselt, et al. Deep reinforcement learning with double Q-learning. In AAAI Conference
on Artificial Intelligence, 2016.
https://arxiv.org/abs/1509.06461
[8] Ziyu Wang, et al. Dueling network architectures for deep reinforcement learning. In International
Conference on Machine Learning, 2016.
https://arxiv.org/abs/1511.06581
[9] Richard Sutton and Andrew Barto. Reinforcement learning: An introduction. MIT Press, Cambridge, 1998.
[10] Meire Fortunato, et al. Noisy networks for exploration. arXiv preprint arXiv:1706.10295, 2017.
https://arxiv.org/abs/1706.10295
[11] Marc G. Bellemare, et al. Unifying count-based exploration and intrinsic motivation. In Advances
in Neural Information Processing Systems, pp. 1471–1479, 2016.
https://arxiv.org/abs/1606.01868
[12] Todd Hester, et al. Deep Q-learning from demonstrations. arXiv preprint arXiv:1704.03732, 2017.
https://arxiv.org/abs/1704.03732
44. Other references
[13] Yusuf Aytar, et al. Playing hard exploration games by watching YouTube. arXiv preprint arXiv:1805.11592, 2018.
https://arxiv.org/abs/1805.11592
[14] Learning Montezuma’s Revenge from a Single Demonstration. OpenAI Blog, 2018.
https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration/
[15] David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018.
https://arxiv.org/abs/1803.10122
[16] The Laplacian in RL: Learning Representations with Efficient Approximations. ICLR 2019 submission.
https://openreview.net/forum?id=HJlNpoA5YQ
[17] Jack Harmer, et al. Imitation learning with concurrent actions in 3D games. arXiv preprint
arXiv:1803.05402, 2018.
https://arxiv.org/abs/1803.05402
[18] Yee Whye Teh, et al. Distral: Robust multitask reinforcement learning. In Advances in Neural
Information Processing Systems, pp. 4496–4506, 2017.
https://arxiv.org/abs/1707.04175
[19] Matteo Hessel, et al. Multi-task deep reinforcement learning with PopArt. arXiv preprint arXiv:1809.04474, 2018.
https://arxiv.org/abs/1809.04474
45. Other references
[20] Arun Nair, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint
arXiv:1507.04296, 2015.
https://arxiv.org/abs/1507.04296
[21] Mohammad Babaeizadeh, et al. Reinforcement learning through asynchronous advantage actor-critic
on a GPU. arXiv preprint arXiv:1611.06256, 2016.
https://arxiv.org/abs/1611.06256
[22] Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning.
In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence,
pp. 538–545. Morgan Kaufmann Publishers Inc., 2001.
https://arxiv.org/abs/1301.2315
[23] Rémi Munos, et al. Safe and efficient off-policy reinforcement learning. In Advances in Neural
Information Processing Systems, pp. 1054–1062, 2016.
https://arxiv.org/abs/1606.02647