DQN with Differentiable Memory Architectures
Okada Shintarou in SP team
(Mentor: Fujita, Kusumoto)
What I Did in This Internship
- Implemented Chainer versions of DRQN, MQN, RMQN, and FRMQN. (The original is implemented with Torch.)
  - DRQN is trained with a different mechanism from DQN
  - MQN, RMQN, and FRMQN have a key-value store memory
- Implemented RogueGym
  - A 3D FPS-based RL platform
  - Scene images are available without OpenGL
  - OpenAI Gym-like interface
  - Highly customizable
Background and Problem
- DQN has been shown to successfully learn to play many Atari 2600 games (e.g. Pong).
- However, DQN performs poorly on games where
  - agents cannot observe the whole state of the environment, and
  - agents have to remember past observations to clear missions (e.g. I-Maze, where agents have to look at an indicator tile and then go to the correct goal tile, which is far from it).
[Figure labels: whole state observable / partially observable]
Deep Q-Networks = Q-Learning + DNNs
- Q(s,a) is the quality function.
- In general, Q(s,a) is approximated by a function, because the number of combinations of s and a explodes.
- In DQN, Q(s,a) is approximated by DNNs.
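For reference, here is a minimal tabular sketch of the Q-learning update that DQN builds on. The learning rate `alpha`, the discount `gamma`, and the toy table size are assumptions for illustration, not values from these slides. In DQN the table is replaced by a DNN.

```python
import numpy as np

def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    alpha and gamma are illustrative defaults (assumptions).
    """
    target = r if done else r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])
    return q

# Toy example: 3 states, 4 actions, one terminal transition with reward 1.
q = np.zeros((3, 4))
q = q_learning_update(q, s=0, a=1, r=1.0, s_next=2, done=True)
```

Only the entry for the visited state-action pair changes, which is why a table stops being practical once the number of (s, a) pairs explodes.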
Feedback Recurrent Memory Q-Network (FRMQN) [1]
How is past information conveyed?
(a) DQN
- Feed forward only
- Past M frames as input
(b) DRQN
- LSTM
(c) MQN
- No LSTM
- Key-Value store memory
(d) RMQN
- LSTM
- Key-Value store memory
(e) FRMQN
- LSTM
- Key-Value store memory
- Feedback from the past memory output
[1] Oh, Junhyuk, et al. "Control of Memory, Active Perception, and Action in Minecraft." ICML (2016).
External Memory
Write:
Read:
Context:
[Figure labels: CNNs to encode the input, retrieved memory, context vector, Q(s,a)]
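The formulas on this slide did not survive in the text. As a rough sketch of the kind of key-value memory described in [1], the write step can be read as projecting the encodings of the last M frames into key and value matrices, and the read step as soft attention over the keys using the context vector. All names, shapes, and the random weights below are assumptions for illustration and are not taken from the Chainer implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def write_memory(encoded, w_key, w_val):
    """Write: project the (M, e) frame encodings into key and value memories of shape (M, m)."""
    return encoded @ w_key, encoded @ w_val

def read_memory(context, mem_key, mem_val):
    """Read: attention weights from key/context similarity, then a weighted sum of values."""
    weights = softmax(mem_key @ context)  # shape (M,)
    retrieved = weights @ mem_val         # shape (m,)
    return retrieved, weights

# Hypothetical sizes: M memory slots, encoding size e, memory size m.
rng = np.random.default_rng(0)
M, e, m = 5, 8, 4
w_key = rng.standard_normal((e, m))
w_val = rng.standard_normal((e, m))
context = rng.standard_normal(m)

encoded = rng.standard_normal((M, e))
mem_key, mem_val = write_memory(encoded, w_key, w_val)
retrieved, weights = read_memory(context, mem_key, mem_val)
```

Because the projection weights do not depend on M, the same weights can be applied to a memory with more slots, which is consistent with changing the memory size after training.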
Project Malmo [2]
- The original paper used a Minecraft-based environment
- So we first tried "Project Malmo" (a Minecraft-based RL platform developed by Microsoft)
- But Malmo
  - Lacks stability (which makes machine learning surprisingly difficult)
  - Uses OpenGL (Please tell me how to play Minecraft over SSH on Ubuntu 16.04 servers with a Titan X and no display)
  - Is slow (it adds 4 s of overhead to each of 30000 episodes)
[2] Johnson M., Hofmann K., Hutton T., Bignell D. (2016) The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence, Ed. Kambhampati S., p. 4246. AAAI Press, Palo Alto, California USA. https://github.com/Microsoft/malmo
So I Developed RogueGym
- RogueGym is a roguelike environment for reinforcement learning, inspired by Project Malmo
- 3D scenes and the types of surrounding blocks are available as agents' observations.
[Figure labels: agent's observation / world's state (top view)]
I-Maze Environment
Environment:
- One long corridor, two goals, and one indicator tile
- Agents need to reach the correct goal, which is indicated by a green or yellow tile
- Agents spawn facing a random direction
- 50-step limit
Actions (4): move forward / move backward / turn left / turn right
Rewards:
- Every step: -0.04
- Reaching the blue tile when the indicator tile is green: 1, yellow: -1
- Reaching the red tile when the indicator tile is green: -1, yellow: 1
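The reward rules above can be sketched as a small lookup. The function name and return shape are hypothetical and not RogueGym's API. Two points are assumptions on my part: that the -0.04 step penalty also applies on the step that reaches a goal, and that reaching either goal ends the episode.

```python
STEP_REWARD = -0.04  # from the slide: applied on every step
MAX_STEPS = 50       # from the slide: step limit per episode

# (indicator color, goal color) -> goal reward, as listed on the slide.
GOAL_REWARD = {
    ("green", "blue"): 1.0,
    ("yellow", "blue"): -1.0,
    ("green", "red"): -1.0,
    ("yellow", "red"): 1.0,
}

def i_maze_reward(indicator, tile, step):
    """Return (reward, done) for the tile the agent has just stepped onto.

    Assumptions: the step penalty is added on goal steps too, and
    reaching a goal tile terminates the episode.
    """
    reward = STEP_REWARD
    done = step >= MAX_STEPS
    if tile in ("blue", "red"):
        reward += GOAL_REWARD[(indicator, tile)]
        done = True
    return reward, done

reward, done = i_maze_reward("green", "blue", step=12)
```

Under these assumptions an agent that wanders for all 50 steps without reaching a goal accumulates 50 × (-0.04) = -2.0.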
Experiment 1: Block Input
Comparison of (DQN), DRQN, MQN, RMQN, FRMQN
Environment:
- I-Maze (vertical corridor length 5)
Observations: Types of the blocks in front of the agent (expressed as one-hot vectors).
[Figure: I-Maze with corridor = 5, showing the observation range]
Raw observations
{stone, air, stone,
stone, air, stone,
air, agent, air}
One-hot vector
air → 100000
stone → 010000
green_tile → 001000
yellow_tile → 000100
red_tile → 000010
blue_tile → 000001
Input
{010000,
100000,
010000,
010000,
100000,
010000,
100000,
/*agent,*/
100000}
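The encoding above can be sketched as follows. The block order matches the one-hot table on this slide. The function name and the use of NumPy are assumptions for illustration, not the actual preprocessing code.

```python
import numpy as np

# Order follows the slide: air -> 100000, stone -> 010000, ...
BLOCK_TYPES = ["air", "stone", "green_tile", "yellow_tile", "red_tile", "blue_tile"]
BLOCK_INDEX = {name: i for i, name in enumerate(BLOCK_TYPES)}

def encode_observation(blocks):
    """Drop the agent's own cell and one-hot encode the remaining blocks."""
    visible = [b for b in blocks if b != "agent"]
    onehot = np.zeros((len(visible), len(BLOCK_TYPES)), dtype=np.float32)
    onehot[np.arange(len(visible)), [BLOCK_INDEX[b] for b in visible]] = 1.0
    return onehot

raw = ["stone", "air", "stone",
       "stone", "air", "stone",
       "air", "agent", "air"]
x = encode_observation(raw)  # shape (8, 6): 8 visible blocks, 6 block types
```

Flattening `x` gives a 48-dimensional input vector for the block-input experiment.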
There are Some Choices to Train Recurrent Models
- Calculate the loss with
  - the full episode, or
  - some randomly extracted successive frames
- When calculating the loss, calculate the Q value with
  - each frame, or
  - only the last frame
We chose "Full episode" and "Each frame"
(But the original implementation, published on 9/21, seems to use "Randomly" and "Only last frame". This may be a point of difference from the original implementation.)
- How to select training batches
  - randomly extracted frames from all of the episodes, or
  - randomly extracted frames from randomly extracted episodes (our choice)
[Figure labels: an episode; frames outside the extracted range are marked "ignore!"]
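The two ways of extracting training sequences can be sketched as follows. The function names, the toy episodes, and the use of Python's `random` module are assumptions for illustration. The real replay buffer stores observations, actions, and rewards rather than integers.

```python
import random

def sample_full_episodes(episodes, batch_size, rng=random):
    """Pick whole episodes at random; the loss can then use every frame."""
    return [rng.choice(episodes) for _ in range(batch_size)]

def sample_subsequences(episodes, batch_size, length, rng=random):
    """Pick random episodes, then a random run of successive frames from each."""
    batch = []
    for _ in range(batch_size):
        episode = rng.choice(episodes)
        # Episodes shorter than `length` are returned whole.
        start = rng.randrange(max(1, len(episode) - length + 1))
        batch.append(episode[start:start + length])
    return batch

# Toy episodes: each frame is just its index within the episode.
episodes = [list(range(n)) for n in (10, 20, 30)]
full = sample_full_episodes(episodes, batch_size=4)
subs = sample_subsequences(episodes, batch_size=4, length=12)
```

With full episodes, the recurrent state can be unrolled from the first frame, so a Q value is available at every step. With sub-sequences, the state at the start of the run is unknown, which is one reason to compute the loss only on the last frame.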
Experiment 1 Result
[Figures: total reward vs. episode]
- DQN is trained with randomly extracted batches of 12 frames.
- The other models are trained with randomly extracted full episodes.
Generalization Performance
[Figure: total reward (average of 100 runs) vs. vertical corridor length]
- The memory limit is changed from 11 to 49
- FRMQN does not lose performance on long vertical corridors.
- DRQN holds up well.
Conclusion
- FRMQN has high generalization performance
- The size of the differentiable key-value store memory module can be changed after training
- DRQN is not so bad
- Recurrent networks are useful for partially observable environments
- It is important to be able to run iterations quickly
WIP: Experiment 2: Scene Image Input
Comparison of DQN, DRQN, MQN, RMQN, FRMQN
Environment: I-Maze, with the vertical corridor length chosen randomly from 5, 7, and 9 for every episode.
Observations: Scene images that the agent sees
Training detail: Random frames, Q value of only the last frame
[Figures: I-Maze with corridor = 5, 7, and 9; an example of an agent's observation]
