Training a Multi Layer Perceptron with Expert Data and Game State Representation using Convolutional Neural Networks
JOHN STAMFORD
MSC INTELLIGENT SYSTEMS AND ROBOTICS
Contents
Background and Initial Brief
Previous Work
Motivation
Technical Frameworks
State Representation
Testing
Results
Conclusion
Future work
Background / Brief
Based on a project by Google/Deepmind
Build an App to capture gameplay data
◦Users play Atari games on a mobile device
◦We capture the data (somehow)
Use the data in machine learning
◦Reduce the costly nature of Reinforcement Learning
Deepmind
Bought by Google for £400 million
“Playing Atari with Deep Reinforcement Learning” (2013)
General Agent
◦ No prior knowledge of the environment
◦ Inputs (States) and Outputs (Actions)
◦ Learn Policies
◦ Mapping States and Actions
Deep Reinforcement Learning
Deep Q Networks (DQN)
2015 Paper Release (with Lua source code)
Motivation
Starts the Q-Learning Sample Code
◦ Deep Reinforcement Learning (Q-Learning)
◦ Links to Deepmind (Mnih et al. 2013)
Costly nature of Reinforcement Learning
◦ Trial and Error Approach
◦ Issues with long term goals
◦ Makes lots of mistakes
◦ Celiberto et al. (2010) state...
“this technique is not efficient enough to be used in applications with real world demands due to the time that the agent needs to learn”
Background
Q-Learning (RL)
◦ Learn the optimal policy, which action to take at each state
◦ Represented as...
Q(s, a)
Functioning: Watkins and Dayan (1992) state that...
◦ the system observes its current state $x_n$
◦ selects and performs an action $a_n$
◦ observes the subsequent state $y_n$ and receives the reward $r_n$
◦ updates the $Q_n(s, a)$ value for $s = x_n$, $a = a_n$ using
◦ a learning rate $\alpha_n$
◦ a discount factor $\gamma$
$$Q_n(s, a) = (1 - \alpha_n)\, Q_{n-1}(s, a) + \alpha_n \left[ r_n + \gamma \max_{b} Q_{n-1}(y_n, b) \right]$$
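The update rule above can be sketched as a tabular Q-learning step. This is a minimal illustration, assuming a dictionary-backed Q table with a default value of zero; the function name `q_update` and the toy states and actions are hypothetical and do not come from the project code.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning update to the table q in place and return the new value."""
    # Best value reachable from the subsequent state (the max term in the rule).
    best_next = max(q[(next_state, a)] for a in actions)
    # Blend the old estimate with the new target using the learning rate.
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (reward + gamma * best_next)
    return q[(state, action)]


# Toy example: two actions, unseen entries default to 0.0 (an assumption).
actions = ["left", "right"]
q = defaultdict(float)
q[("s1", "right")] = 2.0
new_value = q_update(q, "s0", "left", reward=1.0, next_state="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)
```

With these toy numbers the new estimate is roughly 0.5 * 0 + 0.5 * (1 + 0.9 * 2) = 1.4.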
Pseudo Code
Source: Mnih et al. (2013)
Representation of Q(s,a)
Actions
States
Q Values
Other Methods
Imitation Learning (IL)
◦ Applied to robotics e.g. Nemec et al. (2010), Schmidts et al. (2011) and Kunze et al. (2013)
Could this be applied to the games agent?
◦ Potentially by mapping the states and the actions from observed game play
◦ Manually updating the policies
Hamahata et al. (2008) state that “imitation learning consisting of a simple observation cannot give us the sophisticated skill”
Other Methods
Combining RL and IL
◦ Kulkarni (2012, p. 4) refers to this as ‘semi-supervised learning’
◦ Barto and Rosenstein (2004) suggest the use of a model which acts as a supervisor and an actor.
Supervisor Information (Barto and Rosenstein, 2004)
State
Representation
The Plan (at this point)
Reduce the costly impact of RL
◦ Use some form of critic or early reward system
◦ If no Q Value exists for that state, then check with an expert
Capture Expert Data
◦ States
◦ Actions
◦ Rewards
Build a model
Use the model to inform the Q Learning System
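The "check with an expert" idea in the plan can be sketched as an action-selection rule. This is only an illustration under stated assumptions: the Q table is a plain dictionary keyed by (state, action), and `expert_policy` is a hypothetical callable standing in for the model built from captured expert data.

```python
def choose_action(q_table, expert_policy, state, actions):
    """Pick the best known action, or defer to the expert for unseen states."""
    # Collect the Q values that already exist for this state.
    known = {a: q_table[(state, a)] for a in actions if (state, a) in q_table}
    if known:
        # Greedy choice over the values learned so far.
        return max(known, key=known.get), "q_table"
    # No Q value exists for this state, so ask the expert model instead.
    return expert_policy(state), "expert"


# Toy usage with a hypothetical expert that always suggests "fire".
q_table = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
expert = lambda state: "fire"
seen = choose_action(q_table, expert, "s0", ["left", "right", "fire"])
unseen = choose_action(q_table, expert, "s9", ["left", "right", "fire"])
print(seen, unseen)
```

Returning the source of the decision alongside the action makes it easy to measure how often the expert is consulted during training.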
Data Capture Plan
Capture Input Data
◦ Using a Stella VCS based Android solution
◦ User actions: Up, Down, Left, Right, Up, Down, Left, Right, ...
Account for SEED Variant
◦ setSeed(12345679)
Replay in the Lab
Extract Score & States
◦ Using ALE
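The capture-and-replay plan relies on the emulator producing the same states when it is given the same seed and the same recorded actions. The sketch below illustrates that property with a toy stand-in; `ToyEmulator` is hypothetical and is not Stella or ALE, and the property only holds when both runs use the same random number generator implementation.

```python
import random


class ToyEmulator:
    """Hypothetical stand-in for an emulator whose dynamics include randomness."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # private generator, seeded once
        self.position = 0

    def step(self, action):
        # Player input moves the position; the seeded noise mimics game randomness.
        move = {"Left": -1, "Right": 1}.get(action, 0)
        noise = self.rng.choice([-1, 0, 1])
        self.position += move + noise
        return self.position


def replay(seed, actions):
    """Run a recorded action sequence from a fresh emulator and return the states."""
    emulator = ToyEmulator(seed)
    return [emulator.step(a) for a in actions]


recorded_actions = ["Up", "Down", "Left", "Right"] * 3
first = replay(12345679, recorded_actions)
second = replay(12345679, recorded_actions)
print(first == second)
```

If the capture device and the lab machine differ in emulator version or random number generator, the two runs can diverge even with an identical seed.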
The Big Problem
We couldn’t account for the randomisation
◦ALE is based on Stella
◦ Version problems
◦Tested various approaches
◦Replayed games over Skype
We could save the state..!
◦But had some problems
Other problems
Technical Implementation
Arcade Learning Environment (ALE) (Bellemare et al. 2013)
◦ General Agent Testing Environment using Atari Games
◦ Supporting 50+ Games
◦ Based on the Stella VCS Atari Emulator
◦ Supports Agents in C++, Java and more...
Python 2.7 (Anaconda Distribution)
Theano (ML Framework written in Python)
◦ Mnih et al. (2013)
◦ Q-Learning Sample Code
◦ Korjus (2014)
Linux then Windows 8, Cuda Support
Computational Requirements
Test System
◦ Simple CNN / MLP
◦ 16,000 grayscale 28x28 images
Results
◦ Significant difference with CUDA support
◦ The CNN process is very computationally costly
MLP Speed Test Results
CNN Speed Test Results
States and Actions
States - Screen Data
◦Raw Screen Data
◦SDL (SDL_Surface)
◦ BMP File
Actions – Controller Inputs
Resulted in….
◦Lots of Images matched to entries in a CSV File
Rewards
ALE Reward Data
void BreakoutSettings::step(const System& system) {
    // update the reward: the score is held in RAM as binary-coded decimal digits
    int x = readRam(&system, 77);  // low nibble: ones, high nibble: tens
    int y = readRam(&system, 76);  // low nibble: hundreds
    reward_t score = 1 * (x & 0x000F) + 10 * ((x & 0x00F0) >> 4) + 100 * (y & 0x000F);
    m_reward = score - m_score;    // reward is the change in score since the last step
    m_score = score;
    // update terminal status
    int byte_val = readRam(&system, 57);
    if (!m_started && byte_val == 5) m_started = true;
    m_terminal = m_started && byte_val == 0;
}
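The score arithmetic in the ALE snippet above can be mirrored in Python to show how the two RAM bytes become a score and a reward. This is a sketch of the same bit operations only; the function names are hypothetical and the byte values in the usage lines are made up for illustration.

```python
def decode_breakout_score(x, y):
    """Decode the score from the two RAM bytes, one decimal digit per nibble."""
    ones = x & 0x0F            # low nibble of x
    tens = (x & 0xF0) >> 4     # high nibble of x
    hundreds = y & 0x0F        # low nibble of y
    return ones + 10 * tens + 100 * hundreds


def reward_from_ram(x, y, previous_score):
    """Return (reward, score), where reward is the change in score."""
    score = decode_breakout_score(x, y)
    return score - previous_score, score


# Bytes 0x45 and 0x03 hold the digits 4, 5 and 3, giving a score of 345.
score = decode_breakout_score(0x45, 0x03)
reward, new_score = reward_from_ram(0x45, 0x03, previous_score=340)
print(score, reward)
```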
State Representation
Screen Pixel – 160 x 210 RGB
If we used them as inputs...
◦ RGB: 100,800
◦ Greyscale: 33,600
Mnih et al. (2013) use cropped 84 x 84 images
◦ Good – High Resolutions, Lots of Features Present
◦ Bad – When handling lots of training data
The MNIST example set uses 28 x 28 images
◦ Good – Computationally Acceptable
◦ Bad – Limited Detail
The problem
◦ Unable to process large amounts of hi-res images
◦ Low-res images gave poor results
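The input counts behind this trade-off follow directly from width x height x channels. The short sketch below works them out for the image sizes mentioned on this slide; the helper name `input_count` is hypothetical.

```python
def input_count(width, height, channels=1):
    """Number of network inputs when every pixel channel is one input."""
    return width * height * channels


sizes = {
    "160 x 210 RGB": input_count(160, 210, 3),
    "160 x 210 greyscale": input_count(160, 210),
    "84 x 84 greyscale": input_count(84, 84),
    "28 x 28 greyscale": input_count(28, 28),
}
for name, count in sizes.items():
    print(name, count)
```

A full greyscale screen therefore needs more than forty times as many inputs as a 28 x 28 image, which is where the memory and training-time pressure comes from.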
Original System - Image Processing
Image Resize Methods
Temporal Data (Frame Merging)
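The slides do not say which resize or frame-merging method was used, so the following is only one plausible sketch: block averaging for the resize and a pixel-wise maximum for merging consecutive frames. Both choices and both function names are assumptions made for illustration.

```python
def downsample(frame, factor):
    """Shrink a 2D greyscale frame by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(frame) // factor, len(frame[0]) // factor
    area = float(factor * factor)
    return [
        [
            sum(frame[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / area
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def merge_frames(frames):
    """Merge same-sized frames into one by taking the pixel-wise maximum."""
    # zip(*frames) pairs up matching rows; zip(*rows) pairs up matching pixels.
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*frames)]


small = downsample([[0, 2, 4, 4], [2, 4, 4, 4]], 2)
merged = merge_frames([[[0, 5], [1, 0]], [[3, 0], [0, 0]]])
print(small, merged)
```

Taking the maximum keeps a moving object visible at each position it occupied, so a single merged image carries some motion information.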
Original System - Training Results
28x28 Images
64x64 Images
84x84 (4,100 images) = Memory Error
7 minutes for 16,000 28x28
18 minutes for 4,000 64x64
Development
Original
Revised
CNN Framework
Mnih et al. (2013) make use of Convolutional Neural Networks
Feature extraction
◦ Can be used to reduce Dimensionality of the Domain Space
◦ Examples include
◦ Hand Writing Classification Yuan et al. (2012), Bottou et al. (1994)
◦ Face Detection Garcia and Delakis (2004) and Chen et al. (2006)
The CNN output is used as the input to a fully connected MLP (Bergstra et al. 2010).
Convolutional Neural Networks
Feature Extraction
Developed as a result of the work of LeCun et al. (1998)
Takes inspiration from the visual processes of cats and monkeys (Hubel and Wiesel 1962, 1968)
Can accommodate changes in Scale, Rotation, Stroke Width, etc.
Can handle Noise
See: http://yann.lecun.com/exdb/lenet/index.html
Convolution of an Image
0 0 0
0 1 0
0 0 0
Example Kernel
Source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
Other Examples
Sharpen
0 -1 0
-1 5 -1
0 -1 0
Edge detection
0 1 0
1 -4 1
0 1 0
Edge detection
1 0 -1
0 0 0
-1 0 1
Edge detection
-1 -1 -1
-1 8 -1
-1 -1 -1
Source: http://en.wikipedia.org/wiki/Kernel_(image_processing)
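Applying a 3 x 3 kernel such as those above to an image can be sketched in plain Python. This is a minimal illustration, assuming zero padding at the borders; the function name `convolve2d` is hypothetical and the code is not the Theano implementation used in the project.

```python
def convolve2d(image, kernel):
    """Convolve a 2D image with a 3x3 kernel, treating out-of-range pixels as zero."""
    height, width = len(image), len(image[0])
    # True convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    output = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < height and 0 <= cc < width:
                        total += flipped[i][j] * image[rr][cc]
            output[r][c] = total
    return output


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
outline = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = convolve2d(image, identity)

# A single bright pixel reproduces the kernel itself in the output.
dot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
stamped = convolve2d(dot, laplacian)

# A flat region gives zero response at its centre for the edge kernel.
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
flat_response = convolve2d(flat, outline)[1][1]
print(unchanged, stamped, flat_response)
```

The identity kernel returns the image unchanged, while the edge-detection kernels respond only where neighbouring pixel values differ.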
CNN Feature Extraction
Single Convolutional Layer
◦ From Full Resolution Images (160 x 210 RGB)
1,939 Inputs
130 Inputs
CNN Feature Extraction
Binary Conversion
◦ Accurate State Representation
Lower Computational Costs
◦ Single Convolution Layer (15 seconds for 2,391 images / 11.7 seconds for 1,790)
◦ Reduced number of inputs for the MLP
◦ More Manageable
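The binary conversion step can be sketched as thresholding a feature map and flattening it into the input vector for the MLP. The slides do not give the threshold or the exact procedure, so the values and function names below are assumptions for illustration only.

```python
def binarise(feature_map, threshold):
    """Map each value to 1 if it exceeds the threshold, otherwise 0."""
    return [[1 if value > threshold else 0 for value in row] for row in feature_map]


def flatten(feature_map):
    """Turn a 2D feature map into a flat list of MLP inputs."""
    return [value for row in feature_map for value in row]


feature_map = [[0.1, 0.9], [0.5, 0.7]]
binary_map = binarise(feature_map, threshold=0.5)
mlp_inputs = flatten(binary_map)

# A higher, equally hypothetical threshold discards the fainter features.
harsh_inputs = flatten(binarise(feature_map, threshold=0.8))
print(mlp_inputs, harsh_inputs)
```

The second call shows how an aggressive threshold wipes out weaker features, which is the kind of information loss a binary representation risks.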
Problems & Limitations
Binary Conversion was too severe (Breakout)
Feature removed by binary conversion as shown above
Seaquest could not differentiate between the enemy and the goals
New System Training Results
Test Configuration
Results
Lowest Error Rate: 32.50%
Evidence of Learning
MLP New System
More Testing
Conclusion
Large amounts of data
CNN as a Preprocessor...
◦ Reduced Computational Costs
◦ Allowed for good state representation
◦ Reduced dimensionality for the MLP
Old System
◦ No evidence of learning
New System
◦ Evidence of the system learning
◦ Needs to be implemented as an agent to test real-world effectiveness
What would I do differently?
Better Evaluation Methodology
◦ What was the frequency/distribution of controls?
◦ Was the system better at different games or controls?
Went too far with the image conversion...
Future Work
1. Data Collection Methods
2. Foundation for Q-Learning
Future Work
3. State Representation
Step 1
Identify areas of interest
Step 2
Process and Classify Area
Step 3
Update State Representation
Future Work
4. Explore the effects of multiple Convolutional Layers
5. Build a working agent...!
Useful Links
ALE (Visual Studio Version)
https://github.com/mvacha/A.L.E.-0.4.4.-Visual-Studio
Replicating the Paper “Playing Atari with Deep Reinforcement Learning” - Kristjan Korjus et al
https://courses.cs.ut.ee/MTAT.03.291/2014_spring/uploads/Main/Replicating%20DeepMind.pdf
Github for the above project
https://github.com/kristjankorjus/Replicating-DeepMind/tree/master/src
ALE : http://www.arcadelearningenvironment.org/
ALE Old Site: http://yavar.naddaf.name/ale/
Bibliography
Barto, M. T. and Rosenstein, A. G. (2004), `Supervised actor-critic reinforcement learning', Handbook of Learning and Approximate Dynamic Programming 2, 359.
Bellemare, M. G., Naddaf, Y., Veness, J. and Bowling, M. (2013), `The arcade learning environment: An evaluation platform for general agents', Journal of Artificial Intelligence Research 47, 253-279.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D. and Bengio, Y. (2010), Theano: a CPU and GPU math expression compiler, in `Proceedings of the Python for Scientific Computing Conference (SciPy)'. Oral Presentation.
Celiberto, L., Matsuura, J., Lopez de Mantaras, R. and Bianchi, R. (2010), Using transfer learning to speed-up reinforcement learning: A cased-based approach, in `Robotics Symposium and Intelligent Robotic Meeting (LARS), 2010 Latin American', pp. 55-60.
Korjus, K., Kuzovkin, I., Tampuu, A. and Pungas, T. (2014), Replicating the paper "Playing Atari with Deep Reinforcement Learning", Technical report, University of Tartu.
Kulkarni, P. (2012), Reinforcement and systemic machine learning for decision making, John Wiley & Sons, Hoboken.
Kunze, L., Haidu, A. and Beetz, M. (2013), Acquiring task models for imitation learning through games with a purpose, in `Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on', pp. 102-107.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with deep reinforcement learning, in `NIPS Deep Learning Workshop'.
Nemec, B., Zorko, M. and Zlajpah, L. (2010), Learning of a ball-in-a-cup playing robot, in `Robotics in Alpe-Adria-Danube Region (RAAD), 2010 IEEE 19th International Workshop on', pp. 297-301.
Schmidts, A. M., Lee, D. and Peer, A. (2011), Imitation learning of human grasping skills from motion and force data, in `Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on', pp. 1002-1007.
Watkins, C. J. C. H. and Dayan, P. (1992), `Technical note: Q-learning', Machine Learning 8, 279-292.
Thank you

Why Teams call analytics are critical to your entire businesspanagenda
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Atari Game State Representation using Convolutional Neural Networks

  • 1. Training a Multi Layer Perceptron with Expert Data and Game State Representation using Convolutional Neural Networks JOHN STAMFORD MSC INTELLIGENT SYSTEMS AND ROBOTICS
  • 2. Contents Background and Initial Brief Previous Work Motivation Technical Frameworks State Representation Testing Results Conclusion Future work
  • 3. Background / Brief Based on a project by Google/Deepmind Build an App to capture gameplay data ◦Users play Atari games on a mobile device ◦We capture the data (somehow) Use the data in machine learning ◦Reduce the costly nature of Reinforcement Learning
  • 4. Deepmind Bought by Google for £400 million “Playing Atari with Deep Reinforcement Learning” (2013) General Agent ◦ No prior knowledge of the environment ◦ Inputs (States) and Outputs (Actions) ◦ Learn Policies ◦ Mapping States and Actions Deep Reinforcement Learning Deep Q Networks (DQN) 2015 Paper Release (with Lua source code)
  • 5. Motivation Starts the Q-Learning Sample Code ◦ Deep Reinforcement Learning (Q-Learning) ◦ Links to Deepmind (Mnih et al. 2013) Costly nature of Reinforcement Learning ◦ Trial and Error Approach ◦ Issues with long term goals ◦ Makes lots of mistakes ◦ Celiberto et al. (2010) state... “this technique is not efficient enough to be used in applications with real world demands due to the time that the agent needs to learn”
  • 6. Background Q-Learning (RL) ◦ Learn the optimal policy, i.e. which action to take at each state ◦ Represented as... $Q(s, a)$ Functioning: Watkins and Dayan (1992) state that... ◦ system observes its current state $x_n$ ◦ selects/performs an action $a_n$ ◦ observes the subsequent state $y_n$ and gets the reward $r_n$ ◦ updates the $Q_n(s, a)$ values using ◦ a learning rate identified as $\alpha$ ◦ a discount factor as $\gamma$ $Q_n(s, a) = (1 - \alpha_n) Q_{n-1}(s, a) + \alpha_n [r_n + \gamma \max_{a'} Q_{n-1}(y_n, a')]$
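The update rule on this slide can be sketched as a single tabular step. This is a minimal illustration rather than the code used in the project: the function name `q_update`, the state labels, the action names and the values of alpha and gamma are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the new Q(state, action).

    Implements Q_n(s,a) = (1 - alpha) * Q_{n-1}(s,a)
                          + alpha * (r + gamma * max_a' Q_{n-1}(y, a')).
    """
    # Best value reachable from the subsequent state under the old estimates.
    best_next = max(q[(next_state, b)] for b in actions)
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (
        reward + gamma * best_next
    )
    return q[(state, action)]


# Hypothetical two-state, two-action example; unseen entries default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
first = q_update(q, "s0", "right", 1.0, "s1", actions)
second = q_update(q, "s1", "left", 0.0, "s0", actions)
print(first, second)
```

The second call shows the reward propagating backwards: the value of acting in `s1` becomes non-zero only because `s0` already has a learned value.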
  • 7. Pseudo Code Source: Mnih et al. (2013)
  • 9. Other Methods Imitation Learning (IL) ◦ Applied to robotics e.g. Nemec et al. (2010), Schmidts et al. (2011) and Kunze et al. (2013) Could this be applied to the games agent? ◦ Potentially by mapping the states and the actions from observed game play ◦ Manually updating the policies Hamahata et al. (2008) state that “imitation learning consisting of a simple observation cannot give us the sophisticated skill”
  • 10. Other Methods Combining RL and IL ◦ Kulkarni (2012, p. 4) refers to this as ‘semi-supervised learning’ ◦ Barto and Rosenstein (2004) suggest the use of a model which acts as a supervisor and an actor. Supervisor Information (Barto and Rosenstein, 2004) State Representation
  • 11. The Plan (at this point) Reduce the costly impact of RL ◦ Use some form of critic or early reward system ◦ If no Q Value exists for that state, then check with an expert Capture Expert Data ◦ States ◦ Actions ◦ Rewards Build a model Use the model to inform the Q Learning System
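The "check with an expert" idea from this plan can be sketched as an action-selection function. This is an assumption-laden illustration, not the implemented system: `choose_action`, the dictionary-based Q table and the `expert_policy` callable (standing in for the model built from captured expert data) are hypothetical names.

```python
def choose_action(q_table, state, actions, expert_policy):
    """Pick the greedy action if the state has Q values, else ask the expert.

    q_table maps (state, action) pairs to learned values; expert_policy is any
    callable mapping a state to an action.
    """
    known = {a: q_table[(state, a)] for a in actions if (state, a) in q_table}
    if known:
        # At least one Q value exists for this state: act greedily on it.
        return max(known, key=known.get)
    # No Q value exists for this state: fall back to the expert.
    return expert_policy(state)


# Hypothetical data: "s0" has been visited, "s1" has not.
q_table = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
actions = ["left", "right", "fire"]
expert = lambda state: "fire"

print(choose_action(q_table, "s0", actions, expert))
print(choose_action(q_table, "s1", actions, expert))
```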
  • 12. Data Capture Plan Capture Input Data Using Stella VCS based Android Solution User Actions (Up, Down, Left, Right, ...) Account for SEED Variant setSeed(12345679) Replay in the Lab Extract Score & States Using ALE
  • 13. The Big Problem We couldn’t account for the randomisation ◦ALE is based on Stella ◦ Version problems ◦Tested various approaches ◦Replayed games over Skype We could save the state..! ◦But had some problems Other problems
  • 14. Technical Implementation Arcade Learning Environment (ALE) (Bellemare et al. 2013) ◦ General Agent Testing Environment using Atari Games ◦ Supporting 50+ Games ◦ Based on the Stella VCS Atari Emulator ◦ Supports Agents in C++, Java and more... Python 2.7 (Anaconda Distribution) Theano (ML Framework written in Python) ◦ Mnih et al. (2013) ◦ Q-Learning Sample Code ◦ Korjus (2014) Linux then Windows 8, Cuda Support
  • 15. Computational Requirements Test System ◦ Simple CNN / MLP ◦ 16,000 grayscale ◦ 28x28 images Results ◦ Significant Difference with Cuda Support ◦ CNN Process is very computationally costly MLP Speed Test Results CNN Speed Test Results
  • 16. States and Actions States - Screen Data ◦Raw Screen Data ◦SDL (SDL_Surface) ◦ BMP File Actions – Controller Inputs Resulted in…. ◦Lots of Images matched to entries in a CSV File
  • 17. Rewards ALE Reward Data
    void BreakoutSettings::step(const System& system) {
        // update the reward
        int x = readRam(&system, 77);
        int y = readRam(&system, 76);
        reward_t score = 1 * (x & 0x000F) + 10 * ((x & 0x00F0) >> 4) + 100 * (y & 0x000F);
        m_reward = score - m_score;
        m_score = score;
        // update terminal status
        int byte_val = readRam(&system, 57);
        if (!m_started && byte_val == 5) m_started = true;
        m_terminal = m_started && byte_val == 0;
    }
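The score arithmetic in the ALE snippet reads the score as packed decimal digits: the low and high nibbles of one RAM byte hold the ones and tens, and the low nibble of a second byte holds the hundreds. A minimal Python sketch of the same decoding follows; the function names and the sample byte values are hypothetical, and only the bit masks come from the slide.

```python
def decode_breakout_score(x, y):
    """Decode the score from two RAM bytes using the masks in the ALE code."""
    ones = x & 0x0F            # low nibble of x
    tens = (x & 0xF0) >> 4     # high nibble of x
    hundreds = y & 0x0F        # low nibble of y
    return ones + 10 * tens + 100 * hundreds


def step_reward(previous_score, x, y):
    """Return (reward, new_score): the reward is the change in score."""
    score = decode_breakout_score(x, y)
    return score - previous_score, score


# Hypothetical RAM contents: x = 0x47 and y = 0x02 encode a score of 247.
print(decode_breakout_score(0x47, 0x02))
print(step_reward(240, 0x47, 0x02))
```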
  • 18. State Representation Screen Pixels – 160 x 210 RGB If we used them as inputs... ◦ RGB: 100,800 ◦ Greyscale: 33,600 Mnih et al. (2013) use cropped 84 x 84 images ◦ Good – High Resolutions, Lots of Features Present ◦ Bad – When handling lots of training data MNIST Example Set use 28 x 28 ◦ Good – Computationally Acceptable ◦ Bad – Limited Detail The problem ◦ Unable to process large amounts of hi-res images ◦ Low-res images gave poor results
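The input counts on this slide follow from width × height × channels. The short sketch below works through that arithmetic for the resolutions mentioned; the helper name `input_count` and the dictionary labels are assumptions made for the example.

```python
def input_count(width, height, channels=1):
    """Number of network inputs if every pixel value is fed in directly."""
    return width * height * channels


sizes = {
    "full screen RGB (160 x 210 x 3)": input_count(160, 210, 3),
    "full screen greyscale (160 x 210)": input_count(160, 210),
    "cropped 84 x 84": input_count(84, 84),
    "MNIST-sized 28 x 28": input_count(28, 28),
}

for label, count in sizes.items():
    print(f"{label}: {count:,} inputs")
```

Going from the full RGB screen to a 28 x 28 greyscale image cuts the input layer by a factor of more than 100, which is the trade-off between detail and computational cost that the slide describes.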
  • 19. Original System - Image Processing Image Resize Methods Temporal Data (Frame Merging)
  • 20. Original System - Training Results 28x28 Images 64x64 Images 84x84 (4,100 images) = Memory Error 7 minutes for 16,000 28x28 18 minutes for 4,000 64x64
  • 22. CNN Framework Mnih et al. (2013) make use of Convolutional Neural Networks Feature extraction ◦ Can be used to reduce Dimensionality of the Domain Space ◦ Examples include ◦ Handwriting Classification: Yuan et al. (2012), Bottou et al. (1994) ◦ Face Detection: Garcia and Delakis (2004) and Chen et al. (2006) A CNN as inputs for a fully connected MLP (Bergstra et al. 2010).
  • 23. Convolutional Neural Networks Feature Extraction Developed as a result of the work of LeCun et al. (1998) Take inspiration from the visual processes of cats and monkeys, Hubel and Wiesel (1962, 1968) Can accommodate changes in Scale, Rotation, Stroke Width, etc. Can handle Noise See: http://yann.lecun.com/exdb/lenet/index.html
  • 24. Convolution of an Image Example Kernel: [0 0 0; 0 1 0; 0 0 0] Source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
  • 25. Other Examples [0 -1 0; -1 5 -1; 0 -1 0] [0 1 0; 1 -4 1; 0 1 0] [1 0 -1; 0 0 0; -1 0 1] [-1 -1 -1; -1 8 -1; -1 -1 -1] Source: http://en.wikipedia.org/wiki/Kernel_(image_processing)
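To show what these 3 x 3 kernels do, here is a minimal pure-Python sketch of a "valid" 2D convolution applied to a tiny image. The kernel values are taken from the slides; the function name, the constant names and the 5 x 5 test image are assumptions for illustration, and a real system would use an optimised library routine instead.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(len(image) - kh + 1):
        out_row = []
        for c in range(len(image[0]) - kw + 1):
            out_row.append(
                sum(
                    flipped[i][j] * image[r + i][c + j]
                    for i in range(kh)
                    for j in range(kw)
                )
            )
        out.append(out_row)
    return out


IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]

# Hypothetical 5 x 5 greyscale patch: flat background with one bright pixel.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 50, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]

identity_out = convolve2d(image, IDENTITY)
sharpen_out = convolve2d(image, SHARPEN)
edge_out = convolve2d(image, EDGE)
print(identity_out)
print(sharpen_out)
print(edge_out)
```

The identity kernel returns the inner 3 x 3 region unchanged, while the sharpen and edge kernels amplify the bright pixel relative to its surroundings.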
  • 26. CNN Feature Extraction Single Convolutional Layer ◦ From Full Resolution Images (160 x 210 RGB) 1,939 Inputs 130 Inputs
  • 27. CNN Feature Extraction Binary Conversion ◦ Accurate State Representation Lower Computational Costs ◦ Single Convolution Layer (15 seconds for 2,391 images / 11.7 seconds for 1,790) ◦ Reduced number of inputs for the MLP ◦ More Manageable
  • 28. Problems & Limitations Binary Conversion was too severe (Breakout) Feature removed by binary conversion as shown above Seaquest could not differentiate between the enemy and the goals
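The loss of information described here can be illustrated with a simple threshold. This is only a sketch of the general effect, assuming a fixed global threshold; the threshold value, the pixel intensities and the function name `to_binary` are hypothetical and are not taken from the project.

```python
def to_binary(frame, threshold=128):
    """Map each greyscale pixel to 1 if it meets the threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in frame]


# Hypothetical frame: two object types with different brightness (200 and 140)
# and a dim feature (90) on a black background.
frame = [
    [0, 200, 0],
    [0, 0, 140],
    [90, 0, 0],
]

binary = to_binary(frame)
print(binary)
```

After conversion the two brighter objects share the same value, so they can no longer be told apart, and the dim feature disappears entirely.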
  • 29. New System Training Results Test Configuration Results Lowest Error Rate: 32.50%
  • 32. Conclusion Large amounts of data CNN as a Preprocessor... ◦ Reduced Computational Costs ◦ Allowed for good state representation ◦ Reduced dimensionality for the MLP Old System ◦ No evidence of learning New System ◦ Evidence of the system learning ◦ Needs to be implemented as an agent to test real-world effectiveness
  • 33. What would I do differently? Better Evaluation Methodology ◦ What was the frequency/distribution of controls? ◦ Was the system better at different games or controls? Went too far with the image conversion...
  • 34. Future Work 1. Data Collection Methods 2. Foundation for Q-Learning
  • 35. Future Work 3. State Representation Step 1 Identify areas of interest Step 2 Process and Classify Area Step 3 Update State Representation
  • 36. Future Work 4. Explore the effects of multiple Convolutional Layers 5. Build a working agent...! ? ?
  • 37. Useful Links ALE (Visual Studio Version) https://github.com/mvacha/A.L.E.-0.4.4.-Visual-Studio Replicating the Paper “Playing Atari with Deep Reinforcement Learning” - Kristjan Korjus et al https://courses.cs.ut.ee/MTAT.03.291/2014_spring/uploads/Main/Replicating%20DeepMind.pdf Github for the above project https://github.com/kristjankorjus/Replicating-DeepMind/tree/master/src ALE : http://www.arcadelearningenvironment.org/ ALE Old Site: http://yavar.naddaf.name/ale/
  • 38. Bibliography
    Barto, A. G. and Rosenstein, M. T. (2004), 'Supervised actor-critic reinforcement learning', Handbook of Learning and Approximate Dynamic Programming 2, 359.
    Bellemare, M. G., Naddaf, Y., Veness, J. and Bowling, M. (2013), 'The arcade learning environment: An evaluation platform for general agents', Journal of Artificial Intelligence Research 47, 253-279.
    Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D. and Bengio, Y. (2010), Theano: a CPU and GPU math expression compiler, in 'Proceedings of the Python for Scientific Computing Conference (SciPy)'. Oral Presentation.
    Celiberto, L., Matsuura, J., Lopez de Mantaras, R. and Bianchi, R. (2010), Using transfer learning to speed-up reinforcement learning: A cased-based approach, in 'Robotics Symposium and Intelligent Robotic Meeting (LARS), 2010 Latin American', pp. 55-60.
    Korjus, K., Kuzovkin, I., Tampuu, A. and Pungas, T. (2014), Replicating the paper "Playing Atari with Deep Reinforcement Learning", Technical report, University of Tartu.
    Kulkarni, P. (2012), Reinforcement and systemic machine learning for decision making, John Wiley & Sons, Hoboken.
    Kunze, L., Haidu, A. and Beetz, M. (2013), Acquiring task models for imitation learning through games with a purpose, in 'Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on', pp. 102-107.
    Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with deep reinforcement learning, in 'NIPS Deep Learning Workshop'.
    Nemec, B., Zorko, M. and Zlajpah, L. (2010), Learning of a ball-in-a-cup playing robot, in 'Robotics in Alpe-Adria-Danube Region (RAAD), 2010 IEEE 19th International Workshop on', pp. 297-301.
    Schmidts, A. M., Lee, D. and Peer, A. (2011), Imitation learning of human grasping skills from motion and force data, in 'Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on', pp. 1002-1007.
    Watkins, C. J. C. H. and Dayan, P. (1992), 'Technical note: Q-learning', Machine Learning 8, 279-292.