Developers, start your engines! This breakout session will provide an introduction to the newly launched AWS DeepRacer. Learn about the basics of reinforcement learning, what’s under the hood, and your opportunities to experience AWS DeepRacer for yourself.
Deep learning algorithms traditionally fall into broad categories based on:
- The amount of training data required
- The sophistication of the deep learning models
Supervised: lots of labeled training data leads to sophisticated models and high prediction accuracy (computer vision, speech)
Unsupervised: algorithms try to find hidden structure in unlabeled data (e.g., anomaly detection of data outliers)
A third, complementary approach has emerged: Reinforcement Learning, or RL.
RL takes a different approach: it enables learning complex behavior without pre-labeled data.
Let’s use RL to play Pac-Man. Luckily, Pac-Man only knows how to go up/down/left/right. That makes it easy for an algorithm to play.
Also luckily, we have a ‘simulation’ environment where we can let the algorithm try and play Pac-Man – the game!
Finally, we assign rewards for the desired behavior – scoring as many points as possible!
So, we allow the algorithm to play, potentially thousands of times, and instruct it to maximize rewards (learning to eat fruit, while avoiding being eaten by a ghost)
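The loop described above can be sketched in a few lines. This is a toy illustration, not actual Pac-Man: the game, the rewards, and all function names below are invented for this sketch. The agent repeatedly plays, and an epsilon-greedy strategy balances exploring random moves against exploiting whichever move has earned the most reward so far.

```python
import random

# Toy game: the agent only knows up/down/left/right, and must discover
# by trial and error which move earns reward. (Invented for this sketch.)
ACTIONS = ["up", "down", "left", "right"]

def play(action):
    """Hypothetical game step: 'right' eats a fruit (+10), others score 0."""
    return 10 if action == "right" else 0

def train(episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    totals = {a: 0.0 for a in ACTIONS}   # cumulative reward per action
    counts = {a: 0 for a in ACTIONS}     # times each action was tried

    def best():
        # Action with the highest average reward observed so far
        return max(ACTIONS, key=lambda a: totals[a] / counts[a] if counts[a] else 0)

    for _ in range(episodes):
        if rng.random() < epsilon:       # explore: try a random move
            action = rng.choice(ACTIONS)
        else:                            # exploit: pick the best move so far
            action = best()
        reward = play(action)
        totals[action] += reward
        counts[action] += 1
    return best()
```

After thousands of plays, `train()` settles on the rewarding action, which is the whole idea: maximize cumulative reward without any pre-labeled examples.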
Reinforcement Learning has been applied successfully in a number of practical use cases. Some recent examples include:
- Autonomous cars
- Fleet logistics
- Financial trading
- Data center cooling
But that made us think about what we did with DeepLens to put computer vision and deep learning into the hands of developers…
Well, developers have told us they love this approach – they’ve deployed tens of thousands of custom deep learning models to DeepLens devices in the last year.
And so we asked ourselves,
Can we help developers get rolling with reinforcement learning… literally?
After much brainstorming, we decided against building a scale model datacenter and using RL to manage the cooling (though TBH that would have been cool)… and we landed on…
AWS DeepRacer, a fully autonomous 1/18th scale race car, driven by reinforcement learning.
As we built and started testing the car, we realized…
…what’s a car without a little competition?
And so, we’re also announcing…
The AWS DeepRacer league.
Developers train their models via the console for the fastest lap time, and can submit lap times to online leaderboards, or compete in-person at AWS Summits.
Winners of each stage progress to the Championship Final at re:Invent 2019, to win the DeepRacer Cup.
The idea around which reinforcement learning is built is used quite often, perhaps daily, by humans. Think about the last time you used a reward to incentivize the right behavior.
Think about the method used to train a dog.
What is reinforcement learning from a software perspective and why would we need it?
Our goal is to create a software agent that can interact with an environment to achieve a goal that we specify.
Reinforcement learning is the method we use to teach the agent which actions to choose from its current state in the environment to achieve its goal. This is different from supervised learning, because in interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all situations in which the agent has to act. In uncharted territory, an agent must be able to learn from its own experience. (Last two sentences adapted from the Sutton book.)
How does RL do this? By assigning positive rewards to actions that lead toward the goal and ignoring (or penalizing) actions that move away from the goal. Reinforcement learning makes use of the reward signal to teach the agent which actions it should take to achieve the goal.
A key challenge – how do we set up our reward function to ensure our agent achieves its goal?
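As a concrete illustration, here is one way such a reward function might look, modeled on the style of reward functions used in the AWS DeepRacer console, where the service passes in a dict of state variables. The key names below follow the documented examples, but the exact keys and ranges should be checked against the console documentation.

```python
def reward_function(params):
    """Reward the car for staying near the track centerline.

    `params` is assumed to carry the track width and the car's current
    distance from the centerline, as in the DeepRacer console examples.
    """
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the centerline: the closer to center,
    # the higher the reward; nearly off-track earns almost nothing.
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3  # likely off track: discourage this state
```

Note the design choice: we never return exactly zero, and we reward intermediate progress rather than only the finish line, so the agent always gets a gradient of feedback to learn from.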
Agent
A piece of software, or model, that acts autonomously in a given environment to reach a specified goal
Environment
The environment with which our agent interacts
State
The current state of the environment that is visible, or known, to our agent and upon which it needs to act
Action
Given the current state, our agent takes an action to try to achieve its goal. The action is chosen either by exploring, or by exploiting what the agent has already learned
Reward
If the chosen action gets the agent closer to the goal, reinforce this action in future through a positive reward. Otherwise, discourage it with a negative reward, or no reward
Episode
Each iteration in which an agent goes from the start position to a termination state (crashes off the track, or finishes the track)
Value Function
The highest cumulative reward that can be achieved from a given state, by choosing the best action in that state and in each subsequent state
Policy Function
A function that tells the agent how to act in each state. Our car knows which action to take at any position on the track
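To tie these terms together, here is a minimal Q-learning sketch on an invented one-dimensional "track" of five cells; every name and hyperparameter below is made up for illustration. The agent, environment, state, action, reward, episode, value estimates, and the policy read off at the end each map onto a line of code.

```python
import random

N_STATES = 5            # state: the agent's cell, 0..4; the goal is cell 4
ACTIONS = [-1, +1]      # action: step left or step right

def step(state, action):
    """Environment: returns (next_state, reward, done) for one action."""
    nxt = max(0, min(N_STATES - 1, state + action))
    if nxt == N_STATES - 1:
        return nxt, 1.0, True        # reward for reaching the goal
    return nxt, 0.0, False

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # value estimates: Q[state][action index]
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):        # episode: start to termination state
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:   # explore
                a = rng.randrange(2)
            else:                        # exploit learned values
                a = 0 if Q[state][0] > Q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # value update: reward now plus discounted best future value
            Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
            state = nxt
    # policy: the best action in each state, read off the learned values
    return [ACTIONS[0] if q[0] > q[1] else ACTIONS[1] for q in Q]
```

After training, the learned policy steps right from every non-terminal cell, because the value update has propagated the goal reward (discounted by gamma at each step) back along the track.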
This week we launched DeepRacer.
AWS DeepRacer is a 1/18th scale robotic car which gives you an exciting and fun way to get started with reinforcement learning (RL) by applying it to autonomous racing. You can pre-order your AWS DeepRacer from Amazon today.
DeepRacer has a virtual racing simulator that allows you to train, evaluate, and iterate on your RL models in a racing environment, quickly and easily.
And if you get really good, and want to showcase your machine learning skills in a competitive environment, there is the DeepRacer League. You can compete in a global championship - racing the car - for a chance to win several prizes and advance to the AWS DeepRacer Grand Final. Throughout 2019 there will be in-person events, to be announced at a later date, and the online simulator will also give developers the opportunity to compete, virtually.
Before we start the engines for the first lab, let’s take a quick look at what to expect in the AWS DeepRacer Console.
In the console you can create a model, then configure it by specifying the reward function and hyperparameters. These are critical in tweaking and tuning your model to get the best performance. You then train your model in the console’s simulator, and afterwards you can evaluate it. If you are happy with your model’s performance, you can submit it to a leaderboard for evaluation to get your name on the board, or you can download the model and deploy it to the DeepRacer car for a real-life experience. If you are not happy with your model’s performance, you can clone the model, reconfigure it, and train again.
In the next lab, we will cover steps 1 to 3.