Let’s Talk about Reinforcement Learning with Amazon SageMaker RL
Sina Afrooze
AIM399
AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Introduction to Reinforcement Learning
SageMaker RL example: Cartpole
Overview of RL-Coach
Introduction to SageMaker RL
Why Reinforcement Learning
Reinforcement learning concepts
• Learn by interacting with the real world
• Model the real-world problem as a simulation environment
• Trial and error
• Observe results
• Optimize learning strategy to maximize long-term reward
• Model learns how to make complex decisions
The prototypical example / key concepts
State:
• Position of the cart (x)
• Velocity of the cart (v)
• Position of the pole (θ)
• Velocity of the pole (ω)
[Figure: cart-pole diagram annotated with the state variables x, v, θ, ω and the action L]
Agent/policy actions:
• Left (L)
• Right (R)
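As a minimal sketch, the state and actions listed above could be represented as follows. The names `CartPoleState` and `Action` are hypothetical, chosen for illustration only; they are not part of SageMaker RL, RL-Coach, or any environment API.

```python
from enum import Enum
from typing import NamedTuple


class Action(Enum):
    """The two discrete actions available to the agent (hypothetical encoding)."""
    LEFT = 0
    RIGHT = 1


class CartPoleState(NamedTuple):
    """What the agent observes at every time step."""
    x: float      # position of the cart
    v: float      # velocity of the cart
    theta: float  # position (angle) of the pole
    omega: float  # velocity (angular) of the pole


# Example: cart at rest in the center, pole tilted slightly.
state = CartPoleState(x=0.0, v=0.0, theta=0.05, omega=0.0)
action = Action.LEFT
```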
The prototypical example / key concepts
[Figure: taking action L in state (x, v, θ, ω) moves the cart-pole to the next state (x', v', θ', ω')]
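The transition from (x, v, θ, ω) to (x', v', θ', ω') can be sketched as one simulation step. This is an illustration under stated assumptions, not the environment used in the talk: the equations follow the classic cart-pole formulation with explicit Euler integration, and every constant below (masses, pole length, force, time step) is an assumed value.

```python
import math

# Assumed physical constants, for illustration only.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE = 10.0
DT = 0.02  # seconds per simulation step


def step(state, action):
    """Advance the cart-pole by one time step.

    state is (x, v, theta, omega); action is "L" or "R".
    Returns the next state (x', v', theta', omega').
    """
    x, v, theta, omega = state
    force = -FORCE if action == "L" else FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Accelerations of the pole angle and the cart position.
    temp = (force + pole_mass_length * omega**2 * sin_t) / total_mass
    omega_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass)
    )
    v_acc = temp - pole_mass_length * omega_acc * cos_t / total_mass

    # Explicit Euler update: positions use the old velocities.
    return (x + DT * v, v + DT * v_acc, theta + DT * omega, omega + DT * omega_acc)


# Pushing left from rest: the cart accelerates left, the pole starts tipping right.
next_state = step((0.0, 0.0, 0.0, 0.0), "L")
```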
How to learn the policy?
• maps states to an action
• maximizes agent’s long term reward
Reward + 0, Reward + 1, Reward + 1, Reward + 1
Long-term reward: 1, 1 + 0.9, 1 + 0.9 + 0.81
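The sums 1, 1 + 0.9, and 1 + 0.9 + 0.81 are consistent with a discounted long-term reward using a discount factor of 0.9 (0.81 = 0.9²). A minimal sketch, assuming the episode yields rewards +1, +1, +1 and then +0 at the final step; the function name `discounted_returns` is hypothetical:

```python
def discounted_returns(rewards, gamma=0.9):
    """Return the discounted long-term reward from each time step onward."""
    returns = []
    running = 0.0
    # Walk backward so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Assumed reward sequence: three surviving steps, then the episode ends.
returns = discounted_returns([1, 1, 1, 0])
# Approximately [2.71, 1.9, 1.0, 0.0], i.e. 1 + 0.9 + 0.81, 1 + 0.9, 1, 0.
```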
The policy gradient algorithm
• Stochastic policy to sample actions
• Initialize with random weights
• 100 episodes
• Calculate gradients per decision made by the agent
• Update weights
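The steps above can be sketched for a two-action policy as follows. This is an illustrative REINFORCE-style update under assumptions, not the implementation shown in the talk: the policy is assumed to be a linear softmax over the four state variables, the helper names are hypothetical, and the environment loop that would collect the 100 episodes is left out.

```python
import math
import random

N_ACTIONS = 2   # 0 = Left, 1 = Right
STATE_SIZE = 4  # x, v, theta, omega


def action_probs(weights, state):
    """Stochastic policy: softmax over one linear score per action."""
    scores = [sum(w * s for w, s in zip(weights[a], state)) for a in range(N_ACTIONS)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(weights, state, rng):
    """Sample an action from the policy's probabilities."""
    return rng.choices(range(N_ACTIONS), weights=action_probs(weights, state))[0]


def policy_gradient_update(weights, decisions, learning_rate=0.01):
    """Accumulate one gradient per decision, then take a gradient-ascent step.

    decisions is a list of (state, action, long_term_reward) tuples.
    """
    grads = [[0.0] * STATE_SIZE for _ in range(N_ACTIONS)]
    for state, action, ret in decisions:
        probs = action_probs(weights, state)
        for a in range(N_ACTIONS):
            indicator = 1.0 if a == action else 0.0
            for i in range(STATE_SIZE):
                # Gradient of log pi(action | state), scaled by the return.
                grads[a][i] += (indicator - probs[a]) * state[i] * ret
    return [
        [w + learning_rate * g for w, g in zip(weights[a], grads[a])]
        for a in range(N_ACTIONS)
    ]


rng = random.Random(0)
# Initialize with random weights.
weights = [[rng.gauss(0.0, 0.1) for _ in range(STATE_SIZE)] for _ in range(N_ACTIONS)]
state = (0.0, 0.0, 0.05, 0.0)
action = sample_action(weights, state, rng)
new_weights = policy_gradient_update(weights, [(state, action, 1.0)])
```

A decision followed by a positive long-term reward becomes slightly more likely after the update, which is the behavior the weight update is meant to produce.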
Key concepts
Typical challenges in running RL workloads
• Difficult to get started
• RL agent algorithms are complex to implement
• Hard to integrate environments for training
• Training is computationally expensive and time-consuming
• Requires trial and error and frequent tuning of hyperparameters
Amazon SageMaker RL makes RL more accessible
Challenges:
• Difficult to get started
• RL agent algorithms are complex to implement
• Hard to integrate environments for training
• Training is computationally expensive and time-consuming
• Requires trial and error and frequent tuning of hyperparameters
Amazon SageMaker RL:
• Single/distributed training; local/remote environment
• Local mode for debugging; Automatic Model Tuning
• Pre-built environments for RL; numerous examples
• Support for RL agent algorithms
• Easy to integrate a variety of simulation environments
Amazon SageMaker RL
End-to-end examples for classic RL and real-world RL applications
• Robotics
• Industrial control
• HVAC
• Autonomous vehicles
• Operations
• Finance
• Games
• NLP
OpenAI Gym
SageMaker supported
Customer BYO
Scale training for production workloads
• Single/distributed
• Homogeneous/heterogeneous clusters
• Local/remote simulation environment
• Logging/metrics/visualization
• High-performance CPU/GPU hardware
Run parallel simulations
RL-Coach offers state-of-the-art RL agent algorithm implementations
RL-Coach in AWS DeepRacer
Thank you!
Sina Afrooze
Twitter: @sinafz