Slides presented at the ML meetup Victoria 2020 about the AWS open-source project "Amazon SageMaker for Battlesnake AI": https://github.com/awslabs/sagemaker-battlesnake-ai
4. Battlesnake reinforcement learning starter pack
[Diagram: agent-environment loop]
1. One-click deploy module: one-click deployment of an existing model
2. Reinforcement learning module: training your own Battlesnake reinforcement learning model
3. Heuristics module: build custom rules on top of an existing model
12. Reinforcement learning module
Training routine for the module (deep Q learning)
for epi in range(episodes):
    state = env.reset()
    while agent.is_alive():
        prob = random.random()
        if prob < eps:    # explore: take a random action
            action = agent.get_random_action()
        else:             # exploit: take the model's best action
            action = agent.get_next_best_action(state)
        next_state, reward = env.step(action)
        memory.append(next_state, state, action, reward)
        state = next_state
    agent.learn(memory)
[Diagram: the agent sends actions to the environment; the environment returns the state and rewards]
13. Reinforcement learning module
Training routine for the module (deep Q learning)
for epi in range(episodes):
    state = env.reset()
    while agents.agents_alive() > 1:
        actions = []
        for agent in agents:
            prob = random.random()
            if prob < eps:    # explore: take a random action
                action = agent.get_random_action()
            else:             # exploit: take the model's best action
                action = agent.get_next_best_action(state)
            actions.append(action)
        next_state, rewards = env.step(actions)
        memory.append(next_state, state, actions, rewards)
        state = next_state
    for agent in agents:
        agent.learn(memory)
[Diagram: multiple agents send actions to a shared environment; the environment returns the state and rewards]
24. Reinforcement learning module
Learning
New predicted total expected reward ≈ neural network(new reward, previous predicted total expected reward)
[Diagram: predicted total expected rewards at t = 0; at t = 1, after receiving a new reward of 1, the network produces new predicted total expected rewards]
29. Reinforcement learning module
Rewards design
• Surviving another turn
• Eating food
• Starving
• Winning the game
• Losing the game
• Hitting a wall/snake/yourself
• Performing a forbidden move
• Eating another snake
• Forcing another snake to hit your body
36. Battlesnake reinforcement learning starter pack
[Diagram: agent-environment loop]
1. One-click deploy module: one-click deployment of an existing model
2. Reinforcement learning module: training your own Battlesnake reinforcement learning model
3. Heuristics module: build custom rules on top of an existing model
37. Custom rules with the heuristic module
Provides a starting point for you to build upon
[Diagram: positions of snakes and food → trained AI snake model → direction of movement (reward?)]
38. Custom rules with the heuristic module
[Same diagram as slide 37]
42. Battlesnake reinforcement learning starter pack
[Diagram: agent-environment loop]
1. One-click deploy module: one-click deployment of an existing model
2. Reinforcement learning module: training your own Battlesnake reinforcement learning model
3. Heuristics module: build custom rules on top of an existing model
43. One-click deployment with SageMaker
• After training your own model
• After writing your custom rules
• Using the existing pretrained snake
Hello everyone, my name is Jonathan Chung and I’m an applied scientist at AWS.
My colleague here is Xavier and he’s a solutions architect at AWS.
Since you are in this talk, I presume everyone knows about Battlesnake, but I’ll give a brief description anyway.
Those of you who are old enough might remember the Snake game on their phones.
The snake moves around.
When you hit the wall, you die.
When you hit yourself, you also die.
When you eat some food, you get longer.
The aim of the game is to stay alive as long as possible.
Battlesnake is an online version of the traditional Snake game where multiple snakes compete and the winner is the snake that survives the longest.
The main differences in the gameplay are that:
If your snake hits another snake’s head, the shorter snake dies, and
Every snake starts with 100 health, and every move you take costs one health. Eating food replenishes your health, and if your health falls to 0, you die.
We built a Battlesnake starter pack that can be used by all types of developers.
Firstly, the one-click deploy module will build a snake for you, deploy it on the cloud, and provide you with a URL. We want to demonstrate how easy it is, so here’s a quick demo of how to get a snake.
Let’s go back to this module. If you are an AI or reinforcement learning enthusiast, you want to learn how to train your own reinforcement learning algorithm, or you are just too lazy to write your own and want a starting point, you can use the reinforcement learning module. This module will let you train a snake and then automatically deploy it after it has trained.
Let’s say you don’t want to train a snake, but you want to make use of an existing model while adding your own flair to it. Then you can use the heuristics module, where you can write custom rules on top of existing models that override the commands of the AI.
I’ll talk a bit about the reinforcement learning module and briefly how reinforcement learning works.
I’m not an expert at this field so please feel free to stop me any time.
In reinforcement learning, you have an agent that interacts with an environment.
Specifically, the agent provides actions, which change the state of the environment, and the environment provides rewards for the actions the agent took.
The aim of the agent is to maximise the rewards.
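In code, this interaction is just a loop. Here is a minimal sketch using a Gym-style interface, where env and agent are placeholders for any concrete implementation:

# Minimal agent-environment interaction loop (Gym-style interface).
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = agent.act(state)                      # the agent chooses an action
    state, reward, done, info = env.step(action)   # the environment reacts
    total_reward += reward                         # this is what the agent maximises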
Let me give you some examples.
In a self driving car.
The car is the agent.
The agent can, for example, accelerate, decelerate, turn left, turn right, etc., and these could be the possible actions.
The environment is the road and the agent views this in the form of images of the road.
The reward is the number of kilometres the car has driven.
For example, suppose the car sees that the road is curving to the right.
Then, to maximise the number of kilometres driven, the agent (which is the car) will turn right.
Now let’s say the environment shows a stop sign. You might think the agent should just ignore it and keep going, because the reward is to maximise the number of kilometres driven. But reinforcement learning is trained to maximise future rewards, and stopping will maximise the future rewards rather than the immediate reward.
There are many more examples; take AlphaGo, for example.
The agent decides the position of the piece in the next move.
The environment is the position of all the pieces on the board.
The reward is simple: win or lose.
Can we guess what the Battlesnake one will be?
Ok so what actions can the snake take?
What will the state be?
How about the reward?
Actually, I didn’t write it down because there are many different choices. I’ll get into that later.
But Battlesnake is not just one snake.
This is called a multi-agent reinforcement learning problem, where there are multiple agents each providing their own actions.
Examples of multi-agent reinforcement learning problems include StarCraft, or AlphaStar, which controls each one of its units separately.
Essentially, the configuration of the problem is the same.
Each agent provides actions which interact with the environment, and each agent receives rewards.
So let me explain the pseudocode of how reinforcement learning works in the single-agent case.
In this module, we developed an implementation of deep Q-learning.
It works like a simulation: you make the agent take actions given a state, then you record the action and what happened.
Specifically, you define the total number of games you want to simulate in episodes.
Then you get the initial state and play the game until the agent (which is the snake) dies.
At each time step, the agent will take an action.
At the start, the agent will simply choose a random action (I'll explain this condition later).
Then you apply the action to the environment, and you get the resulting state and reward.
So you keep repeating these steps while storing the next_state, state, action, and reward into memory.
When you have enough simulated results, you set the agent to learn from the memory; I’ll explain how later.
Once the agent starts to learn how to play, you give it more chances to provide actions with the else statement here.
These actions will be what the agent thinks are the best actions to take, given the state.
So as your agent gets better, you get more simulated results.
For example, in Battlesnake, this way you can get simulation results for when the snakes are larger, or when there is a scarcity of food, etc.
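One common way to implement this shift from random exploration to model-driven actions is to decay eps over the course of training. A sketch (the schedule and its constants are illustrative assumptions, not necessarily what the starter pack uses):

# Linearly decay eps from mostly-random to mostly-greedy.
eps_start, eps_end, decay_episodes = 1.0, 0.05, 10_000

def epsilon(episode):
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Inside the training loop above: eps = epsilon(epi)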
Next, I’ll explain how the training routine accommodates multiple agents.
Firstly, the training loop checks whether there are at least two snakes alive; if only one snake is left, that snake has won.
Next, instead of performing one action, each agent will perform its own action.
The remaining steps are similar to the single-agent case.
Let me first explain the environment.
We modelled the rules of the Battlesnake engine, which include how the snakes move, eat, grow, and die, based on OpenAI Gym. And this is a representation of the snakes.
The environment takes in the actions, then moves the snakes accordingly.
Afterwards, the rewards and the states are emitted.
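A minimal sketch of what such a Gym environment looks like (the class and the attribute values here are placeholders for illustration, not the starter pack’s actual implementation):

import gym
import numpy as np

class BattlesnakeEnv(gym.Env):
    # Skeleton of an environment implementing the Battlesnake rules.
    def __init__(self, num_agents=3, board_size=7):
        self.num_agents = num_agents
        self.action_space = gym.spaces.Discrete(4)   # up, down, left, right
        # One 3-channel image per agent: food / own snake / other snakes;
        # +2 on each side for the border (see the state representation below).
        self.observation_space = gym.spaces.Box(
            low=-1.0, high=5.0,
            shape=(3, board_size + 2, board_size + 2), dtype=np.float32)

    def reset(self):
        # Place the snakes and food, return the initial state.
        ...

    def step(self, actions):
        # Apply one action per snake, resolve moves, food, and collisions,
        # then return (next_state, rewards, done, info).
        ...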
Next, I’ll explain the state.
Let’s say we have a very small board, and here is the food. We also have three snakes: one long orange one and two short ones.
So we represented the state with an image.
Each agent represents one snake.
So if we have three snakes, then we will need three agents.
The agents are fed in specific images. For example, the agent representing the orange snake will be fed an image like this.
Similar to an RGB image with three channels: in the first channel, we provide the information about the food.
The second channel provides information about the orange snake, which is the snake that the agent is representing.
The third channel has information about all the other snakes.
We also added a border of -1s to indicate the wall. We found that it’s easier for the algorithm to avoid the walls this way.
Also, we put a 5 to represent the head and 1s to represent the body, because the head is more important.
Similarly for the green snake.
And for the blue snake.
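As a sketch, building that per-agent image could look like this in NumPy (the helper below and its coordinate conventions are assumptions for illustration, not the starter pack’s actual code):

import numpy as np

HEAD, BODY = 5.0, 1.0  # the head gets a larger value because it matters more

def make_state(board_size, food, own_snake, other_snakes):
    # food: list of (x, y); own_snake and each other snake: list of (x, y),
    # head first. Coordinates are 0-indexed board positions (assumed layout).
    # +2 in each dimension for the border of -1s marking the walls.
    state = np.zeros((3, board_size + 2, board_size + 2), dtype=np.float32)
    state[:, 0, :] = state[:, -1, :] = -1.0
    state[:, :, 0] = state[:, :, -1] = -1.0

    for x, y in food:                        # channel 0: food
        state[0, y + 1, x + 1] = 1.0
    for i, (x, y) in enumerate(own_snake):   # channel 1: this agent's snake
        state[1, y + 1, x + 1] = HEAD if i == 0 else BODY
    for snake in other_snakes:               # channel 2: all the other snakes
        for i, (x, y) in enumerate(snake):
            state[2, y + 1, x + 1] = HEAD if i == 0 else BODY
    return state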
So we used a very simple reward function: every time the snake survives another turn, it is given another reward.
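A sketch of how that survival bonus could be combined with the other events from the rewards design slide (the numeric values and event names are illustrative assumptions, not the starter pack’s actual values):

# Hypothetical reward table for the events listed on the rewards design
# slide; the actual values are a design choice and were not specified.
REWARDS = {
    "survived_turn":       1.0,   # surviving another turn
    "ate_food":            2.0,
    "starved":           -10.0,
    "won_game":           20.0,
    "lost_game":         -20.0,
    "hit_obstacle":      -20.0,   # wall, another snake, or yourself
    "forbidden_move":     -5.0,
    "ate_another_snake":   5.0,   # won a head-to-head collision
    "forced_collision":    5.0,   # another snake ran into your body
}

def compute_reward(events):
    # Sum the rewards for every event that occurred this turn.
    return sum(REWARDS[e] for e in events)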
Let’s talk about the agent now. The agent takes the state and figures out which direction to move in.
Note that the reward here is only used during the learning process.
So how does it figure out which direction to take?
We use a neural network to learn this behaviour.
The input is an image and the output is of size 4, representing up, down, left, and right.
Given the image representation of the environment, the neural network in the methodology presented is trained to predict the total expected reward of each move.
Basically, you want to take the action that gives you the highest total expected reward.
For example, given this snake, the expected reward of moving up is 0, because it’ll die immediately. Therefore, the neural network should learn not to choose this action.
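A minimal sketch of such a network in MXNet Gluon (the talk mentions MXNet later; the layer sizes here are illustrative assumptions, not the starter pack’s exact architecture):

import mxnet as mx
from mxnet.gluon import nn

# Small convolutional Q-network: a 3-channel board image goes in,
# 4 Q-values come out (up, down, left, right).
net = nn.Sequential()
net.add(nn.Conv2D(channels=32, kernel_size=3, activation='relu'),
        nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
        nn.Flatten(),
        nn.Dense(128, activation='relu'),
        nn.Dense(4))   # predicted total expected reward per action
net.initialize(mx.init.Xavier())

# e.g. a 9x9 input for a 7x7 board plus the -1 border:
q_values = net(mx.nd.zeros((1, 3, 9, 9)))   # shape (1, 4)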
So how does the neural network learn?
Suppose you run the neural network once, like just now, and you get the predicted total expected rewards here.
Suppose that you went right and you got an actual reward.
Then you run the next step with the new state, and you get another set of predicted total expected rewards.
So what is the relationship between the predicted total expected rewards at t = 0 and the ones at t = 1?
Let me try to explain it this way.
At t = 0, the total expected rewards cover everything into the future.
At t = 1, it’s a bit different, because you took one action and you actually know what the reward was.
You know that by going right, you got a reward of 1.
Formally, the total expected reward is typically denoted as Q.
This is the component I described just now: the difference between the Q predicted now and the Q predicted at the next step is related to the reward you actually received.
The gamma term here is called the discount factor. This is a number between 0 and 1 which accounts for a kind of opportunity cost, where rewards further in the future are worth less.
A common strategy for neural networks to learn is to incrementally update the weights of the network.
The alpha term here is the learning rate, which determines how much to incrementally alter the weights of the network.
Since you take the actions with the maximum expected rewards, the network will slowly learn to take the actions that maximise the reward.
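Putting those pieces together, the update can be written out concretely. Here is a tabular Q-learning sketch (in deep Q-learning the same target value becomes the regression label for the neural network; the numbers are made up for illustration):

# Standard Q-learning update (tabular sketch):
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
alpha, gamma = 0.1, 0.9           # learning rate and discount factor

q_t0 = {"up": 0.0, "down": 0.3, "left": 0.2, "right": 0.5}   # predictions at t = 0
q_t1 = {"up": 0.4, "down": 0.1, "left": 0.0, "right": 0.2}   # predictions at t = 1
reward = 1.0                      # the reward received for going right

target = reward + gamma * max(q_t1.values())       # 1 + 0.9 * 0.4 = 1.36
q_t0["right"] += alpha * (target - q_t0["right"])  # 0.5 + 0.1 * 0.86 = 0.586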
So that’s the basics of reinforcement learning.
Feel free to ask me questions about it, or I can guide you to some more material.
What I did was a very simple application of a deep Q-network. There are many opportunities to improve the model I presented.
Firstly, the method of representing the state could change.
As you can imagine, this method requires a fixed-size map.
One possible method is to create a snake-centric representation, which means your snake is only provided with information about what its head is close to.
Other possible representations include encoding the snakes and food only as coordinates.
The neural network we used was also very simple.
An attention-like method was used to incorporate the snake health, ID, and turn count into a convolutional neural network to predict the actions.
But I believe the most progress could be made with the rewards design.
The gym provides functionality to try to maximise or minimise these rewards, but we really didn’t investigate it too much.
Let me go into a bit of technical detail about the module. We built it with Apache MXNet, and the solution was built with Amazon SageMaker.
SageMaker provides methods to build, train and tune, and deploy the models.
We know that not everyone has a 12 GB GPU at home to train an AI bot.
So SageMaker allows you to train your own snake directly in the cloud.
This way you can get your own snake model.
As you can imagine, there are many different parameters that you need to decide on.
For example, how the network is designed, such as how deep it is.
The best values for the learning parameters, such as the discount factor and the learning rate, could be investigated.
You could also test different methods of representing the state and the rewards.
For example, say you want to investigate whether including a border in the state representation helps.
In one run you include the border, and in the second run you remove it.
Then you compare the snakes to see which one works better.
For example, using the optimization module, we found that the -1 borders actually make the agents significantly better
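With the SageMaker Python SDK, that kind of comparison can be scripted as a hyperparameter tuning job. A sketch under assumed names (the estimator, hyperparameter names, and metric regex are illustrative, not the starter pack’s actual configuration):

from sagemaker.tuner import (HyperparameterTuner, ContinuousParameter,
                             CategoricalParameter)

# `estimator` is assumed to be a SageMaker estimator wrapping the training
# script; the hyperparameter and metric names below are made up.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="episode_reward_mean",
    metric_definitions=[{"Name": "episode_reward_mean",
                         "Regex": "episode_reward_mean: ([-0-9.]+)"}],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-2),
        "gamma": ContinuousParameter(0.8, 0.999),
        "use_border": CategoricalParameter(["true", "false"]),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit()   # launches the training jobs so you can compare the snakes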
Once your model has been trained, you can deploy the model to the cloud
My colleague Xavier will give you more details about this later
So, for developers who don’t necessarily want to train a new snake but want to make use of this environment and the deployment methods, you can use the heuristics module.
Also, we know that you don’t have days and days to train an AI for every single situation out there, so you can build custom rules to override the commands of the AI.
Let’s say you are the pink snake in this situation.
The AI tells you to go left, which is fine in this situation,
but you know that if the blue snake just continues up, you are dead meat.
So you can override the AI and go right instead; that way, you won’t die.
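In code, such a rule is just a function that intercepts the move suggested by the model. A minimal sketch with hypothetical helper names (you would write these checks yourself):

def choose_move(model_action, state):
    # Return the model's move unless a custom rule vetoes it.
    if loses_head_to_head(model_action, state):   # hypothetical check, e.g. the blue-snake trap
        return safest_alternative(state)          # hypothetical fallback, e.g. go right instead
    return model_action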
Furthermore, we believe that we can streamline the development process this way.
The usual suggested method is to set up the game engine on your own computer, write your code, then upload it to the cloud.
So we decided to make use of the gym we built, which simulates the Battlesnake engine, for you to develop your rules.
After you are satisfied with your rules, the code will be automatically packaged and uploaded into the cloud.
The heuristics module also provides a situation simulation component.
For example, say you want your snake to be in this exact configuration and to see what it’ll do.
You can define this in the gym and then try it out.
Finally, Xavier will describe the deployment process
In fact, our solution supports one-click deployment after you train your own model, after you write your custom rules, or even if you just want to use our pretrained snake.
The purpose of this is so that you can focus on developing your snake rather than worrying about how to deploy it as a web server.