Dissertation defense

Expressive Intelligence Studio
Integrating Learning in a
Multi-Scale Agent
Ben Weber
Dissertation Defense
May 18, 2012

Introduction
 AI has a long history of using games to
advance the state of the field
[Shannon 1950]

Real-Time Strategy Games
 Building human-level AI for RTS games
remains an open research challenge
StarCraft II, Blizzard Entertainment

Task Environment Properties
Property | Chess | StarCraft | Taxi Driving
Fully vs. partially observable | Fully | Partially | Partially
Deterministic vs. stochastic | Deterministic | Deterministic* | Stochastic
Episodic vs. sequential | Sequential | Sequential | Sequential
Static vs. dynamic | Static | Dynamic | Dynamic
Discrete vs. continuous | Discrete | Continuous | Continuous
Single vs. multiagent | Multi | Multi | Multi
[Russell & Norvig 2009]

Motivation
 RTS games present complex environments
and complex tasks
 Professional players demonstrate a broad
range of reasoning capabilities
 Human behavior can be observed, emulated,
and evaluated
[Langley 2011, Mateas 2002]

Hypothesis
 Reproducing expert-level StarCraft
gameplay involves integrating
heterogeneous reasoning capabilities

Research Questions
 What competencies are necessary for
expert StarCraft gameplay?
 Which competencies can be learned
from demonstrations?
 How can these competencies be
integrated in a real-time agent?

Overview
 StarCraft
 Multi-Scale AI
 Learning from Demonstration
 Integrating Learning
 Evaluation

StarCraft
 Expert gameplay
 300+ APM
 Evolving meta-game
 Exhibited capabilities
 Estimation
 Anticipation
 Adaptation
[Flash, Pro-gamer]

StarCraft Gameplay
Expand Tech Tree
Manage Economy
Produce Units
Attack Opponent

Gameplay Scales in StarCraft
 Individual
 Squad
 Global
Support siege line
Worker harassment
Aggressive mine placement

State Space
 The number of possible states, considering only unit type and location:
$(\text{Type} \times X \times Y)^{\text{Units}}$
 States on a 256x256 tile map:
$(100 \times 256 \times 256)^{1700} > 10^{11{,}500}$
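
The bound above can be checked by working in logarithms, since the number itself is far too large to print. This is a minimal sketch using the values from the slide (100 unit types, a 256x256 map, 1700 units); the function name is illustrative only.

```python
import math


def log10_state_count(unit_types: int, width: int, height: int, units: int) -> float:
    """Return log10 of (unit_types * width * height) ** units."""
    per_unit_states = unit_types * width * height
    return units * math.log10(per_unit_states)


# Values taken from the slide.
exponent = log10_state_count(unit_types=100, width=256, height=256, units=1700)
print(f"state count is roughly 10^{exponent:.0f}")
```

The exponent comes out a little above 11,500, which agrees with the bound stated on the slide.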

Decision Complexity
 The set of possible actions that can be executed at a
particular moment:
$O\left(2^{W}(A \cdot P) + 2^{T}(D + S) + B(R + C)\right)$
 W – number of workers
 A – number of worker assignment types
 P – average number of workspaces
 T – number of troops
 D – number of movement directions
[Aha et al. 2005]

Decision Complexity
 The set of possible actions that can be executed at a
particular moment:
$O\left(W \cdot A \cdot P + T \cdot D \cdot S + B(R + C)\right)$
 Assumption
 Unit actions can be selected independently
 Resulting complexity:
 Assuming 50 worker units on a 256x256 tile map
results in more than 1,000,000 possible actions
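
One way to read the figure above is sketched here. The assumption, not stated on the slide, is that each of the 50 workers has a single assignment type (move) and that every tile counts as a possible workplace. The symbols S, B, R, and C are not defined on the slide, so they are treated as opaque counts and set to zero.

```python
def independent_action_count(w, a, p, t, d, s, b, r, c):
    """Evaluate W*A*P + T*D*S + B*(R + C), the independent-selection bound."""
    return w * a * p + t * d * s + b * (r + c)


# Hypothetical reading: 50 workers, 1 assignment type, 256*256 workplaces,
# with troop and building terms ignored.
count = independent_action_count(w=50, a=1, p=256 * 256, t=0, d=0, s=0, b=0, r=0, c=0)
print(count)
```

Under these assumptions the worker term alone exceeds one million actions.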

StarCraft
 Complex gameplay
 Real-world properties
 Highly competitive
 Sources of expert gameplay

Research Question #1
 What competencies are necessary for
expert StarCraft gameplay?

Multi-Scale AI
 Multiple scales
 Actions are performed across multiple
levels of coordination
 Interrelated tasks
 Performance in each task impacts other tasks
 Real-time
 Actions are performed in real time

Reactive Planning
 Provides useful mechanisms for building
multi-scale agents
 Advantages
 Efficient behavior selection
 Interleaved plan expansion and execution
 Disadvantages
 Lacks deliberative capabilities
[Loyall 1997, Mateas 2002]

Agent Design
 Implemented in the ABL reactive planning
language
 Architecture
 Extension of McCoy & Mateas integrated agent
framework
 Partitions gameplay into distinct competencies
 Uses a blackboard for coordination
[McCoy & Mateas 2008]

EISBot Managers
 Managers: Strategy Manager, Income Manager, Production Manager, Tactics Manager, Recon Manager
 Tasks: Gather Resources, Construct Buildings, Attack Opponent, Scout Opponent

Multi-Scale Idioms
 Design patterns for authoring multi-scale AI
 Idioms
 Message passing
 Daemon behaviors
 Managers
 Unit subtasks
 Behavior locking
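
The message passing and daemon behavior idioms can be illustrated outside ABL. The following is a Python analogy, not ABL code and not EISBot's implementation: the class names, the "TimingAttack" message, and the minute-5 rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WME:
    """Working memory element: a typed fact posted for other managers."""
    kind: str


@dataclass
class WorkingMemory:
    elements: list = field(default_factory=list)

    def post(self, wme: WME) -> None:
        self.elements.append(wme)

    def take(self, kind: str) -> list:
        """Remove and return every element of the given kind."""
        matched = [w for w in self.elements if w.kind == kind]
        self.elements = [w for w in self.elements if w.kind != kind]
        return matched


def strategy_manager(memory: WorkingMemory, minute: int) -> None:
    # Hypothetical rule: request a timing attack at minute 5.
    if minute == 5:
        memory.post(WME("TimingAttack"))


def tactics_daemon(memory: WorkingMemory, orders: list) -> None:
    # Daemon behavior: runs every tick and reacts only when a message exists.
    for _ in memory.take("TimingAttack"):
        orders.append("attack enemy")


memory, orders = WorkingMemory(), []
for minute in range(4, 7):
    strategy_manager(memory, minute)
    tactics_daemon(memory, orders)
print(orders)
```

The two managers never call each other directly; they coordinate only through working memory, which is the point of the idiom.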

Idioms in EISBot
[Figure: EISBot behavior tree rooted at Initial_tree, spanning the Tactics Manager, Strategy Manager, and Income Manager. Behaviors: Form Squad, Squad Monitor, Squad Attack, Squad Retreat, Attack Enemy, Dragoon Dance, Pump Probes. Messages: Timing Attack WME, Probe Stop WME. Legend: subgoal, daemon behavior, message passing.]

Multi-Scale AI
 StarCraft gameplay is multi-scale
 Reactive planning provides mechanisms for
multi-scale reasoning
 Idioms are applied in EISBot to support
StarCraft gameplay

Research Question #2
 Which competencies can be learned
from demonstrations?

Learning from Demonstration
 Objective
 Emulate capabilities exhibited by expert players
by harnessing gameplay demonstrations
 Methods
 Classification and regression model training
 Case-based goal formulation
 Parameter selection for model optimization

Strategy Prediction
 Tasks
 Identify opponent build orders
 Predict when buildings will be constructed
[Figure: Spawning Pool Timing; x-axis: Game Time (minutes)]
[Hsieh & Sun 2008]

Approach
 Feature encoding
 Each player’s actions are encoded in a single vector
 Vectors are labeled using a build-order rule set
 Features describe the game cycle when a unit or
building type is first produced by a player
$f(x) = \begin{cases} t, & \text{time when } x \text{ is first produced by } P \\ 0, & x \text{ was not (yet) produced by } P \end{cases}$
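
The encoding above can be sketched as follows. The event format (a list of game-cycle and type-name pairs) and the example names are assumptions for illustration; the slide does not specify how replays are parsed.

```python
def encode_player(events, feature_names):
    """Build one feature vector for a player.

    events: iterable of (game_cycle, unit_or_building_type) pairs.
    Each feature holds the cycle when that type was first produced,
    or 0 if the player has not (yet) produced it.
    """
    first_produced = {}
    for cycle, name in events:
        if name not in first_produced or cycle < first_produced[name]:
            first_produced[name] = cycle
    return [first_produced.get(name, 0) for name in feature_names]


# Hypothetical replay excerpt for one player.
events = [(1200, "Spawning Pool"), (300, "Drone"), (900, "Drone")]
features = ["Drone", "Spawning Pool", "Hatchery"]
vector = encode_player(events, features)
print(vector)
```

Only the first production time is kept per type, so repeated production of the same unit does not change the vector.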

Strategy Prediction Results
[Figure: Recall / Precision (0 to 1) vs. Game Time (minutes, 0 to 12); series: NNge, Boosting, Rule Set, State Lattice]

Strategy Learning
 Task
 Learn build-orders from demonstration
 Trace Algorithm
 Converts replays to a trace representation
 Formulates goals based on most similar situation
$q = \arg\min_{c \in L} \text{distance}(s, c)$
$g = s + (q' - q)$
[Ontañón et al. 2010]

Trace Retrieval: Example
 Consider a planning window of size 2
S =< 3, 0, 1, 1 >
T1 =< 2, 0, 0.5, 1 >
T2 =< 3, 0, 0.7, 1 >
T3 =< 4, 1, 0.9, 1 >
T4 =< 4, 1, 1.1, 2 >

Trace Retrieval: Step 1
 The system retrieves the most similar case: q = T2

Trace Retrieval: Step 2
 q' is retrieved two steps after q (the planning window size): q' = T4

Trace Retrieval: Step 3
 The difference is computed: T4 - T2 = <1, 1, 0.4, 1>

Trace Retrieval: Step 4
 g is computed: g = S + (T4 - T2) = <4, 1, 1.4, 2>
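
The retrieval steps can be reproduced with the vectors from the example. This is a minimal sketch that assumes Euclidean distance, which the slides do not specify, and that only cases with a successor inside the planning window are candidates.

```python
import math


def formulate_goal(state, trace, window):
    """Return g = state + (q' - q), where q is the nearest case in the trace
    and q' is the case `window` steps after q."""
    candidates = range(len(trace) - window)  # q must have a successor q'
    qi = min(candidates, key=lambda i: math.dist(state, trace[i]))
    q, q_next = trace[qi], trace[qi + window]
    return [s + (b - a) for s, a, b in zip(state, q, q_next)]


S = [3, 0, 1, 1]
trace = [
    [2, 0, 0.5, 1],  # T1
    [3, 0, 0.7, 1],  # T2
    [4, 1, 0.9, 1],  # T3
    [4, 1, 1.1, 2],  # T4
]
g = formulate_goal(S, trace, window=2)
print([round(v, 2) for v in g])
```

With these inputs the nearest case is T2, its successor is T4, and the goal matches the <4, 1, 1.4, 2> computed in the example.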

Strategy Learning Results
[Figure: Opponent modeling with a window size of 20. y-axis: Prediction Error (RMSE); x-axis: Actions performed by player (0 to 100); series: Null, IB1, Trace, MultiTrace]

State Estimation
 Task
 Estimate enemy positions
given prior observations
 Particle Model
 Apply movement model
 Remove visible particles
 Reweight particles
[Thrun 2002, Bererton 2004]
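
The three update steps listed above can be sketched as a single function. Several details are assumptions, not taken from the slide: particles move along a fixed trajectory, a particle on a visible tile is dropped, and reweighting is a multiplicative decay with a hypothetical rate of 0.9.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    dx: float
    dy: float
    weight: float = 1.0


def update(particles, is_visible, decay=0.9):
    """One update: move each particle, drop visible ones, decay the rest."""
    survivors = []
    for p in particles:
        # 1. Apply movement model (assumed: constant velocity).
        moved = Particle(p.x + p.dx, p.y + p.dy, p.dx, p.dy, p.weight)
        # 2. Remove visible particles (assumed: the unit would be observed
        #    directly if it were really on a visible tile).
        if is_visible(moved.x, moved.y):
            continue
        # 3. Reweight particles (assumed: confidence decays each update).
        moved.weight *= decay
        survivors.append(moved)
    return survivors


particles = [Particle(0, 0, 1, 0), Particle(4, 0, 1, 0)]
# Hypothetical vision: every tile with x >= 5 is currently visible.
remaining = update(particles, is_visible=lambda x, y: x >= 5)
print(remaining)
```

In this run the second particle moves onto a visible tile and is removed, while the first survives with a reduced weight.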

Parameter Selection
 Free parameters
 Trajectory weights
 Decay rates
 State estimation is represented as an
optimization problem
 Input: parameter weights
 Output: particle model error
 Replays are used to implement a particle model
error function
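
Framing parameter selection as optimization can be illustrated with a toy error function. This sketch substitutes a plain grid search for whatever optimizer the dissertation uses, which the slide does not name. The sample format and the decay-only model are hypothetical stand-ins for a replay-derived error function.

```python
import math


def particle_model_error(decay, samples):
    """RMSE between predicted weight decay**steps and the observed value.

    samples: (steps_since_last_seen, observed_value) pairs, standing in
    for measurements extracted from replays.
    """
    squared = [(decay ** steps - actual) ** 2 for steps, actual in samples]
    return math.sqrt(sum(squared) / len(squared))


def select_decay(candidates, samples):
    """Grid search: return the candidate with the lowest model error."""
    return min(candidates, key=lambda d: particle_model_error(d, samples))


# Hypothetical samples, generated from a true decay rate of 0.8.
samples = [(1, 0.8), (2, 0.64), (3, 0.512)]
best = select_decay([0.5, 0.6, 0.7, 0.8, 0.9], samples)
print(best)
```

The input is a parameter setting and the output is an error value, mirroring the input/output framing above.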

State Estimation Results
[Figure: Threat Prediction Error vs. Game Time (minutes, 0 to 18); series: Null Model, Perfect Tracker, Default Model, Optimized Model]

Learning from Demonstration
 Anticipation
 Classification and regression models
 Adaptation
 Case-based goal formulation
 Estimation
 Model optimization

Research Question #3
 How can these competencies be
integrated in a real-time agent?

Agent Architecture

Integration Approaches
 Augmenting working memory
 External plan generation
 External goal formulation
[Figure: Working Memory and External Components]

Augmenting Working Memory
 Supplementing working memory with
additional beliefs

External Plan Generation
 Generating plans outside the scope of ABL

External Goal Formulation
 Formulating goals outside the scope of ABL

Goal-Driven Autonomy
 A framework for building self-introspective
agents
 GDA agents monitor plan execution, detect
discrepancies, and explain failures
 Implementations
 Hand-authored rules
 Case-based reasoning
[Molineaux et al. 2010, Muñoz-Avila et al. 2010]

GDA Subtasks
 Expectation generation
 Discrepancy detection
 Explanation generation
 Goal formulation
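
The four subtasks can be chained into one cycle, sketched below with hand-authored rules. The state keys, explanations, and goals are hypothetical examples and do not come from EISBot.

```python
def generate_expectations(plan_step):
    # Hypothetical: each plan step declares the state it should bring about.
    return plan_step["expects"]


def detect_discrepancies(expected, observed):
    """Return the observed values that differ from what was expected."""
    return {k: observed.get(k) for k, v in expected.items() if observed.get(k) != v}


def explain(discrepancies):
    # Hypothetical hand-authored rule.
    if "enemy_air_units" in discrepancies:
        return "opponent is teching to air"
    return "unknown"


def formulate_goal(explanation):
    rules = {"opponent is teching to air": "build anti-air"}
    return rules.get(explanation, "continue current plan")


def gda_cycle(plan_step, observed):
    """Run one GDA cycle; return a new goal, or None if nothing is off."""
    expected = generate_expectations(plan_step)
    discrepancies = detect_discrepancies(expected, observed)
    if not discrepancies:
        return None
    return formulate_goal(explain(discrepancies))


step = {"expects": {"enemy_air_units": 0}}
print(gda_cycle(step, {"enemy_air_units": 3}))
print(gda_cycle(step, {"enemy_air_units": 0}))
```

A new goal is produced only when an expectation is violated, so the agent continues its current plan while observations match.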

Implementation

Integrating Learning
 ABL agents can be interfaced with external
learning components
 Applying the GDA model enabled tighter
coordination across capabilities
 EISBot incorporates ABL behaviors, a particle
model, and a GDA implementation

Evaluation
 Claim
 Reproducing expert-level StarCraft
gameplay involves integrating
heterogeneous reasoning capabilities
 Experiments
 Ablation studies
 User study

GDA Ablation Study
 Agent configurations
 Base
 Formulator
 Predictor
 GDA
 Free parameters
 Planning window size
 Look-ahead window size
 Discrepancy period
[Figure: GDA pipeline. Discrepancy Detector -> Discrepancies -> Explanation Generator -> Explanations -> Goal Formulator -> Goals -> Goal Manager]

GDA Results
 Overall results from the GDA experiments
Agent | Win Ratio
Base | 0.73
Formulator | 0.77
Predictor | 0.81
GDA | 0.92

User Study
 Experiment setup
 Matches hosted on ICCup
 3 trials
 Testing script
1. Launch StarCraft
2. Connect to server
3. Host match
4. Announce experiment
[Dennis Fong, Pro-gamer]

Performance on Tau Cross
[Figure: ICCup Score (0 to 2000) vs. Number of Games Played (0 to 50); series: Base, Formulator, Predictor, GDA]

ICCup Results
Agent | Longinus | Python | Tau Cross | Overall
Base | 942 | 599 | 669 | 737
Formulator | 980 | 718 | 1078 | 925
Predictor | 1111 | 555 | 1145 | 937
GDA | 952 | 860 | 1293 | 1035
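
The Overall column appears to be the mean of the three map scores, rounded to the nearest point. That reading is an inference from the numbers, not something the slide states. A quick check:

```python
# Per-map ICCup scores from the table: (Longinus, Python, Tau Cross).
scores = {
    "Base": (942, 599, 669),
    "Formulator": (980, 718, 1078),
    "Predictor": (1111, 555, 1145),
    "GDA": (952, 860, 1293),
}

overall = {agent: round(sum(maps) / len(maps)) for agent, maps in scores.items()}
print(overall)
```

The computed means match the reported Overall values for all four agents.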

EISBot Ranking
 Rankings achieved by the complete GDA agent
Trial | Percentile Ranking
Longinus | 32nd
Python | 8th
Tau Cross | 66th
Average | 48th

Evaluation
 Ablation Studies
 Optimized particle model
 Complete GDA model
 Integrating additional capabilities into EISBot
improved performance
 EISBot performed at the level of a competitive
amateur StarCraft player

Conclusion
 Objective
 Identify and realize capabilities necessary for
expert-level StarCraft gameplay in an agent
 Approach
 Decompose gameplay
 Learn capabilities from demonstrations
 Integrate learned gameplay models
 Evaluate versus humans and agents

Contributions
 Idioms for authoring multi-scale agents
 Methods for learning from demonstration
 Integration approaches for ABL agents

Integrating Learning in a Multi-Scale Agent
 Ben G. Weber
 Ph.D. Candidate
 Expressive Intelligence Studio
 UC Santa Cruz
 bweber@soe.ucsc.edu
 Funding
 NSF Grant IIS-1018954

References
 Aha, Molineaux, & Ponsen. 2005. “Learning to Win: Case-Based Plan
Selection in a Real-Time Strategy Game”, Proceedings of ICCBR.
 Bererton. 2004. “State Estimation for Game AI using Particle Filters”,
Proceedings of AAAI Workshop on Challenges in Game AI.
 Hsieh & Sun. 2008. “Building a Player Strategy Model by Analyzing Replays
of Real-Time Strategy Games”, Proceedings of IJCNN.
 Langley. 2011. “Artificial Intelligence and Cognitive Systems”, AISB
Quarterly.
 Loyall. 1997. “Believable Agents: Building Interactive Personalities”, Ph.D.
thesis, CMU.
 Mateas. 2002. “Interactive Drama, Art and Artificial Intelligence”,
Ph.D. thesis, CMU.

References
 McCoy & Mateas. 2008. “An Integrated Agent for Playing Real-Time
Strategy Games”, Proceedings of AAAI.
 Molineaux, Klenk, Aha. 2010. “Goal-Driven Autonomy in a Navy Strategy
Simulation”, Proceedings of AAAI.
 Muñoz-Avila, Aha, Jaidee, Klenk, Molineaux. 2010. “Applying Goal Driven
Autonomy to a Team Shooter Game”, Proceedings of FLAIRS.
 Ontañón, Mishra, Sugandh, Ram. 2010. “On-line Case-Based Planning”,
Computational Intelligence.
 Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.
 Shannon. 1950. “Programming a Computer for Playing Chess”,
Philosophical Magazine.
 Thrun. 2002. “Particle Filters in Robotics”, Proceedings of UAI.
