A Retrospective Look at
Classifier System Research

Lashon B. Booker
The MITRE Corporation

© 2006 The MITRE Corporation. All rights reserved.

Early Motivations for Learning Classifier System (LCS) Research

• Design symbolic problem solvers that avoid brittleness in
realistic (uncertain and continually varying) domains
involving
– On-line, adaptive control of behaviors
Representations and procedures must adjust without unnecessarily
disrupting existing capabilities
– Discovering relevant categories in a complex and unlabeled
stream of input
Inputs must be incrementally grouped together into plausible classes

• This is especially difficult when behavior requires more
knowledge representation and processing capability than
is available with simple empirical associations between
inputs and outputs


Requirements for Non-Brittle Rule-Based Behavior

• Need to identify and take advantage of the exploitable
regularities in the environment
• Generalizations must be selective, pragmatic and subject
to exceptions
• Learning must be incremental and closely coupled with
performance and with unfolding reality
• Rules must be treated as tentative hypotheses (not logical
assertions) subject to testing and confirmation
– Hypothesis “strength” is derived from experience-based
predictions of performance
– Strength is used to determine rule fitness and infer plausibility
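The notion of strength as an experience-based prediction of performance can be illustrated with a running estimate that moves toward each observed payoff. This is a minimal sketch, assuming a Widrow-Hoff (delta rule) update with the sample-average start used for prediction updates in Wilson's XCS; it is not necessarily the strength scheme of the early systems discussed here, and the `Rule` class, its fields, and the `beta` value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A classifier treated as a tentative hypothesis (hypothetical fields)."""
    condition: str          # ternary string over {0, 1, #}
    action: int
    strength: float = 10.0  # current prediction of performance
    experience: int = 0     # how many times the hypothesis has been tested


def update_strength(rule: Rule, payoff: float, beta: float = 0.2) -> None:
    """Move the rule's strength toward the observed payoff (Widrow-Hoff delta rule)."""
    rule.experience += 1
    # For the first 1/beta tests, use a plain sample average so the arbitrary
    # initial strength is forgotten quickly; afterwards use the fixed rate beta.
    rate = max(beta, 1.0 / rule.experience)
    rule.strength += rate * (payoff - rule.strength)


rule = Rule(condition="1#0", action=1)
for payoff in [100.0, 0.0, 100.0, 100.0]:
    update_strength(rule, payoff)
print(round(rule.strength, 2))  # → 75.0 (the average of the four payoffs)
```

Once a rule has been tested more than 1/beta times, recent payoffs dominate, so the estimate can keep tracking a non-stationary context.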


Observations about early research
• The Holland and Reitman collaboration placed a strong
emphasis on cognition and characterized the problems of
interest
• Viewed classifier systems as symbolic problem solvers that
avoid brittle behavior (an alternative to expert systems)
– Treat the rule set as a model and rules as parts in a context
– Evaluation of parts is context dependent (i.e., aspects are non-stationary)

• Learning emphasized policy search and value estimation
– Rules are policy elements along with performance estimators
– Adjust policy via natural selection among rule types
– The Pitt approach preserved this idea, using the GA for direct policy search
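In the Pitt approach, each member of the GA population is a complete rule set, that is, a whole candidate policy, and selection acts on rule sets by their overall performance. The sketch below is a hedged illustration under stated assumptions: the 6-multiplexer task (a standard LCS test problem), decision-list semantics (the first matching rule fires), fixed-length rule sets, tournament selection with elitism, and all parameter values are hypothetical choices rather than any particular published Pitt system.

```python
import random

N_ADDR, N_BITS = 2, 6  # 6-multiplexer: 2 address bits select one of 4 data bits
ALL_INPUTS = [tuple((i >> (N_BITS - 1 - b)) & 1 for b in range(N_BITS))
              for i in range(2 ** N_BITS)]


def multiplexer(x):
    """Target output: the data bit selected by the two address bits."""
    return x[N_ADDR + x[0] * 2 + x[1]]


def random_rule(rng):
    """A rule is (condition over {0, 1, #}, action bit)."""
    return tuple(rng.choice("01#") for _ in range(N_BITS)), rng.randint(0, 1)


def act(rule_set, x):
    """Decision-list semantics: the first matching rule fires; default action is 0."""
    for cond, action in rule_set:
        if all(c == "#" or int(c) == b for c, b in zip(cond, x)):
            return action
    return 0


def fitness(rule_set):
    """Score the whole rule set (one candidate policy) by accuracy over all 64 inputs."""
    return sum(act(rule_set, x) == multiplexer(x) for x in ALL_INPUTS) / len(ALL_INPUTS)


def mutate(rule_set, rng, p=0.02):
    """Resample each condition symbol, or flip each action bit, with small probability p."""
    return [(tuple(rng.choice("01#") if rng.random() < p else c for c in cond),
             1 - action if rng.random() < p else action)
            for cond, action in rule_set]


def evolve(pop_size=40, n_rules=10, generations=40, seed=1):
    rng = random.Random(seed)
    pop = [[random_rule(rng) for _ in range(n_rules)] for _ in range(pop_size)]
    history = []  # best accuracy in each generation
    for _ in range(generations):
        scores = [fitness(rs) for rs in pop]
        best = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])

        def pick():
            """Binary tournament selection using the cached scores."""
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = [pop[best]]  # elitism: the best rule set survives unchanged
        while len(children) < pop_size:
            cut = rng.randint(1, n_rules - 1)  # one-point crossover at a rule boundary
            children.append(mutate(pick()[:cut] + pick()[cut:], rng))
        pop = children
    return max(pop, key=fitness), history


best_rules, history = evolve()
print(f"best accuracy: {history[0]:.3f} -> {fitness(best_rules):.3f}")
```

Because the elite rule set is copied unchanged, the best accuracy per generation never decreases; crossover only at rule boundaries keeps every offspring a syntactically valid policy.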

• Included provisions for motivation, affect and introspection

• These ideas provided the foundation for a comprehensive theory
of induction (rule clusters, distributed representations,
associations, spreading activation, etc.)


Influence of reinforcement learning

• Reinforcement learning problems are faced by agents that must
learn action sequences from trial-and-error
– Framework provides attractive formalisms based on estimating
value functions (with key contributions from Sutton and Barto)
– Algorithms provide useful benchmarks for comparisons
[Figure: the learning agent receives state input and scalar feedback from the environment and responds with an action]
Solution strategies:
• Search the space of possible behaviors
• Estimate utility of taking actions in world states
• Emphasis on value functions has had a strong influence on LCS
research
– The primary niche is learning compact value function
representations for off-policy temporal difference methods
– But the RL community has good alternatives
It is not clear if we are learning the best generalizations, or giving
sufficient emphasis to policy improvement
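The agent-environment loop and the idea of estimating a value function from scalar feedback can be made concrete with tabular TD(0). This is a minimal sketch, assuming Sutton and Barto's five-state random walk as the environment and a fixed, uniformly random behavior, so it shows only the value-estimation side of the problem; `env_step`, `td0`, and the step size are hypothetical names and settings.

```python
import random

N_STATES = 5  # non-terminal states 0..4; stepping off either end terminates


def env_step(state, rng):
    """Environment: the agent moves left or right at random; reward 1 only for exiting right."""
    nxt = state + rng.choice((-1, 1))
    if nxt < 0:
        return None, 0.0
    if nxt >= N_STATES:
        return None, 1.0
    return nxt, 0.0


def td0(episodes=3000, alpha=0.02, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.5] * N_STATES  # initial value estimates
    for _ in range(episodes):
        s = N_STATES // 2  # every episode starts in the middle state
        while s is not None:
            s_next, r = env_step(s, rng)  # state input and scalar feedback
            v_next = 0.0 if s_next is None else V[s_next]
            V[s] += alpha * (r + gamma * v_next - V[s])  # TD(0) update
            s = s_next
    return V


V = td0()
print([round(v, 2) for v in V])  # true values are 1/6, 2/6, ..., 5/6
```

With a constant step size the estimates keep fluctuating slightly around the true values rather than converging exactly.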

Value-based generalizations aren’t often intuitive
[Figure: Grefenstette’s 9x32 abstract state space, showing the Start position and rewards ranging from 0 to 1000]

• There are many obvious intuitive solution strategies:
– e.g., move left or right to the column with the highest reward, then go straight
• Classifier systems tend to learn piecemeal strategies rather than coherent ones
– Many narrowly focused general rules are needed to get the overall solution
– Generalizations correspond to symmetries in the reward distribution,
e.g., (Row = 111) (Column = #011#) → RIGHT,
not the key attribute-based concepts
– This distinction has been irrelevant in most classifier system test problems (e.g.,
multiplexor and Woods problems)
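The rule condition (Column = #011#) uses the ternary alphabet, where “#” matches either bit. Enumerating which columns such a condition covers shows why these generalizations need not line up with intuitive concepts. This is a minimal sketch, assuming the 32 columns are encoded as 5-bit binary numbers, most significant bit first; that encoding is an assumption, not taken from the original problem description.

```python
def matches(condition: str, bits: str) -> bool:
    """A ternary condition matches when every non-'#' position equals the input bit."""
    return len(condition) == len(bits) and all(c in ("#", b) for c, b in zip(condition, bits))


def matched_columns(condition: str, n_bits: int = 5) -> list[int]:
    """Return every column index whose n_bits-bit binary code satisfies the condition."""
    return [col for col in range(2 ** n_bits) if matches(condition, format(col, f"0{n_bits}b"))]


print(matched_columns("#011#"))  # → [6, 7, 22, 23]
```

Under this encoding the rule covers two separated pairs of columns rather than a contiguous region, so it captures a pattern in the bit-level structure rather than an attribute such as “the column with the highest reward”.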

Off-policy Methods Learn Different Behaviors

• Since Q-learning is an off-policy method (i.e., the behavior
policy may differ from the estimation policy), its solution does not
account for the negative consequences of exploration
• Sarsa (i.e., the bucket brigade) is an on-policy method, so its
solution accounts for the consequences of exploration
• In real problems where on-line errors are costly, this
distinction is important
• This also has architectural implications (e.g., how to
approximate the value function)
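The difference between the two methods lies entirely in the bootstrap term of the one-step update. This sketch uses the textbook tabular Q-learning and Sarsa updates; the two-state fragment, with a costly “cliff” action in the spirit of Sutton and Barto's cliff-walking example, and its numbers are hypothetical.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy (max-valued) action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action a_next the behavior policy actually chose."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Hypothetical fragment: in state 1, action 1 steps off a "cliff" (value -100).
# Suppose the exploring behavior policy does pick action 1 after the move 0 -> 1.
Q_off = {0: [0.0, 0.0], 1: [0.0, -100.0]}
Q_on = {0: [0.0, 0.0], 1: [0.0, -100.0]}
q_learning_update(Q_off, s=0, a=0, r=-1.0, s_next=1, alpha=0.5, gamma=1.0)
sarsa_update(Q_on, s=0, a=0, r=-1.0, s_next=1, a_next=1, alpha=0.5, gamma=1.0)
print(Q_off[0][0], Q_on[0][0])  # → -0.5 -50.5
```

Q-learning ignores the exploratory misstep and values the approach to the cliff as if the greedy action would follow, while Sarsa charges the cost of that misstep to the preceding state-action pair.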

Bottom line: we need to identify and build on the strengths of the LCS
approach. The key may be in specifying a set of organizing principles
that go beyond implementation diagrams.

Soar Architecture of Intelligent Rule-based Behavior

[Figure: layered view of Soar (I/O, Reaction, Deliberation and Reflection, with Learning), ranging from low intelligence and faster processing to high intelligence and slower processing]

• Derived by Newell and his students (~1980), also as a response
to the expert system phenomenon
• Based on a theory of problem solving (i.e., problem spaces),
along with a companion view of learning (i.e., chunking)
• The theory was operationalized as an architecture that has
served that community well


What kind of architecture makes sense for classifier systems?

• The key role of policy improvement suggests that an
actor-critic structure may be a good start
• The idea is to intermix value iteration and policy
improvement continually (state by state, action by action,
sample by sample)
• Is there an organizing principle that extends this concept
to cover many forms of induction at different scales
(including perception, reasoning, and action)?
[Figure: actor-critic cycle. The Critic performs policy evaluation (value learning of V, Q) and the Actor performs policy improvement (greedification of π), converging toward π*, V*, Q*]
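The actor-critic idea of interleaving evaluation and improvement sample by sample can be sketched in tabular form. This is a minimal sketch, assuming a hypothetical five-state chain task, a softmax actor over action preferences, and illustrative step sizes; it follows the textbook one-step actor-critic (Sutton and Barto) rather than any specific classifier-system design, and all names are hypothetical.

```python
import math
import random

N, LEFT, RIGHT = 5, 0, 1  # chain states 0..4; entering state 4 ends the episode with reward 1


def env_step(s, a):
    """Hypothetical deterministic chain: LEFT moves back (wall at 0), RIGHT moves forward."""
    s_next = max(0, s - 1) if a == LEFT else s + 1
    done = s_next == N - 1
    return s_next, (1.0 if done else 0.0), done


def policy(prefs, s):
    """Actor: softmax distribution over the action preferences of state s."""
    exps = [math.exp(p) for p in prefs[s]]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(episodes=300, alpha=0.1, beta=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N                            # critic: state-value estimates
    prefs = [[0.0, 0.0] for _ in range(N)]   # actor: action preferences per state
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            a = rng.choices((LEFT, RIGHT), weights=policy(prefs, s))[0]
            s_next, r, done = env_step(s, a)
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]            # TD error computed by the critic
            V[s] += alpha * delta            # evaluation step for this one sample
            prefs[s][a] += beta * delta      # improvement step for this one sample
            s, steps = s_next, steps + 1
    return V, prefs


V, prefs = actor_critic()
print([round(p, 2) for p in policy(prefs, 0)], [round(v, 2) for v in V])
```

The same TD error drives both components: a positive error makes the action just taken more likely, a negative one less likely, so the policy improves without waiting for the value estimates to converge.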


DARPA/IPTO Focus on Cognitive Systems

• DARPA views a cognitive system as one that

– can reason, using substantial amounts of appropriately represented knowledge
– can learn from its experience so that it performs better tomorrow than it did today
– can explain itself and be told what to do
– can be aware of its own capabilities and reflect on its own behavior
– can respond robustly to surprise
• Learning is ubiquitous. Different forms operate at different times and places
• What niche is the LCS community best suited to fill?

Some Open Problems for Reinforcement Learning (Sutton) and Classifier Systems

• Incomplete state information
• Exploration
• Structured states and actions
• Incorporating prior knowledge
• Using teachers
• Theory of RL with function approximators
• Modular and hierarchical architectures
• Integration with other problem-solving and
planning methods

