“BABEŞ-BOLYAI” UNIVERSITY OF CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE




             GABRIELA ŞERBAN

   Development methods for Intelligent Systems


            Abstract of the Ph.D. Thesis




                      Supervisor
         Prof. univ. dr. MILITON FRENŢIU




                       2003
Table of Contents                                              (page: abstract / thesis)

INTRODUCTION ..........................................................    2 / 5
CHAPTER I      INTELLIGENT SYSTEMS ....................................    5 / 9
CHAPTER II     MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS ....    7 / 65
CHAPTER III    REINFORCEMENT LEARNING .................................    9 / 75
CHAPTER IV     NEW APPROACHES IN INTELLIGENT SYSTEMS ..................   11 / 107
               4.1 A new real time learning algorithm (RTL) ...........   11 / 107
               4.2 A new Logic Architecture for Intelligent Agents
                   (ALSO) .............................................   12 / 115
               4.3 A new reinforcement learning algorithm (URU) .......   13 / 123
               4.4 A new algorithm for word sense disambiguation ......   16 / 134
CHAPTER V      IMPLEMENTATIONS OF INTELLIGENT SYSTEMS .................   18 / 141
               5.1 A reinforcement learning Intelligent Agent based on
                   Adaptive Dynamic Programming .......................   18 / 142
               5.2 An Intelligent Agent based on training Hidden
                   Markov Models ......................................   18 / 147
               5.3 A game playing Reinforcement Learning Intelligent
                   Agent ("Agent 21") .................................   19 / 152
               5.4 "Know-How" Agent ...................................   20 / 152
               5.5 A real-time learning Intelligent Agent .............   20 / 153
               5.6 An agent for word disambiguation ...................   21 / 159
               5.7 A Web Agent to display a tree structure ............   22 / 168
               5.8 A Logic-based Agent (ALSO) .........................   23 / 172
               5.9 An URU reinforcement learning Intelligent Agent ....   23 / 174
               5.10 An Agent for word clustering ......................   23 / 181
CHAPTER VI     A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING
               SIMULATIONS ............................................   24 / 184
CONCLUSIONS ...........................................................   28 / 199
REFERENCES ............................................................   29 / 200




INTRODUCTION

         This Ph.D. thesis is the result of my research in the field of Artificial
Intelligence, particularly in Machine Learning, started in 1998.
         The purpose of this abstract is to present an overview of the most important
aspects of the thesis, including the original results.
         Artificial Intelligence, one of the most recent disciplines to appear in the field of
computer science, tries to understand intelligent entities. But, unlike philosophy and
psychology, which also study intelligence, the aim of Artificial Intelligence is building
intelligent entities as well as understanding them.
         Although nobody can predict the future in detail, it is clear that computers endowed
with a degree of human intelligence (or more than that) would have a great impact on
everyday life and on the future course of civilization.
         The field of machine learning represents one of the most recent challenges of
Artificial Intelligence, opening new perspectives for research in the field and, of course,
a way of achieving intelligent systems.
         In our opinion, the most challenging field of Machine Learning is
Reinforcement Learning, which deals with the problem of improving the behavior of an
artificial agent based on feedback about its performance.
         On the other hand, we consider that one of the most suitable fields for exploiting
intelligence is Natural Language Processing. This is why our research is directed
especially toward the fields of Reinforcement Learning and Natural Language
Processing.
         The thesis contains 133 references and is structured in six chapters, as follows.
         Chapter I, Intelligent Systems, discusses general aspects related to Artificial
Intelligence. The field of Intelligent Agents is presented, a domain of Artificial
Intelligence that offers a new problem solving method and a new way of interaction
between user and computer. Some of the most important (in our opinion) abstract and
concrete architectures for intelligent agents are presented, as well as the new
programming paradigm, agent oriented programming. The next to last section of the
first chapter discusses the logic-based representation and the reasoning in agent based
systems; aspects related to Software Engineering and to the formal treatment of agent
based systems are also presented. The last section of the first chapter tackles the
problem of machine learning, an extremely important research direction in Artificial
Intelligence. The general model of a learning agent, aspects related to learning a
domain, as well as the most important learning strategies are presented.
         Chapter II, Mathematical Models for Intelligent Learning Agents, presents three
mathematical models for agents that learn: Markov Decision Processes, Partially
Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the
mathematical models described in this chapter have a very important role in modeling
and designing agent based systems, especially systems that learn by reinforcement (by
interaction with the environment).


Chapter III, Reinforcement Learning, tackles the extremely complex
problem of reinforcement learning, a learning approach that combines two disciplines in
order to solve problems that neither one could solve independently: Dynamic
Programming and Supervised Learning. This chapter presents the particularities of
Reinforcement Learning. The following topics are discussed: learning based on a utility
function, passive and active learning in known or unknown environments, learning
based on temporal differences (the most interesting variant of reinforcement learning),
and Q-learning. The most important action selection mechanisms, as well as the SARSA
algorithm, the most efficient variant of Q-learning, are comparatively discussed.
        Chapter IV, New Approaches in Intelligent Systems, presents a few new learning
algorithms (approaches to Reinforcement Learning and learning algorithms that may be
used for Natural Language Processing) as well as a new logic architecture for intelligent
agents (showing the importance of logical approaches for intelligent agents). The learning
algorithms described in this chapter are: an algorithm for word sense disambiguation, a
real time learning algorithm, and a reinforcement learning algorithm with application to
motion problems for robots in environments with obstacles. Each of the original
approaches in this chapter is compared with already existing approaches, the advantages
of the new approaches being illustrated.
        Chapter V, Implementations of Intelligent Systems, presents practical applications
(concrete architectures for intelligent agents that improve their behavior by learning), i.e.
implementations of the theoretical aspects presented in chapters II and IV. For each such
application we developed experiments that illustrate the efficiency of the approaches. We
also suggested directions for generalization and for future research.
        Chapter VI, A Programming Interface for Reinforcement Learning Simulations,
presents a standard programming interface for reinforcement learning simulations in a
known environment; both the simulation of reinforcement learning based on a utility
function and the simulation of Q-learning are possible. The use of the interface is also
illustrated on an example: the movement of an embodied robotic agent in an environment
with obstacles.
        Among the original elements presented throughout this thesis we mention the
following:
             the use of Hidden Markov Models as mathematical models for Intelligent
             Learning Agents [SER01a] (section 2.3);
             an algorithm for word sense disambiguation [SER03e] (section 4.4);
             a real time learning algorithm - RTL [SER02a] (section 4.1);
             the convergence theorem for the RTL algorithm [SER02a] (Theorem 4.1.1,
             section 4.1);
             a new Logic Architecture for Intelligent Agents - ALSO [SER02c] (section
             4.2);
             the reinforcement learning algorithm URU [SER03a] (section 4.3);



the convergence theorem for the URU learning algorithm [SER03a]
           (Theorem 4.3.1, section 4.3);
           Theorems 4.3.2 and 4.3.3: the equilibrium equation of the utilities in the URU
           algorithm and, respectively, the error estimation in the URU algorithm
           [SER03a] (section 4.3);
            a reinforcement learning agent based on Adaptive Dynamic Programming
            [SER00b] (section 5.1);
           intelligent agent based on training Hidden Markov Models [SER00c] (section
           5.2);
           a game playing reinforcement learning intelligent agent [SER01b] (section
           5.3);
           "Know-How" agent based on logic [SER02b] (section 5.4);
           real time learning agent [SER02d] (section 5.5);
            an intelligent agent for word disambiguation [SER03d] (section 5.6);
           Web agent for displaying a tree structure [SER01e] (section 5.7);
           agent based on logic (ALSO) [SER02c] (section 5.8);
           an URU reinforcement learning intelligent agent [SER03a] (section 5.9);
            an agent for word clustering [SER03b] (section 5.10);
           a programming interface for reinforcement learning simulations [SER03c]
           (chapter VI).

                                               *
                                            *      *
        I would like to express my gratitude to my supervisor, Prof. Dr. Militon
FRENŢIU, for his competent guidance, suggestions and encouragement.
        I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the
Department of Computer Science of the Faculty of Mathematics and Computer Science
in Cluj-Napoca for her support throughout the elaboration of the thesis and for our
extremely good cooperation.
        I also want to give my thanks to Assoc. Prof. Dr. Horia POP from the same
department for all his competent recommendations.
        I am grateful to all the members of Computer Science Department for their
helpful suggestions.




CHAPTER I
                         INTELLIGENT SYSTEMS

        Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, one
reason for studying this field is to learn about ourselves.
        One question that AI has to answer is the following: how is it possible for a small
and slow brain, biological or electronic, to perceive, understand, predict and manipulate a
world much larger and more complicated than itself? Researchers need only look in the
mirror in order to see an example of an intelligent system.
        There are two fundamental research directions in Artificial Intelligence [DUM99].
On the one hand, the achievement of machines capable of reproducing the cognitive
functions of the human mind, the so-called cognitive modeling. In this respect, cognitive
psychology is one of the cognitive sciences whose methods are taken over by Artificial
Intelligence, in fields such as Natural Language Processing, learning, and perception. Such
intelligent machines were named in the literature "artificial brains", the basic idea being
to build intelligent machines starting from models of neuronal activity. Such models are
known as artificial neural networks. This research direction in Artificial Intelligence
defines the field of Computational Intelligence (intelligent computing), its main
characteristic being the numerical representation of knowledge.
        On the other hand, the achievement of programs that allow the symbolic
processing of information (knowledge) using logical methods, the so-called logical-
symbolic approach (rational thinking). The logical-symbolic tradition of Artificial
Intelligence tries to achieve intelligent systems using deduction mechanisms in order to
obtain a solution of the problem, based on the problem's description in a logical
formalism. This field of traditional Artificial Intelligence is based on a symbolic
representation of knowledge and on logical inferences on knowledge.
        All AI problems can be gathered under the concept of intelligent agent. The goal
of AI is to describe and build agents that receive perceptions from the environment and
execute actions.
        An agent is an entity that perceives its environment through sensors and acts on
this environment by actions.
        An agent is described by:
    the architecture part, the so-called behavior of the agent - the action accomplished
    after a perceptual sequence;
    the program part, the so-called internal part of the agent - how the agent works on
    the inside.
        The goal of AI is to achieve the program part of the agent: a function that
implements the mapping from perceptions to actions.
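
        Purely as an illustration (this is not code from the thesis), the program part of an
agent can be sketched in Java as a mapping from percepts to actions; the Percept, Action,
AgentProgram and ReflexAgent names below are hypothetical.

    // Illustrative sketch only: the "program part" of an agent as a mapping
    // from percepts to actions. Percept and Action are hypothetical types.
    class Percept {
        String description;
        Percept(String description) { this.description = description; }
    }

    class Action {
        String name;
        Action(String name) { this.name = name; }
    }

    // The agent program: given the current percept, decide the next action.
    abstract class AgentProgram {
        abstract Action nextAction(Percept percept);
    }

    // A purely reactive agent: its decision depends only on the current percept.
    class ReflexAgent extends AgentProgram {
        Action nextAction(Percept percept) {
            if (percept.description.equals("obstacle ahead")) {
                return new Action("turn");
            }
            return new Action("move forward");
        }
    }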


The abstract architectures of intelligent agents described in this chapter are purely
reactive agents, and agents with states. The concrete architectures presented are logic
based architectures, reactive architectures, Belief-Desire-Intention architectures and
layered architectures.
        For many AI researchers, the idea of programming computer systems in terms of
mental notions such as belief, desire, and intention is the key component of agent-based
computing. The concept was articulated by Yoav Shoham in his agent-oriented
programming (AOP) proposal [SHO93]. Thus, Yoav Shoham proposes "a new
programming paradigm, based on a new point of view on the computational process".
        AOP may be regarded as a kind of 'post-declarative' programming. In AOP, the
idea is that, as in declarative programming, we state our goals and let the built-in control
mechanism figure out what to do in order to achieve them. So, it is obvious that, when
specifying such systems, we have to use logical statements that may be formalized using
a logic of knowledge.
        “Intelligent agents are ninety-nine percent computer science and one percent AI”
[ETZ96].
        The technology of intelligent agents and multi-agent systems seems set to
radically alter the way complex, distributed, open systems are conceptualized and
implemented. The logic based representation and reasoning in agent based systems is an
important topic of the field, with formal methods having an important role [SER02b].
        There are two problems related to the formalization of the agent based systems:
    how the formal methods could be used for describing and reasoning about agents and
    their behavior, and
    how formal methods provide a way to show that the implemented agent systems are
    correct with respect to their specifications.
        The field of intelligent agents is related to another field of AI, the field of
machine learning. Machine learning represents the study of system models that, based on
a set of training data, improve their performance through experience and by learning
some specific experimental knowledge.
        The attempt to model human reasoning leads to the concept of intelligent
reasoning [FLA94]. Reasoning is the process of deducing conclusions; intelligent
reasoning is the kind of reasoning accomplished by humans. Most AI systems are
deductive, able to make inferences (draw conclusions) given their initial or supplied
knowledge, without being able to acquire or generate new knowledge.




CHAPTER II
         MATHEMATICAL MODELS FOR INTELLIGENT
                  LEARNING AGENTS

         Formally, an agent based system is represented as a 4-tuple (S, A, h, U), where
S = {s_i} is the space of the agent's states (the environment), A = {a_i} is the space of
the agent's actions (the agent's behavior), h is the transition function between the states,
and U = (u1, u2, ..., un) is an array representing the utility function of the agent (ui is the
utility of the state si).
         Most of the computational models of reinforcement learning are based on the
assumption that the environment the agent interacts with can be modeled as a Markov
Decision Process, defined as follows:
1. both the agent and the environment may be modeled as synchronized finite automata
     (with a finite number of states);
2. the agent and the environment interact at discrete time steps;
3. the agent can perceive the environment's states and use them in order to select its
     actions;
4. after the agent acts, the environment makes a transition to a new state;
5. the agent receives a reward after each action performed.
         We say that the environment is a Markov Decision Process (it satisfies the Markov
property) if the environment's response (signal) at moment t+1 depends only on the
state and action at moment t.
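
         In standard notation (assumed here, not reproduced from the thesis), the Markov
property can be written as

    P(s_{t+1} = s', r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0)
        = P(s_{t+1} = s', r_{t+1} = r \mid s_t, a_t),

i.e., the rest of the history brings no additional information beyond the current state and
the current action.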
         It is well known that the Markov Decision Processes are very important in the
reinforcement learning theory (they represent 90% of modern reinforcement learning).
         In many situations the environment cannot be "labeled" beforehand with
appropriate states, and the environment's states are not completely accessible to the agent.
Thus, the environment is continuous and partially observable. Partially Observable Markov
Decision Processes were developed specifically to deal with this problem. The use of
statistical methods represents a leading topic in the field of Artificial Intelligence.
         In the last section of this chapter we present how Hidden Markov Models may be
used as a mathematical tool for modeling the environment of intelligent agents
[SER01a].
         The Hidden Markov Model (HMM) is a generalization of Markov decision
processes in which several transitions (arrows) from a state are possible for a given action.
Therefore, in an HMM we may have several paths covered for the same sequence of
actions, which implies that P(a1,n) (the probability of having as input a sequence made up
of n actions a1, a2, ..., an, written shortly as a1,n) is computed as the sum of the
probabilities over all the possible paths. The probability of a given path is computed by
multiplying the probabilities of each segment (transition) of that path.
         Definition. An HMM is a 4-tuple (s1, S, A, P), where S is a (finite) set of states,
s1 ∈ S is the initial state, A is a set of input symbols (actions), and P is a set of transitions
(labeled arrows).
         A transition is defined as a 4-tuple (si, sj, ak, p), representing the passage from
state si to state sj for the input (action) ak, the transition having probability p.
Since for a given input sequence there are several possible paths, the sequence of states
passed through is not deducible from the input, but hidden (this gives the name of
the model we focus on).
            In the same section, we present the way intelligent agents can be modeled using
a Hidden Markov Model and the way they can be trained using Viterbi's algorithm
[JUR00] and the Baum-Welch algorithm [JUR00].
            In our opinion, Hidden Markov Models are very appropriate for fields such as
Natural Language Processing, writing and speech recognition, and also for motion
problems for robots in environments with perturbations [SER01a].




CHAPTER III
                    REINFORCEMENT LEARNING
        Reinforcement learning is learning a mapping from situations to actions in order
to maximize a scalar reward or reinforcement signal. The learning agent is not told a
priori what actions to perform, as in most learning settings, but has to discover by itself
which actions lead to the maximum reward by trying them.
        In the most interesting and challenging cases, an action may affect not only the
immediate reward, but also all the subsequent situations and, consequently, all the
subsequent rewards. These two characteristics - trial-and-error search and delayed
rewards - are the most important particularities of reinforcement learning.
        Reinforcement Learning can be considered at the same time a new and a very old
topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and
independently in control theory by Waltz and Fu (1965) [WAL65].
        The first relevant learning research in this field is the checkers player of Samuel
(1959) [SAM59], which uses learning based on temporal differences to handle the
rewards. Of course, learning and reinforcement were studied by psychologists for more
than a century, and these researches had a great impact on AI and engineering activities.
        In this chapter, we present the way agents can learn in much less generous
environments, where the agent receives no examples, and starts with no model of the
environment and no utility function.
        Reinforcement Learning (RL) is the way to improve the behavior of an agent,
given a certain feedback about its performance.
        Reinforcement Learning [HAR00] is an approach to machine intelligence that
combines two disciplines to successfully solve problems that neither discipline can
address individually. Dynamic Programming is a field of mathematics that has
traditionally been used to solve problems of optimization and control. However,
traditional dynamic programming is limited in the size and complexity of the problems it
can address.
        Supervised learning is a general method for training a parameterized function
approximator, such as a neural network, to represent functions. However, supervised
learning requires sample input-output pairs from the function to be learned. In other
words, supervised learning requires a set of questions with the right answers.
        Unfortunately, we often do not know the correct answers that supervised learning
requires. For this reason there has recently been much interest in a different approach
known as reinforcement learning (RL). Reinforcement learning is not a type of neural
network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach
that addresses a different, more difficult question. Reinforcement learning combines the
fields of dynamic programming and supervised learning to yield powerful machine-
learning systems. Reinforcement learning appeals to many researchers because of its
generality. In RL, the computer is simply given a goal to achieve. The computer then
learns how to achieve that goal by trial-and-error interactions with its environment.
        In a reinforcement learning problem, the agent receives a feedback, known as
reward or reinforcement; the reward is received at the end, in a terminal state, or in any
other state where the agent has exact information about what it did well or wrong.
        A reinforcement learning problem has three fundamental parts [HAR00]:
    the environment – represented by “states”. Every RL system learns a mapping from
    situations to actions by trial-and-error interactions with a dynamic environment. This
    environment must at least be partially observable by the reinforcement learning
    system;
    the reinforcement function – the “goal” of the RL system is defined using the concept
    of a reinforcement function, which is the exact function of future reinforcements the
    agent seeks to maximize. In other words, there exists a mapping from state/action
    pairs to reinforcements; after performing an action in a given state the RL agent will
    receive some reinforcement (reward) in the form of a scalar value. The RL agent
    learns to perform actions that will maximize the sum of the reinforcements received
    when starting from some initial state and proceeding to a terminal state;
    the value (utility) function – explains how the agent learns to choose "good" actions and
    how we might measure the utility of an action. Two terms are defined: a policy
    determines which action should be performed in each state; a policy is a mapping
    from states to actions. The value of a state is defined as the sum of the reinforcements
    received when starting in that state and following some fixed policy to a terminal
    state. The value (utility) function would therefore be the mapping from states to
    actions that maximizes the sum of the reinforcements when starting in an arbitrary
    state and performing actions until a terminal state is reached.
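
        In symbols (a standard formulation, with the notation assumed here rather than
taken from the thesis), the value of a state s under a policy π is

    V^{\pi}(s) = E\big[ \, r_0 + r_1 + \dots + r_T \;\big|\; s_0 = s, \; a_t = \pi(s_t) \, \big],

where T is the step at which a terminal state is reached; when an infinite horizon is
considered, a discount factor γ ∈ (0, 1] usually weights the rewards as γ^t r_t.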




CHAPTER IV
      NEW APPROACHES IN INTELLIGENT SYSTEMS

       In this chapter we present some original approaches in Intelligent Systems: new
learning algorithms (for reinforcement learning, as well as learning algorithms useful in
Natural Language Processing) and a new logic architecture for Intelligent Agents
(showing the importance of logic based approaches in this field).


       4.1 A new real time learning algorithm (RTL)

        All Artificial Intelligence problems require some sort of search [WEI99], and thus
search represents an important issue in Artificial Intelligence. Search algorithms are
useful for problem solving by intelligent (single or multiple) agents.
        In this section we propose an original algorithm (RTL) [SER02a], which extends
the Learning Real-Time A* (LRTA*) algorithm [KOR90], used for solving path-finding
problems.
        This algorithm preserves the completeness and the real-time character of LRTA*
(a real-time search algorithm), while providing a better exploration of the search space.
        LRTA* is a learning version of the A* algorithm [KOR92], a standard search
algorithm in Artificial Intelligence. As is well known, the major problem of A* is its
large computational complexity.
        LRTA* has two major problems:
        1. What happens if the search space is a cyclic graph?
        2. What happens in plateau situations - states with several successor
            (neighbor) states having the same minimum h' value (the estimated distance
            from the current state to the goal state)? The choice of the next action is
            nondeterministic.
        RTL proposes alternatives for solving these problems.
        The proposed solutions are:
        1. We maintain a record of the visited nodes, but we do not save the h' values for
            each node.
        2. In order to choose the next action in a given state, the agent determines the set
            S of states not visited by the agent and with a minimum h’ value. If S is
            empty, the training fails, otherwise, the agent chooses a state from S using an
            action selection mechanism (this allows a better exploration of the search
            space).
        The idea of our algorithm is the following:
           through repeated training episodes, the agent tries some (possibly optimal)
           paths to a goal state, and memorizes the shortest one;
           the number of trials is selected by the user;
           after a training trial there are two possibilities:
                    the agent reaches a goal state; in this case the agent saves the path and
                    its cost;
                    the learning process fails (the agent does not reach the final state,
                    because it was blocked).
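
        The following Java sketch is only a rough illustration of a training episode
consistent with the description above (a record of the visited states, a choice among the
unvisited successors with minimal h' value, failure when no such successor exists); it is
not the RTL algorithm itself, whose exact formulation is given in the thesis, and the state
representation and helper structures are hypothetical.

    // Rough illustration of one training episode in the spirit of the description above.
    import java.util.*;

    class RealTimeEpisode {
        // returns the path found, or null if the episode fails (the agent gets blocked)
        static List<Integer> runEpisode(int start, int goal,
                                        Map<Integer, List<Integer>> successors,
                                        Map<Integer, Double> hPrime,
                                        Random random) {
            Set<Integer> visited = new HashSet<Integer>();
            List<Integer> path = new ArrayList<Integer>();
            int current = start;
            visited.add(current);
            path.add(current);
            while (current != goal) {
                // candidate states: unvisited successors with minimal h' value
                double best = Double.POSITIVE_INFINITY;
                List<Integer> candidates = new ArrayList<Integer>();
                for (int next : successors.get(current)) {
                    if (visited.contains(next)) continue;
                    double h = hPrime.get(next);
                    if (h < best) { best = h; candidates.clear(); candidates.add(next); }
                    else if (h == best) { candidates.add(next); }
                }
                if (candidates.isEmpty()) return null;  // blocked: the training fails
                // action selection mechanism: here, a random choice among the best candidates
                current = candidates.get(random.nextInt(candidates.size()));
                visited.add(current);
                path.add(current);
            }
            return path;  // the caller keeps the shortest path over all episodes
        }
    }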

       Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h'
values) are admissible (they do not overestimate the real values), then the RTL algorithm
is convergent.
       At the end of this section we give some directions for future research concerning
the RTL algorithm.



       4.2 A new Logic Architecture for Intelligent Agents (ALSO)

         The logic approach is a topic of Symbolic Artificial Intelligence and has its own
importance for intelligent agents, even if the controversy between the traditional
approach and computational intelligence in Artificial Intelligence is well known.
        Moreover, the only intelligence requirement we generally make for the agents is
that [WOO97] they can make an acceptable decision about what action to perform next in
their environment, in time for this decision to be useful. Other requirements for
intelligence will be determined by the domain in which the agent is applied: not all agents
will need to be capable of learning, for example.
        In such situations, a logic architecture is very appropriate, and offers, in our
opinion, a simple and elegant representation for the agent's environment and desired
behavior.
        The most important disadvantage of the traditional logic architecture is the large
complexity of the computational process.
        In this section we present a new logic architecture for intelligent agents (ALSO - a
logic architecture based on stacks of goals) [SER02c]. This architecture combines the
traditional logic architecture with a planning architecture [RIC91].
         We consider the case of an agent whose goal is to solve a given problem (to pass
from an initial to a final state), based on a set of operators (rules) that may be applied at a
given moment.



In an ALSO architecture, we will use the declarative representation of
knowledge.
        Each operator will be described by a list of new predicates that are made true by
the operator and a list of old predicates that are made false by the operator. These two
lists are ADD and DELETE, respectively. Moreover, for each operator a third list is
specified, PRECONDITION, which contains all the predicates that must be true in order
to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate
not included in the ADD or DELETE lists of a certain operator is not affected by that
operator.
        The idea of the algorithm is to use a stack of goals (a unique stack that contains
both goals and operators proposed for solving those goals). The problem solver is also
based on a database that describes the current situation (state) and a set of operators
described by the PRECONDITION, ADD and DELETE lists.
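
        As a small illustration (a sketch in Java rather than the thesis' Prolog
implementation, with purely illustrative names), an operator with its PRECONDITION,
ADD and DELETE lists and its effect on a state described by a set of true predicates
could look as follows.

    // Sketch of a STRIPS-like operator as described above; names are illustrative only.
    import java.util.*;

    class Operator {
        String name;
        Set<String> precondition; // predicates that must be true to apply the operator
        Set<String> add;          // predicates made true by the operator
        Set<String> delete;       // predicates made false by the operator

        Operator(String name, Set<String> precondition, Set<String> add, Set<String> delete) {
            this.name = name;
            this.precondition = precondition;
            this.add = add;
            this.delete = delete;
        }

        boolean applicable(Set<String> state) {
            return state.containsAll(precondition);
        }

        // frame axioms are implicit: predicates not in ADD or DELETE are left untouched
        Set<String> apply(Set<String> state) {
            Set<String> next = new HashSet<String>(state);
            next.removeAll(delete);
            next.addAll(add);
            return next;
        }
    }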
        As a case study we illustrate the use of the ALSO algorithm on a maze searching
problem.
        The Logic Architecture based on a stack of goals improves the traditional logic
architecture for intelligent agents in the following directions:
            Compared to the traditional logic approach, where the number of frame
            axioms that should be stored in order to make a correct inference is large,
            the proposed architecture reduces this number. In other words, the space
            complexity is reduced.
            Because of a limited Depth First search, the time complexity is reduced, too.
            The representation is very simple and elegant.
         At the end of this section we give some extension possibilities of the ALSO
architecture.

       4.3 A new reinforcement learning algorithm (URU)

        In this section we propose an original algorithm (URU - Utility-Reward-Utility)
[SER03a], namely a temporal difference reinforcement learning algorithm. This
algorithm may be used for path finding problems in a rectangular environment. We also
prove the convergence of this algorithm.
        An important problem in robotics is to generate a path between an initial and a
final configuration, so that the robot could avoid some obstacles. This problem is called a
robot path-finding problem.
        Many approaches to solve this problem were proposed in Artificial Intelligence
and Computational Geometry [BRA82], [TOR90]. All these approaches are based on
planning. The time complexity of these approaches grows exponentially with the number
of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches
(section 4.1) and learning approaches (section 4.3) are important, because they contribute
to reducing the time complexity of the problem.
        We consider the environment represented as a space of states (each state is
characterized by its position in the environment - two coordinates specifying the X-
coordinate, respectively the Y-coordinate, of the current position). The goal of the agent is
to learn to move from an initial to a final state on a shortest path (measured as the
number of transitions between states).
        The notations used in what follows are:
        the environment, represented as a space of states: M = {s1, s2, ..., sn};
        si ∈ M, sf ∈ M: the initial, respectively the final state (the problem could be
        generalized for a set of final states);
        h : M → P(M) (P(M) being the set of subsets of M): the transition function
        between the states, having the following meaning: h(i) = {j1, j2, ..., jk} if, at a
        given moment, from the state i the agent could move to one of the states
        j1, j2, ..., jk; a state j accessible from the state i, in other words j ∈ h(i), is called
        a neighbor state of the state i;
        the transition probabilities between a state i and any neighbor state j are
        P(i, j) = 1 / |h(i)|.
        To solve this problem, we propose a reinforcement learning algorithm based on
learning the states' utilities (values), in which the agent receives rewards from its
interactions with the environment.
        The algorithm, based on Temporal Differences, works on the following idea:
    the agent starts with some initial estimates of the states' utilities;
    during certain training episodes, the agent experiments with some possibly optimal
    paths from si to sf, properly updating the estimations of the states' utilities;
    during the training process the estimations of the states' utilities converge to their
    exact values, and, thus, at the end of the training process, the estimations will be in
    the neighborhood of the exact values.

       We make the following notations:
         U(i) - the estimated utility of the state i;
         R(i) - the reward received by the agent in the state i.

        We make the following specifications:
          the training process during an episode has the worst case complexity O(n^2),
          where n is the number of environment states;
          in a training sequence, the agent updates the utility of the current state using
          only the selected successor state, not all the successors (the temporal
          difference characteristic).
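
        For reference, the classical temporal difference (TD) utility update (a standard
formula, not necessarily the exact URU update rule, which is given in the thesis) after a
transition from the state i to the selected successor j has the form

    U(i) \leftarrow U(i) + \alpha \, \big( R(i) + \gamma \, U(j) - U(i) \big),

where α is the learning rate and γ is the reward (discount) factor; only the selected
successor j appears in the update, which is precisely the temporal difference
characteristic mentioned above.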



       Case study

       If we consider the reward function defined as r(s) = -1 if s ≠ sf and r(s) = 0
otherwise, it is obvious that the goal of the learning process is to minimize the number of
transitions from the initial to the final state (the agent learns to move on a shortest path).
       We consider that the environment has the following characteristics:
           the environment has a rectangular form;
           at any given moment, from any given state, the agent can move in only four
           directions: North, South, East, West.
       In what follows, we state an original theorem that gives the convergence
conditions of the URU algorithm.

      Theorem 4.3.1. [SER03a] Let us consider the learning problem described above,
which satisfies the following conditions:
          the initial values of the states' utilities are computed as
               U(s) = -d(s, sf) - 2,   ∀ s ∈ M,
        where d(s1, s2) represents the Manhattan distance between the states s1 and s2;
          the reward factor is γ = 1/3.
        In this case, the URU algorithm is convergent (the states' utilities converge
after the training sequence).

       We denote by Un(i) the estimated utility of the state i at the n-th training episode.
       The proof of Theorem 4.3.1 is based on the following lemmas:

       Lemma 1. [SER03a] At the n-th training episode of the agent the following
inequality holds:
               Un(i) ≤ 2,   ∀ i ∈ M.
       Lemma 2. [SER03a] At the n-th training episode of the agent the following
inequality holds:
               | Un(i) - Un(j) | ≤ 1
for each transition from i (i ≠ sf) to j made by the agent in the current training sequence.

       Lemma 3. [SER03a] The inequality Un+1(i) ≥ Un(i) holds for every i ∈ M and
n ∈ N; in other words, the states' utilities increase from one training episode to another.

        Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU
algorithm has been run.

        Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of
the states' utilities is:

        U*URU(i) = ri + (γ / card(h(i))) · Σ (over j successor of i) U*URU(j),   ∀ i ∈ M,

where U*URU(i) represents the exact utility of the state i, obtained after the URU algorithm
has been run, and card(X) denotes the number of elements of the set X.

       Compared to the classical TD algorithm, we make the following remarks (which
follow naturally from the theoretical results given):
           the states' utilities grow faster in the URU algorithm than in the TD algorithm;
           in other words, UURU(i) ≥ UTD(i), ∀ i ∈ M;
           in the case of our learning problem, as we proved in Theorem 4.3.1, for γ = 1
           (the TD algorithm) we cannot prove the convergence of the states' utilities.

       Further work is planned in the following directions:
          to analyze what happens if the transitions between states are nondeterministic
          (the environment is a Hidden Markov Model [SER01a]);
          to analyze what happens if the reward factor (γ) is not a fixed parameter, but
          a function whose values depend on the current state of the agent;
          to develop the algorithm for solving path-finding problems with multiple
          agents.



       4.4 A new algorithm for word sense disambiguation

       We introduce in this section an original algorithm (BA) [SER01c] which
combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] with
elements of the Naive Bayes Classifier (NBC).
       This algorithm combines the advantages of Yarowsky's principles (one sense per
discourse and one sense per collocation) with the known high performance of NBC
algorithms. The algorithm learns to make predictions based on local contexts, starting
from a reduced set of labeled contexts (with meanings) and from a bigger set of unlabeled
contexts.
        The two important characteristics of the BA algorithm are:
    the set of words used as contextual features for the disambiguation - as the number of
    elements of this set grows, the precision of the disambiguation grows (this is
    experimentally shown in section 5.6);
    the relation between words and sets of contexts, included in W × P(W) (W being the
    set of words, and P(W) being the set of parts of W, in particular a set of contexts) -
    using this relation also increases the precision of the disambiguation.
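
        For reference, the standard Naive Bayes decision rule for word sense
disambiguation (the NBC component combined in BA; the notation below is assumed
here, not taken from the thesis) selects, for a context c = {w1, ..., wm}, the sense

    s^{*} = \arg\max_{s} \; P(s) \prod_{k=1}^{m} P(w_k \mid s),

where the probabilities are estimated from the (initially small) set of sense-labeled
contexts.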




CHAPTER V
     IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

       5.1 A reinforcement learning Intelligent Agent based on Adaptive
           Dynamic Programming

        In this section, we present an Intelligent Agent [SER00b] that learns by
reinforcement using an Adaptive Dynamic Programming method (section 3.2).
        We consider a simple Markov decision process with n^2 states. The state space can
be visualized using an n x n grid. Each square represents a state. The reinforcement function
is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are
4 actions possible in each state: North, South, East, and West. The goal states are the
upper left corner and the lower right corner.
        The program that implements the agent with the functionality described above has
been realized in two versions: in Borland Pascal and in Visual Prolog.
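
        The original programs are in Borland Pascal and Visual Prolog; purely as an
illustration of the environment just described (n x n grid, constant reinforcement of -1,
four actions, goal states in the upper left and lower right corners), a Java sketch could be:

    // Illustrative Java sketch of the grid environment described above (not the thesis code).
    class GridEnvironment {
        final int n;                        // the grid is n x n, so there are n*n states
        GridEnvironment(int n) { this.n = n; }

        boolean isGoal(int row, int col) {  // goals: upper left and lower right corners
            return (row == 0 && col == 0) || (row == n - 1 && col == n - 1);
        }

        int reward() { return -1; }         // constant reinforcement on each transition

        // actions: 0 = North, 1 = South, 2 = East, 3 = West; moves off the grid are ignored
        int[] next(int row, int col, int action) {
            int r = row, c = col;
            if (action == 0) r = Math.max(0, row - 1);
            if (action == 1) r = Math.min(n - 1, row + 1);
            if (action == 2) c = Math.min(n - 1, col + 1);
            if (action == 3) c = Math.max(0, col - 1);
            return new int[] { r, c };
        }
    }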


       5.2 An Intelligent Agent based on training Hidden Markov
           Models

         In this section we want to test a system represented as a Hidden Markov Model
(section 2.3). We propose an agent for character recognition [SER01a].
         Let us consider the example of an agent that has to recognize two characters,
“I” and “U”. We propose the model described in Figure 5.1.




                            Figure 5.1. The Hidden Markov Model




In this section we present how the Hidden Markov Model can be trained to
recognize the two characters and how the recognition of other characters works.
        The agent implementation has been written in Microsoft Visual C++ 6.0.
        Starting from the proposed experiment, some directions for future work are:
            how the proposed model could be generalized for as many characters as
            possible;
            how the structure of the HMM could be dynamically generated;
            when such probabilistic methods are more suitable than other character
            recognition methods;
            how probabilistic methods could be combined with others (heuristic or not)
            in order to obtain a higher performance of the model;
            what would happen in certain "plateau" situations (where a given input would
            have the same probability in several models).
        This section aimed to emphasize another way of working on problems of training
intelligent agents.


       5.3 A game playing Reinforcement Learning Intelligent Agent
           ("Agent 21")

        In this section we design an Intelligent Agent [SER01b] that learns (by
reinforcement) to play "Game 21", a simplified version of the game "Black-Jack".
The algorithm used for training the agent is the SARSA algorithm (section 3.6).
        We propose a game with two players: a dealer (which distributes the cards) and an
agent, in fact the computer (which uses the SARSA algorithm for learning the game's
rules and which plays against the dealer). The agent tries to win by obtaining a score (the
sum of the cards' values) less than or equal to 21, but greater than the dealer's score.
        The probabilistic nature of this game makes it an interesting problem for testing
learning algorithms, even if learning a good playing strategy is not obvious.
        Learning from a teacher (supervised learning) is not useful in this game, as the
output data for a given state are unknown. The agent has to explore different actions and
to develop a strategy by selecting and maintaining the actions that maximize its
performance.
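
        For reference, the SARSA update rule (a standard formula; the state and action
encoding used for "Game 21" is specific to the thesis) is

    Q(s, a) \leftarrow Q(s, a) + \alpha \, \big( r + \gamma \, Q(s', a') - Q(s, a) \big),

where (s, a, r, s', a') are the current state, the chosen action, the received reward, the next
state and the next action, α is the learning rate and γ is the discount factor.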
        The purpose of this application is to study the behavior of an agent that initially
knows neither the rules nor the goal of the game, and that gradually assimilates them.
        After learning, the agent (the computer) wins about 54% of the rounds (Figure
5.2). This is due to the fact that “Game 21” is a random game. The agent tries to manage,
as well as it is able to, the cards received from the dealer.


As a remark, even if the game has a strong probabilistic character, after the
training rounds the agent plays approximately as well as, or maybe even better than, a
human player (with a random playing strategy).




               Figure 5.2. Percentage of the won games for the trained agent




       5.4 "Know-How" Agent

        This section refers to the implementation of the “Know-how” concept (section
1.3.4) in a Multi-Agent System (system with several non-cooperative agents) [SER02b].
        Because of the strong logical formalism of the “Know-How” concept, the
application was written in Visual Prolog.


       5.5 A real-time learning Intelligent Agent

         In this section we describe an Intelligent Agent [SER02a] that learns to find the
minimum number of transitions needed for moving (in a space of states) between an
initial and a final configuration, using the RTL algorithm (section 4.1).
         The use of the A* algorithm for solving such problems is well known. The A*
algorithm, an informed search algorithm, has the major disadvantage of the exponential
complexity of its execution time.


Thus, a learning version is very suitable, since it reduces the algorithm's
complexity.
        We propose an experiment in order to illustrate the advantages of the RTL
algorithm.
        The application is written in Borland C and implements the behavior of an
Intelligent Agent whose aim is to find the way out of a maze on a shortest path.
        For comparison, we also used the LRTA* algorithm and noticed that, because of
the insufficient exploration of the environment, the agent fails.


       5.6 An agent for word disambiguation

         In this section we describe an Intelligent Agent [SER03e] that learns to
disambiguate a polysemic word (to determine the meaning of the word in a set of
contexts).
         The application is written in JDK 1.4 and uses the disambiguation algorithm BA
described in section 4.4.
         We made two experiments: one for the English language and one for the
Romanian language. The second experiment is very significant, because for the
Romanian language a resource similar to WordNet does not exist, and the BA
algorithm only requires information that can be extracted from an untagged corpus.
          Our aim is to use the BA algorithm for the Romanian language, to disambiguate
the word poarta in some contexts obtained with an on-line corpus tool (we use a
Romanian corpus).
        We make the following statements:
           the target word poarta has, in Romanian, four possible meanings
           (two nouns and two verbs);
           we test our algorithm starting with 38 contexts for the target word;
           we start with four words as contextual features for the disambiguation (a
           single feature for each sense).
        As an evaluation measure of the algorithm we use the accuracy (5.1).

               A = (number of contexts correctly solved) /
                   (total number of contexts that have to be solved)              (5.1)

       The accuracy of the BA algorithm in the proposed experiment is 60%.
       We also carried out an experimental comparison between the Naive Bayes Classifier
and the BA algorithm with and without the relation (section 4.4).


The comparative experimental results are shown in Figure 5.3, which gives a
graphical representation of the accuracy per context for each algorithm. More exactly,
for any given algorithm, for the i-th context we represent the accuracy (5.1) of the
algorithm on the first i contexts.
        From Figure 5.3, it is obvious that the most efficient algorithm is BA with the
relation (its accuracy grows with the number of contexts).

          [Figure 5.3. Experimental comparison between the NBC and BA algorithms:
           accuracy (%) versus the number of contexts, for the BA algorithm with the
           relation, the NBC algorithm, and the BA algorithm without the relation]

       We make the following remarks:
            in the above experiment we increased the number of occurrences for each
            meaning of the target word and we noticed that with two occurrences for each
            meaning the algorithm's accuracy grows by 10%, and with three occurrences
            for each meaning the accuracy grows by 15%;
            we increased the number of input contexts (to 100) and we noticed that the
            algorithm's accuracy grows by 15%.
       As a conclusion, if the number of words used as contextual features for the
disambiguation and the number of contexts grow, the accuracy of the BA algorithm
grows, too.


                  5.7 A Web Agent to display a tree structure

         In this section we present an Intelligent Agent [SER01e] aimed at browsing a
structure of remote files. It is an agent whose environment is a software environment,
where the sensors are software functions and the actions are software actions.




5.8 A Logic-based Agent (ALSO)

        In this section we describe the implementation part of a Logic Agent based on a
stack of goals [SER02c], whose architecture has been presented in section 4.2.
        Because the above described architecture is based on logic and because the
algorithm described in section 4.2 needs backtracking to find all the solutions, the
implementation was written in Visual Prolog. The declarative programming languages
(as Prolog) have a built-in control mechanism which allows finding all the solutions of a
problem.
        We have to say that the stack (stacks) of goals that we have to create for applying
the algorithm are implicitly memorised by the control strategy of Prolog (a mechanism
which allows backtracking).

       5.9 An URU reinforcement learning Intelligent Agent

       In this section we describe an Intelligent Agent [SER03a] that learns to find the
minimum number of transitions (steps) for moving between an initial and a final
configuration, using the URU algorithm (described in section 4.3).
       The application is written in JDK 1.4. We develop an experiment for an agent
that moves in a rectangular maze, illustrating the theoretical results obtained in section
4.3.

       5.10 An Agent for word clustering

       In this section we describe an Intelligent Agent [SER03b] that learns to cluster
[MAN99] a set of words, starting from a set of contexts and from a set of words having
maximum frequency in the given contexts.
       We propose an experiment for the Romanian language; we ran the algorithm on
10000 contexts and 200 words, with very good results.
       We mention that this application is used as a part of a Question Answering (QA)
system for the Romanian language [SER03f].




CHAPTER VI
A PROGRAMMING INTERFACE FOR REINFORCEMENT
          LEARNING SIMULATIONS

         In this chapter we propose an original programming interface for reinforcement
learning simulations in known environments [SER03c]. Using this interface, simulations
are possible both for reinforcement learning based on the states' utilities and for learning
based on actions' values (Q-learning).
        The interface is realized in JDK 1.4, and is meant to facilitate software
development for reinforcement learning in known environments.
        There are three basic objects: agents, environments and simulations.
        The agent is the learning agent and the environment is the task it interacts with.
The simulation manages the interaction between agent and environment, collects data and
manages the display, if any.
        Generally, the inputs of the agent are perceptions about the environment (in our
case states from the environment), the outputs are actions, and the environment offers
rewards after interacting with it.
        Figure 6.1 illustrates the interaction between the agent and the environment in a
reinforcement learning task.



              [Figure 6.1. Interaction between the Agent and the Environment: the agent
               receives perceptions and rewards from the environment and sends actions
               to the environment]

        The reward is a number; the environment, the actions and perceptions are
instances of classes derived from the IEnvironment, IAction and IState interfaces
respectively. The implementation of actions and perception may be arbitrary as long as
they are properly understood by agent and environment. The agent and the environment
have, obviously, to be chosen to be compatible with each other.
        The interaction between agent and environment is handled in discrete time. We
assume we are working with simulations. In other words, there are no real-time
constraints enforced by the interface: the environment waits for the agent while the agent
is selecting its action and the agent waits for the environment while the environment is
computing its next state.
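        Schematically, one simulated episode unfolds as in the fragment below. This is
only a conceptual sketch of what happens at each discrete time step; agentChoose,
rewardFor and agentLearn are placeholder names, not methods of the interface, and the
reward handling is a simplifying assumption.

    // Conceptual sketch of the discrete-time interaction (placeholder helper
    // names; the reward handling is an assumption, not the thesis code).
    IState s = env.initial();
    while (!env.isFinal(s)) {
        IAction a = agentChoose(s);   // the environment waits while the agent selects an action
        int r = rewardFor(s, a);      // placeholder for the reward returned by the environment
        s = env.next(s, a, r);        // the agent waits while the environment computes its next state
        agentLearn(s, r);             // the agent updates its estimates from the observed reward
    }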
        We assume that the agent's environment is a finite Markov Decision Process.
        To use the interface, the user has to define the specialized classes HisState,
HisEnvironment and HisAgent and to create an instance of each. The agent and the
environment are then passed to a simulation object (CSimulation), which initializes and
interconnects them. Then, CSimulation.init() initializes and runs the simulation.
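        A hypothetical usage sketch is given below; HisAgent and HisEnvironment stand
for the user-defined classes described above, and the numeric arguments of init() (the
learning rate, the discount factor, the exploration rate and the number of episodes,
following the UML signature) are arbitrary example values.

    // Hypothetical usage sketch; HisAgent / HisEnvironment are the user-defined
    // specializations described above, and the parameter values are examples only.
    HisEnvironment env = new HisEnvironment();
    HisAgent agent = new HisAgent();

    CSimulation sim = new CSimulation(agent, env);  // the constructor interconnects agent and environment
    sim.init(0.1, 0.9, 0.1, 1000);                  // init(alpha, gamma, epsilon, episodes) runs the simulation
    sim.policy(System.out);                         // prints the learned policy (signature from the UML diagram)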
        If the agent learns the states' utilities, it has to be derived from the AgentUtility
class; otherwise, if it learns the actions' values (Q-learning), it has to be derived from the
AgentQValues class.
        RLAgent is an abstract class, the base class for all agents. Specific agents are
instances of subclasses derived from RLAgent.
        AgentUtility is an abstract subclass of RLAgent that defines the behavior of an
agent learning by reinforcement based on the states' utilities (the learning algorithm is the
URU algorithm described in section 4.3 [SER03a]).
        AgentQValues is an abstract subclass of RLAgent that defines the behavior of an
agent learning by reinforcement based on the actions' values (the learning algorithm is the
SARSA algorithm [SUT98]).
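        For reference, the SARSA rule from [SUT98] updates the estimate Q(s, a) after
each transition (s, a, r, s', a'). The small helper below states the textbook update in Java;
it is only an illustration of the rule, not necessarily the code behind AgentQValues.

    // Textbook SARSA update from [SUT98]; an illustration of the rule only,
    // not necessarily the exact code behind AgentQValues.
    // q     : current estimate Q(s, a)
    // r     : reward received after executing action a in state s
    // qNext : Q(s', a') for the action a' actually chosen in the next state s'
    // alpha : learning rate, gamma : discount factor
    static double sarsaUpdate(double q, double r, double qNext, double alpha, double gamma) {
        return q + alpha * (r + gamma * qNext - q);
    }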
        IEnvironment is the base interface for all environments. Specific environments
are instances of classes implementing IEnvironment.
        CSimulation is the central class of the interface: it manages the interaction
between the agent and the environment and defines the uniform usage that all agents and
environments are meant to conform to. An instance of the simulation class is associated
with an agent instance and an environment instance at creation time, in the constructor of
the class CSimulation.
        We illustrate the use of the interface on a concrete example: the motion problem
of an embodied agent whose goal is to find the way out of a maze with obstacles.
        The UML class diagram of the interface is summarized below.

        [UML class diagram of the interface. Its main elements are: the class CSimulation
(fields a: RLAgent and m: IEnvironment; operations CSimulation(a, m), init(alpha, gamma,
epsilon, episodes) and policy(ps: PrintStream)); the interfaces IEnvironment (isValid,
initial, final, next, value), IState and IAction (toString, equal) and IList (add, delete, dim,
elem), the latter implemented by CList; the abstract class RLAgent (operations actions,
choose, next, initial, learning, notVisited, value, appear, add, init, update) with its abstract
subclasses AgentUtility and AgentQValues and the user-defined agent MyAgent; the
abstract class IElement (state, action, value, retO, setValue, setAction, setO, toString,
equal) with its subclasses Utility and QValues; and the user-defined classes MyState,
MyAction and MyEnvironment.]
CONCLUSIONS


        The aim of this work is to support the idea that the field of agents, and especially
of intelligent agents, is an important research and development area of Computer Science
and Artificial Intelligence.
        On the other hand, learning problems represent an important research direction in
Artificial Intelligence; machine learning is the study of system models that, based on an
input data set (the training set), improve their performance through experience and by
learning some specific knowledge.
        Reinforcement Learning, seen as a method for solving machine learning problems
based on new knowledge acquisition [BER91], is one of the ways that may form the basis
for improving machines' performance, while being, at the same time, a field open to
future research.
        Because of the potential to eliminate manual encoding of control strategies,
reinforcement learning is one of the most active research domains in machine learning.
Reinforcement learning in unknown environments is a current research topic in this field.
        The aim of the current work is to present the main results related to
reinforcement learning, as well as the design and implementation of intelligent agents, in
order to offer support for understanding:
     why agents are considered an important new way of conceptualizing and
     implementing various kinds of applications;
     what intelligent agents are (and what they are not) and how they are related to other
     programming paradigms, in particular object oriented programming;
     why learning, and reinforcement learning in particular, represents an extremely novel
     and dynamic field in Artificial Intelligence.
        The work presents some new approaches in Intelligent Systems (new learning
algorithms), together with possible generalizations and open problems, opening in this
way new research directions in Machine Learning.
        The study of learning was focused in this thesis on the model of single-agent
systems in known environments. As future research we will consider the case of unknown
environments and of systems with several agents (multi-agent systems).
REFERENCES

[ALE95]   Allen, J., Natural language understanding, Benjamin/Cummings Publ.,
          2nd ed., 1995
[ANT96]   Antos, A., Lugosi, A., Strong MiniMax lower bounds for learning,
          Proceedings of the 9th annual Conference on Computational Learning
          theory, 1996, pp.303-309
[BAR91]   Barto, A.G., Bradtke, S.J., Singh, S.P., Real time learning and control
          using asynchronous dynamic programming, (Technical Report 91-57),
          Amherst, MA: University of Massachusetts, Computer Science
          Department
[BAR99]   Barber, K.S., Kim, J., Constructing and dynamically maintaining
          perspective-based agent models in a MA Environment, Proceedings of the
          3rd annual Conference on Autonomous Agents, 1999, pp.416-417
[BER91]   Bergadano, F., Giordana, A., Machine Learning, Ellis Horwood
          Limited, Chichester, England, 1991
[BET93]   Betke, M., Rivest, R., Singh, M., Piecemeal Learning of an Unknown
          Environment, AI Memo, No. 1474, CBCL Memo No. 93, 1993
[BOO88]   Booker, L.B., Classifier systems that learn world models, Machine
          Learning, Number 3, 1988, pp.161-192
[BOO94]   Booch, G., Object-Oriented Analysis and Design, Addison-Wesley:
          Reading, MA, 1994
[BRA82]   Brady, M., Hollerbach, J.M., Robot motion: planning and control,
          Cambridge MA, MIT Press, 1982
[BRO86]   Brooks, R.A., A robot layered control system for a mobile robot, IEEE
          Journal of Robotics and Automation, 2(1), pp.14-23, 1986
[CAN88]   Canny, J.F., The complexity of robot motion planning, Cambridge MA,
          MIT Press, 1988
[CAR79]   McCarthy, J., Ascribing mental qualities to machines, Philosophical
          Perspectives in Artificial Intelligence, Harvester Press, 1979
[CÂR94]   Cârstoiu, D. Sisteme expert, Editura ALL, Bucureşti, 1994
[CAS94]   Cassandra, A.R., Kaelbling, L.P., Littman, M.L., Acting Optimally in
          Partial Observable Stochastic Domains, Proceedings, 7th AAAI-94,
          Seattle, Washington, 1994
[CHA95]   Chaib-Draa, B., Industrial Applications of Distributed AI,
          Communications of the ACM, Vol. 38, No. 11, 1995
[CHA96]   Charniak, E. Statistical language learning, The MIT Press, 1996
[CHE80]   Chellas, B., Modal Logic: An Introduction, Cambridge University Press:
          Cambridge, England, 1980
[CHI99]   Ciampolini, A., Lamma, E., Mello, P., Abductive Coordination for
          logic agents, Proceedings of the 1999 ACM Symposium on Applied
          Computing, 1999, pp.134-140
[COB02]   http://titania.cobuild.collins.co.uk/form.html
[DAE00]   Daelemans, W., Zavrel, J., Tilburg Memory Learner - version 3.0,
          University of Antwerp, http://ilk.kub.nl, 2000
[DAY92]     Dayan, P., The convergence of TD(λ) for general λ, Machine Learning,
            Number 8, 1992, pp.341-362
[DEN87]     Dennett, D.C., The Intentional Stance, The MIT Press: Cambridge. MA,
            1987
[DUM95]     Dumitrescu, D., Inteligenţă Artificială, Litografie, Universitatea "Babeş-
            Bolyai" Cluj-Napoca, 1995
[DUM99]     Dumitrescu, D., Principiile Inteligenţei Artificiale, Ed. Albastră, Cluj-
            Napoca, 1999
[ESC00]     Escudero, G., Marquez, L., Rigau, G., Boosting applied to WSD,
            ECML, Barcelona, Spain, 2000
[ETZ96]     Etzioni, O., Moving up the information food chain: Deploying softbots on
            the world-wide web, Proceedings of the Thirteenth National Conference
            on Artificial Intelligence (AAAI-96), Portland. OR, 1996
[FIE94]     Fiechter, C.N., Efficient Reinforcement Learning, Proceedings of the 7th
            annual ACM Conference on Computer Learning theory, 1994, pp.88-97
[FIS93]     Fisher, M., Concurrent MetateM, A language for modeling reactive
            systems, Proceedings of Parallel Architectures and languages Europe
            (PARLE), Springer Verlag, 1993
[FIS93]     Fisher, M., Wooldrige, M., Specifying and Verifying Distributed
            Intelligent Systems, in Progress in Artificial Intelligence, Proceedings of
            Portuguese Conference on Artificial Intelligence (EPIA), Porto, Portugal,
            1993, Springer Verlag as "Lecture notes in Computer Science", vol. 727
[FIS95/1]   Fisher, M., Towards a semantics for Concurrent MetateM, in Executable
            Modal and Temporal Logics, Springer Verlag, Lecture Notes in Artificial
            Intelligence, vol. 897, 1995
[FIS95/2]   Fisher, M., Representing and Executing Agent-based Systems, Springer
            Verlag, Lecture Notes in Computer Science, vol. 890, 1995
[FLA94]     Flach, P., Simply Logical Intelligent reasoning by Example, John Wiley &
            Sons, Chichester, England, 1994
[FLO94]     Florea, A., Boangiu, A., Inteligenţă Artificială, Editura Universităţii
            "Politehnica", Bucureşti, 1994
[FRE94]     Frenţiu, M., Pârv, B., Elaborarea Programelor. Metode şi tehnici
            moderne, Editura Pro-Media, Cluj-Napoca, 1994
[GAR95]     Gârbacea, I., Andonie, R., Algoritmi fundamentali – O perspectiva C++,
            Ed. Libris, Cluj-Napoca, 1995
[GEN94]     Genesereth, M.R., Ketchpel, S.P., Software agents, Communications of
            the ACM, 37(7), 1994 , pp.48-53
[GRE90]     Grefenstette, J.J., Ramsey, C.L., Schultz, A.C., Learning sequential
            decision rules using simulation models and competition, Machine
            Learning, Number 5, 1990, pp.355-382
[GRE93]     Grefenstette, J., Genetic Algorithms and Machine Learning, Proceedings
            of the 6th annual ACM Conference on Computational Learning theory,
            1993, pp.3-4
[HAD96]     Haddadi, A., Communication and Cooperation in Agent Systems, LNAI
            Volume 1056, Springer-Verlag: Berlin, Germany, 1996
[HAY83]    Hayes-Roth, F., Waterman, A., Building Expert Systems, Addison-
           Wesley: Reading, MA, 1983
[HAR00]    Harmon, M., Harmon, S., Reinforcement Learning – A Tutorial, Wright
           State University, www-anw.cs.umass.edu/~mharmon/rltutorial/frames.html,
           2000
[HAR85]    Harmon, P., King, D., Expert systems, John Wiley&Sons Inc, 1985
[HAR88]    Harmon, P., Maus, R., Expert systems. Tools and applications, John
           Wiley&Sons Inc, 1988
[HAS88]    Hassler, D., Artificial Intelligence, Vol. 106, Number 2, 1988
 [HOL75]   Holland, J.H., Adaptation in natural and artificial systems, Ann Arbor,
           MI, University of Michigan Press, 1975
[HOL86]    Holland, J.H., Escaping brittleness: The possibilities of general-purpose
           learning algorithms applied to parallel rule-based systems, in Machine
           Learning, An Artificial Intelligence Approach, Volume 2, Los Altos, CA:
           Morgan Kaufman, R.S. Michalski, J.G. Carbonelli, T.M. Mitchell (Eds.)
           1986, pp.593-623
[ISH95]    Ishida, T., Korf, R.E., Moving target search: a real-time search for
           changing goals, IEEE Transactions on Pattern Analysis and Machine
           Intelligence, 1995
[JAC86]    Jackson, P., Introduction to Expert Systems, Addison-Wesley: Reading,
           MA, 1986
[JUN97]    Junling, H., Wellman, M., Learning about other agents in a dynamic
           multi-agent system, Artificial Intelligence Laboratory, Michigan,
           October 26, 1997
[JUN98]    Junling, H., Wellman, M., Online learning about other agents in a
           dynamic MAS, Proceedings of the 2nd International Conference on
           Autonomous Agents, 1998, pp.239-246
[JUR00]    Jurafsky, D., Martin, J.H., Speech and language processing, Prentice
           Hall, 2000
[KEL93]    Kelly, J., Artificial Intelligence - a modern myth, Ellis Horwood, 1993
[KIL01]    Kilgarriff, A., Gold Standard Datasets for Evaluating Word Sense
           Disambiguation Programs, http:// citeseer.nj.nec.com
[KOR90]    Korf, R.E., Real-time heuristic search, Artificial Intelligence, 1990
[KOR92]    Korf, R.E., Search, Encyclopedia of Artificial Intelligence, Wiley-
           Interscience Publication, New York, 1992
[LEE88]    Lee, K.F., Majahan, S., A pattern classification approach to evaluation
           function learning, in Artificial Intelligence, 36, 1988, pp.1-25
[LES96]    Lesperance, Y., Levesque, J., Foundation of a logical approach to agent
           programming, LNAI Volume 1037, Springer Verlag: Berlin, Germany,
           1996, pp.331-346
[LES99]    Lesser, V., Atighetchi, M., The UMASS Intelligent home project,
           Proceedings of the 3rd annual Conference on Autonomous Agents, 1999,
           pp.291-298
[LET99]    Leţia, I.A., Crăciun, F., et al., Multiagent Systems, Casa Cărţii de
           Ştiinţă, Cluj-Napoca, 1999
[LIN91/1]   Lin, L.J., Self-improving reactive agents: Case studies of reinforcement
            learning frameworks, Proceedings of the 1st International Conference on
            Simulation of Adaptive Behavior: From Animals to Animats, 1991,
            pp.297-305
[LIN91/2]   Lin, L.J., Self-improvement based on reinforcement learning, planning
            and teaching, Proceedings of the 8th International Workshop on Machine
            Learning, 1991, pp.323-327
[LIN91/3]   Lin, L.J., Programming robots using reinforcement learning and
            teaching, Proceedings of AAAI-91, 1991, pp.781-786
[MAE90]     Maes, P., Situated agents can have goals, in Designing Autonomous
            Agents, the MIT Press: Cambridge, MA, 1990, pp.49-70
[MAH90]     Mahadevan, S., Connell, J., Automatic programming of behavior-based
            robots using reinforcement learning, IBM Technical Report, 1990
[MAN99]     Manning, C., Schütze, H., Foundations of statistical natural language
            processing, MIT Press, 1999
[MAT94]     Mataric, J.M., Interaction and Intelligent Behavior, PhD Thesis,
            Massachusetts Institute of Technology, 1994
[MIN61]     Minsky, M.L., Steps Toward Artificial Intelligence, in Computers and
            Thought, E.A. Feigenbaum & J. Feldman (Eds.), McGraw-Hill, New
            York, 1963, pp.406-450
[MIT98]     Mitchell, M., An introduction to Genetic Algorithms, MIT Press,
            Cambridge 1998
[MOR99]     Morley, D., Semantics of Actions, Agents and Environments, Doctoral
            Thesis, University of Melbourne, 1999
[NAR98]     Nardi, B.A., Miller, J., Wright, D., Collaborative programmable
            Intelligent Agents, Communications ACM 41, 3, March, 1998, pp.96-104
[NEW82]     Newell, A., The Knowledge Level, in Artificial Intelligence 18(1), 1982,
            pp.87-127
[NIL80]     Nilsson, N.J., Principles of Artificial Intelligence, Palo Alto, C.A.
            Morgan-Kaufmann, 1980
[NIL98]     Nilsson, N.J., Artificial Intelligence - A new Synthesis, Morgan-
            Kaufmann Publishers Inc., San Francisco, California, 1998
[PER98]     Perez-Uribe, A., Sanchez, E., BlackJack as a TestBed for Learning
            Strategies in Neural Networks, Proceedings of the IEEE International Joint
            Conference on Neural Networks, 1998
[POO98]     Poole, D., Mackworth, A., Goebel, R., Computational Intelligence - A
            Logical Approach, New York, Oxford University Press, 1998
[PYE99]     Pyeatt, D.L, Howe, A.E., Integrated POMDP and Reinforcement
            Learning for a two Layer simulated robot architecture, Proceedings of the
            3rd annual Conference on Autonomous Agents, 1999, pp.168-174
[RAO92]     Rao, A.S., Georgeff, M., An abstract architecture for rational agents,
            Proceedings of Knowledge Representation and Reasoning (KR&R-92),
            1992, pp.439-449
[RAO95]     Rao, A.S., Georgeff, M., BDI Agents: from theory to practice,
            Proceedings of the First International Conference on Multi-Agent Systems
            (ICMAS-95), pp.312-319, San Francisco, CA, 1995
[RES98]    Resnik, P., Yarowsky, D., Distinguishing Systems and Distinguishing
            sense: new evaluation methods for WSD, Natural Language Engineering,
           1, nr. 1, 1998
[RIC91]    Rich, E., Knight, K., Artificial Intelligence, Mc Graw Hill, New York,
           1991, 2nd ed.
[RUS95]    Russell, S.J., Norvig, P., Artificial Intelligence - A Modern Approach,
           Prentice-Hall, Inc., New Jersey, 1995
[RUS98]    Russell, S.J., Learning agents for uncertain environments, Proceedings of
           the 11th annual Conference on Computational Learning theory, 1998,
           pp.101-103
[SAM59]    Samuel, A.L., Some studies in Machine Learning using the game of
           checkers, IBM Journal on research and Development, Number 3, 1959,
           pp.210-229
[SEB01]    Sebastiani, F., A tutorial on Automated Text Categorization, pp.1-25,
           ESSLLI 2001
[SEO00]    Seo, Y.W., Byoung, T.Z., A Reinforcement Learning agent for
           personalized information filtering, Proceedings of the 2000 International
           Conference on Intelligent user Interfaces, 2000, pp.248-251
[SER99]    Şerban, G., Searching graphs in logical programming, Universitatea
           “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi Informatică,
           Research Seminars, Seminar on Computer Science, 1999, pp.67-72
[SER00a]   Tătar, D., Şerban, G., Training probabilistic context-free grammars as
           hidden Markov models, Studia Universitatis "Babes-Bolyai", Informatica,
           XLV(1), 2000, pp.17-30
[SER00b]   Şerban, G., Intelligent agents and reinforcement learning, Universitatea
           “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi Informatică,
           Research Seminars, Seminar on Computer Science, 2000, pp.13-20
[SER00c]   Şerban, G., A method for training Intelligent Agents using Hidden
           Markov Models, Studia Universitatis "Babes-Bolyai", Informatica,
           XLV(2), 2000, pp.41-50
[SER01a]   Şerban, G., Training Hidden Markov Models - a Method for Training
           Intelligent Agents, Proceedings of the Second International Workshop of
           Central and Eastern Europe on Multi-Agent Systems, Krakow, Poland,
           2001, pp.267-276
[SER01b]   Şerban, G., A Reinforcement Learning Intelligent Agent, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVI(2), 2001, pp.9-18
[SER01c]   Tătar, D., Şerban, G., A new algorithm for word sense disambiguation,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVI(2), 2001, pp.99-
           108
[SER01d]   Şerban, G., Training intelligent agents as Hidden Markov Models,
           Proceedings of Abstracts of the 4th Joint Conference on Mathematics and
           Computer Science, Felix, Romania, 2001, pp.91
[SER01e]   Şerban, G., A Simple Web Agent for displaying a tree structure,
           Universitatea “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi
           Informatică, Research Seminars, Seminar on Computer Science, 2001,
           pp.41-48
[SER02a]   Şerban, G., A New Real Time Learning Algorithm, Studia Universitatis
           "Babes-Bolyai", Informatica, XLVII(1), 2002, pp.3-14
[SER02b]   Şerban, G., Formal Methods in Agent-Based Systems, Proceedings of the
           Symposium "Zilele Academice Clujene", Cluj-Napoca, 2002, pp.59-68
[SER02c]   Şerban, G., LASG - A New Logic Architecture for Intelligent Agents,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVII(2), 2002, pp.13-
           22
[SER02d]   Şerban, G., Real Time Learning in Agent Based Systems, Proceedings of
           the 4th International Workshop on Symbolic and Numeric Algorithms for
           Scientific Computing, Timişoara, Romania, 2002, pp.337-347
[SER02e]   Şerban, G., Tătar, D., A Word Sense Disambiguation Experiment for
           Romanian Language, Studia Universitatis "Babes-Bolyai", Informatica,
           XLVII(2), 2002, pp.37-42
[SER03a]   Şerban, G., A New Reinforcement Learning Algorithm, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVIII(1), 2003, pp.3-14
[SER03b]   Tătar, D., Şerban, G., Words Clustering in Question Answering Systems,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVIII(2), 2003, to
           print
[SER03c]   Şerban, G., A New Interface for Reinforcement Learning Software, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVIII(2), 2003, to print
[SER03d]   Şerban, G., Tătar, D., Word Sense Disambiguation for Untagged
           Corpus: Application to Romanian Language, Proceedings of CICLing
           2003 (Intelligent Text Processing and Computational Linguistics), Mexico
           City, Mexico, Lecture Notes in Computer Science No. 2588, Springer-
           Verlag, 2003, pp.270-275
[SER03e]   Şerban, G., Tătar, D., An Improved Algorithm on Word Sense
           Disambiguation, Proceedings of IIS 2003 (Intelligent Information Systems
           - New Trends in Intelligent Information Processing and Web Mining),
           Zakopane, Poland, 2003, to print
[SER03f]   Orăşan, C., Tătar, D., Şerban, G., Oneţ, A., Avram, D., How to build a
           QA system in your back-garden: application to Romanian, Proceedings of
           EACL 2003, Budapest, Hungary, 2003, to print
[SHI98]    Shimbo, M., Ishida, T., On the convergence of real-time search, Journal
           of Japanese Society of Artificial Intelligence, 1998
[SHO93]    Shoham, Y., Agent-oriented programming, Artificial Intelligence, 60(1),
           pp.51-92, 1993
[SUN96]    Sung, K.K., Niyogi, P., A formulation for active learning with application
           to Object Detection, AI Memo No. 1438, CBCL Paper No. 116, 1996
[SUN99]    Sun, R., Session, C., Bidding in Reinforcement Learning - a paradigm for
           MAS, Proceedings of the 3rd annual Conference on Autonomous Agents,
           1999, pp.344-345
[SUT84]    Sutton, R., Temporal credit assignment in Reinforcement Learning,
           Doctoral Dissertation, Dept. of Computer and Information Science, Univ.
           of Massachusetts, Amherst, 1984
[SUT88]    Sutton, R., Learning to predict by the methods of temporal differences, in
           Machine Learning, 3, 1988, pp.9-44
[SUT92]   Sutton, R., A Special Issue of Machine Learning on Reinforcement
          learning, Kluwer Academy Publishers, Boston, London, 1992
[SUT98]   Sutton, R.S., Barto, A.G., Reinforcement learning, The MIT Press
          Cambridge, Massachusetts, London, England, 1998
[TAT01]   Tătar, D., Inteligenţa artificială: demonstrare automată de teoreme,
          prelucrarea limbajului natural, Editura Albastră, 2000
[TOR90]   Torras, C., Illari, J., 2D path planning: A configuration space heuristic
          approach, The International Journal of Robotics Research, Number 2,
          1990, pp.75-91
[WAL65]   Waltz, M.D., Fu, K.S., A heuristic approach to reinforcement learning
          control systems, IEEE Transactions on Automatic Control, AC-10, 1965,
          pp.390-398
[WAN99]   Wang, F., McKenzie, E., A multi-agent based evolutionary artificial
          Neural Network for general navigation in unknown environments,
          Proceedings of the 3rd annual Conference on Autonomous Agents, 1999,
          pp.154-159
[WAT97]   Watson, M., Intelligent Java applications for Internet and intranets,
           1997, Romanian translation, Ed. ALL, 1999
[WAT89]   Watkins, C.J., Learning from delayed rewards, Ph.D. Thesis, King's
          College, Cambridge, 1989
[WEI99]   Weiss, G., Multiagent systems – A Modern Approach to Distributed
          Artificial Intelligence, The MIT Press, Cambridge, Massachusetts,
          London, 1999
[WER87]   Werbos, P.J., Building and understanding adaptive systems: A
          statistical/numerical approach to factory automation and brain research,
          IEEE Transactions on Systems, Man and Cybernetics, Jan-Feb, 1987
[WHI89]   Whitehead, S.D., Ballard, D.H., A role for anticipation in reactive
          systems that learn, Proceedings of the 6th International Workshop on
          Machine Learning, 1989, pp.354-357
[WHI90]   Whitehead, S.D., Ballard, D.H., Active perception and Reinforcement
          Learning, Proceedings, 7th International Conference on machine Learning,
          Austin, Texas, 1990
[WHI91]   Whitehead, S.D., Ballard, D.H., Learning to perceive and act by trial
          and error, Machine Learning, Number 7, 1991, pp.45-84
[WIN84]   Winston, P.H., Artificial Intelligence, Addison Wesley, Reading, MA,
          1984, 2nd ed.
[WOL99]   Wolpert, D., Wheeler, K., Turner, K., General Principles of learning-
          based MAS, Proceedings of the 3rd annual Conference on Autonomous
          Agents, 1999, pp.77-83
[WOO95]   Wooldridge, M., Jennings, N.R., Intelligent Agents: Theory and
          Practice, The Knowledge Engineering Review, 10(2), pp.115-152, 1995
[WOO97]   Wooldridge, M., Agent-Based Software Engineering, Mitsubishi Electric
          Digital Library Group, London, 1997
[WOR01]   http://www.cogsci.princeton.edu/~wn/
[YAR95]   Yarowsky, D., Unsupervised Word Sense Disambiguation Rivaling
          Supervised Methods, Proceedings of ACL'95, pp.189-196
[YAR99]   Yarowsky, D., Hierarchical Decision Lists for WSD, Kluwer Academic
          Publishers, 1999
[YAR01]   Yarowsky, D., Word-Sense Disambiguation Using Statistical Models of
          Roget's Categories Trained on Large Corpora, http:// citeseer.nj.nec.com
[YIS99]   Yishay, M., Reinforcement Learning and Mistake bounded algorithms,
          Proceedings of the 12th annual Conference on Computational Learning
          theory, 1999, pp.183-192




                                      9

Contenu connexe

Similaire à "Development Methods for Intelligent Systems"

An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
Mohammad Salah uddin
 
Embeded microcontroler
Embeded microcontrolerEmbeded microcontroler
Embeded microcontroler
Yugo Sulistyo
 
Hibernate Reference
Hibernate ReferenceHibernate Reference
Hibernate Reference
Syed Shahul
 
Flash as3 programming
Flash as3 programmingFlash as3 programming
Flash as3 programming
subhash85
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
Dario Bonino
 
Linux_kernelmodule
Linux_kernelmodule Linux_kernelmodule
Linux_kernelmodule
sudhir1223
 

Similaire à "Development Methods for Intelligent Systems" (20)

U M Lvs I D E F
U M Lvs I D E FU M Lvs I D E F
U M Lvs I D E F
 
An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
 
eclipse.pdf
eclipse.pdfeclipse.pdf
eclipse.pdf
 
Embeded microcontroler
Embeded microcontrolerEmbeded microcontroler
Embeded microcontroler
 
Hibernate Reference
Hibernate ReferenceHibernate Reference
Hibernate Reference
 
Plesk Modules
Plesk ModulesPlesk Modules
Plesk Modules
 
Flash as3 programming
Flash as3 programmingFlash as3 programming
Flash as3 programming
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
 
Software architecture for developers
Software architecture for developersSoftware architecture for developers
Software architecture for developers
 
PythonIntro
PythonIntroPythonIntro
PythonIntro
 
Whitepaper on distributed ledger technology
Whitepaper on distributed ledger technologyWhitepaper on distributed ledger technology
Whitepaper on distributed ledger technology
 
B035-2447-220K.pdf
B035-2447-220K.pdfB035-2447-220K.pdf
B035-2447-220K.pdf
 
Market Research Report
Market Research ReportMarket Research Report
Market Research Report
 
Market research report
Market research reportMarket research report
Market research report
 
thesis-2005-029
thesis-2005-029thesis-2005-029
thesis-2005-029
 
Linux_kernelmodule
Linux_kernelmodule Linux_kernelmodule
Linux_kernelmodule
 
Learn python the right way
Learn python the right wayLearn python the right way
Learn python the right way
 
A Real Time Application Integration Solution
A Real Time Application Integration SolutionA Real Time Application Integration Solution
A Real Time Application Integration Solution
 
Uni leicester
Uni leicesterUni leicester
Uni leicester
 
Neural Networks on Steroids
Neural Networks on SteroidsNeural Networks on Steroids
Neural Networks on Steroids
 

Plus de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Plus de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

"Development Methods for Intelligent Systems"

  • 1. “BABEŞ-BOLYAI” UNIVERSITY OF CLUJ-NAPOCA FACULTY OF MATHEMATICS AND COMPUTER SCIENCE GABRIELA ŞERBAN Development methods for Intelligent Systems Abstract of the Ph.D. Thesis Supervisor Prof. univ. dr. MILITON FRENŢIU 2003
  • 2. Table of Contents page / page abstract / thesis INTRODUCTION ........................................................................................................... 2/ 5 CHAPTER I INTELLIGENT SYSTEMS.............................................................. 5/ 9 CHAPTER II MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS.......................................................................... 7 / 65 CHAPTER III REINFORCEMENT LEARNING.............................................. 9 / 75 CHAPTER IV NEW APPROACHES IN INTELLIGENT SYSTEMS...................................................................................................... 11 / 107 4.1 A new real time learning algorithm (RTL) .................. 11 / 107 4.2 A new Logic Architecture for Intelligent Agents (ALSO) .................................................................................................. 12 / 115 4.3 A new reinforcement learning algorithm (URU) ..... 13 / 123 4.4 A new algorithm for words sense disambiguation... 16 / 134 CHAPTER V IMPLEMENTATIONS OF INTELLIGENT SYSTEMS...................................................................................................... 18 / 141 5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming................................. 18 / 142 5.2 An Intelligent Agent based on training Hidden Markov Models.................................................................................. 18 / 147 5.3 A game playing Reinforcement Learning Intelligent Agent ("Agent 21")............................................. 19 / 152 5.4 "Know-How" Agent....................................................................... 20 / 152 5.5 A real-time learning Intelligent Agent.............................. 20 / 153 5.6 An agent for words disambiguation................................... 21 / 159 5.7 A Web Agent to display a tree structure......................... 22 / 168 5.8 A Logic-based Agent (ALSO) ................................................. 23 / 172 5.9 An URU reinforcement learning Intelligent Agent. 23 / 174 5.10 An Agent for words clustering............................................. 23 / 181 CHAPTER VI A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS..... 24 / 184 CONCLUSIONS ................................................................................................................................ 28 / 199 REFERENCES ................................................................................................................................ 29 / 200 1
  • 3. INTRODUCTION This PhD. thesis is the result of my researches in the field of Artificial Intelligence, particularly in Machine Learning, started in 1998. The purpose of this abstract is to present an overview of the most important aspects of the thesis, including the original results. Artificial Intelligence, one of the most recent disciplines appeared in the field of computer science, tries to understand the intelligent entities. But, unlike philosophy and psychology, which deal with the study of intelligence, too, the aim of Artificial Intelligence is building and as well understanding intelligent entities. As nobody can predict the future in detail, computers endowed with a degree of human intelligence (or more than that) would have a great impact in everyday life and in the future course of civilization. So, the field of machine learning represents one of the most recent challenges of the Artificial Intelligence, opening new perspectives for research in the field, and, of course, a way for achieving intelligent systems. In our opinion, the most challenging field of Machine Learning is Reinforcement Learning, dealing with the problem of improving the behavior of an artificial agent based on feedback of its performance. On the other hand, we consider that one of the most suitable fields for exploiting the intelligence is Natural Language Processing. This is why our researches are directed, especially, to the fields of Reinforcement Learning and Natural Language Processing. The thesis contains 133 references and is structured on six chapters, as follows. Chapter I, Intelligent Systems, discusses general aspects related to Artificial Intelligence. The field of Intelligent Agents is presented, a domain of Artificial Intelligence that offers a new problem solving method and a new way for interaction between user and computer. Some of the most important (in our opinion) abstract and concrete architectures for intelligent agents are presented, as well as the new programming paradigm, agent oriented programming. In the last but one section of the first chapter the representation based on logic and the reasoning in agent based systems is discussed. There are presented aspects related to the Software Engineering and the formal aspects of the agent based systems. The last section of the first chapter tackles with the problem of machine learning, as an extremely important research direction in Artificial Intelligence. The general model of a learning agent, aspects related to the learning of a field, as well as the most important learning strategies are presented. Chapter II, Mathematical Models for Learning Intelligent Agents, presents three mathematical models for agents that learn: Markov Decision Processes, Partial Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the mathematical models described in this chapter have a very important role in modeling and designing agent based systems, especially systems that learn by reinforcement (by interaction with the environment). 2
  • 4. Chapter III, Reinforcement Learning, tackles with the extremely complex problem of reinforcement learning, a learning approach that combines two disciplines in order to solve problems that neither one could solve independently: Dynamic Programming and Supervised Learning. This chapter presents the particularities of Reinforcement Learning. There are discussed: the problem of learning based on a utility function, the passive or active learning in known or unknown environments, the problem of learning based on temporal differences (the most interesting variant of reinforcement learning), the problem of Q-learning. The most important action selection mechanisms, as well as the SARSA algorithm, the most efficient variant for Q-learning, are comparatively discussed. Chapter IV, New approaches in Intelligent Systems, presents a few new learning algorithms (approaches of Reinforcement Learning and learning algorithms that may be used for Natural Language Processing) as well as a new logic architecture for intelligent agents (showing the importance of logical approaches for intelligent agents). The learning algorithms described in this section are: an algorithm for words sense disambiguation, a real time learning algorithm, a reinforcement learning algorithm with application to problems of motion for robots in environments with obstacles. Each of the original approaches from this chapter are comparatively discussed with already existent approaches, the advantages of the new approaches being illustrated. Chapter V, Examples of Intelligent Systems, presents practical applications (concrete architectures for intelligent agents that improve their behavior by learning), i.e. implementations of the theoretical aspects presented in chapters II and IV. For each such application we developed experiments that illustrate the efficiency of the approaches. We also suggested directions for generalization and for future researches. Chapter VI, A programming interface for reinforcement learning simulations, presents a standard programming interface for reinforcement learning simulations in a known environment, both the simulation of reinforcement learning based on utility function and the simulation of Q-learning being possible. The use of the interface on an example of movement of an embodied robotic agent in an environment with obstacles, is, as well, illustrated. From the original elements presented through out this thesis we mention the following: the use of Hidden Markov Models as mathematical models for Intelligent Learning Agents [SER01a] (section 2.3); algorithm for words sense disambiguation [SER03e] (section 4.4); a real time learning algorithm - RTL [SER02a] (section 4.1); the convergence theorem for the RTL algorithm [SER02a] (Theorem 4.1.1, section 4.1); a new Logic Architecture for Intelligent Agents - ALSO [SER02c] (section 4.2); the reinforcement learning algorithm URU [SER03a] (section 4.3); 3
  • 5. the convergence theorem for the URU learning algorithm [SER03a] (Theorem 4.3.1, section 4.3); Theorems 4.3.2 and 4.3.3: the equilibrium equation of the utilities in the URU algorithm and, respectively, the error estimation in the URU algorithm [SER03a] (section 4.3); a reinforcement learning agent based on dynamic adaptive programming [SER00b] (section 5.1); intelligent agent based on training Hidden Markov Models [SER00c] (section 5.2); a game playing reinforcement learning intelligent agent [SER01b] (section 5.3); "Know-How" agent based on logic [SER02b] (section 5.4); real time learning agent [SER02d] (section 5.5); intelligent agent for words disambiguation [SER03d] (section 5.6); Web agent for displaying a tree structure [SER01e] (section 5.7); agent based on logic (ALSO) [SER02c] (section 5.8); an URU reinforcement learning intelligent agent [SER03a] (section 5.9); agent for words clustering [SER03b] (section 5.10); a programming interface for reinforcement learning simulations [SER03c] (chapter VI). * * * I would like to express my gratitude to my supervisor, Prof. Dr. Militon FRENŢIU, for his competent guidance, suggestions and encouragement. I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the Department of Computer Science of the Faculty of Mathematics and Computer Science in Cluj-Napoca for her support along all the elaboration period of the thesis and for our extremely good cooperation. I also want to give my thanks to Assoc. Prof. Dr. Horia POP from the same department for all his competent recommendations. I am grateful to all the members of Computer Science Department for their helpful suggestions. 4
  • 6. CHAPTER I INTELLIGENT SYSTEMS Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, a reason for studying this field is to learn about ourselves. One question that AI has to answer is the following: how is it possible that a small and slow, biological or electronic brain, to perceive, understand, predict and manipulate a world much larger and more complicated than itself. All the researchers should look themselves in the mirror in order to see an example of an intelligent system. There are two fundamental research directions in Artificial Intelligence [DUM99]. On one hand, the achievement of machines capable to reproduce the cognitive functions of the human mind, the so-called cognitive modeling. In this way, the cognitive psychology is one of the cognitive sciences whose method are taken over by Artificial Intelligence, in fields as Natural Language Processing, learning, perception. Such intelligent machines were named in the specific literature as “artificial brains”, the basic idea being building intelligent machines starting from the models of the neuronal activity. Such models are known as artificial neuronal networks. These research directions in Artificial Intelligence define the field of Computational Intelligence (Intelligent Calculus), its main characteristic being the numerical representation of knowledge. On the other hand, the achievement of programs that allows the symbolical processing of information (knowledge) using logical methods, the so-called logical- symbolical approach (rational thinking). The logical-symbolical tradition of Artificial Intelligence tries to achieve intelligent systems using deduction mechanisms in order to obtain a solution of the problem, based on the problem's description in the logical formalism. This field of the traditional Artificial Intelligence is based on a symbolic representation of the knowledge and on logical inferences on knowledge. All the AI problems could be gathered under the concept of intelligent agent. The AI goal is to describe and build agents that receive perception from the environment and executes actions. An agent is an entity that perceives its environment by sensors and acts on this environment by actions. An agent is described by: the architecture part, the so-called behavior of the agent - the action accomplished after a perceptual sequence; the program part, the so-called internal part of the agent - how the agent's inside works. The goal of AI is to achieve the program part of the agent: a function that implements the mapping from perception to action. 5
  • 7. The abstract architectures of intelligent agents described in this chapter are purely reactive agents, and agents with states. The concrete architectures presented are logic based architectures, reactive architectures, Belief-Desire-Intention architectures and layered architectures. For many AI researchers, the idea of programming computer systems in terms of mental notions such as belief, desire, and intention is the key component of agent-based computing. The concept was articulated by Yoav Shoham, in his agent-oriented programming (AOP) proposal [SHO93]. Thus, Yoav Shoham proposes a “a new programming paradigm, based on a new point of view on the computational process”. AOP may be regarded as a kind of „post-declarative‟ programming. In AOP, the idea is that, as in declarative programming, we state our goals, and let the built-in control mechanism figure out what to do in order to achieve them. So, it is obvious that, when specifying such systems, we have to use logical statements that may be formalized using a logic of knowledge. “Intelligent agents are ninety-nine percent computer science and one percent AI” [ETZ96]. The technology of intelligent agents and multi-agent systems seems set to radically alter the way complex, distributed, open systems are conceptualized and implemented. The logic based representation and reasoning in agent based systems is an important topic of the field, with formal methods having an important role [SER02b]. There are two problems related to the formalization of the agent based systems: how the formal methods could be used for describing and reasoning about agents and their behavior, and how formal methods provide a way to show that the implemented agent systems are correct with respect to their specifications. The field of intelligent agents is related to another field of AI, the field of machine learning. Machine learning represents the study of system models that, based on a set of training data, improve their performance by experiences and by learning some specific experimental knowledge. The attempt of modeling the human reasoning leads to the concept of intelligent reasoning [FLA94]. Reasoning is the process of conclusion deduction; intelligent reasoning is a kind of reasoning accomplished by humans. Most of the AI systems are deductive, able to make inferences (draw conclusions), given their initial or supplied knowledge, without being able of new knowledge acquisition or to generate new knowledge. 6
CHAPTER II
MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS

Formally, an agent based system is represented as a 4-tuple (S, A, h, U), where S = {si} is the space of the agent's states (the environment), A = {ai} is the space of the agent's actions (the agent's behavior), h is the transition function between the states, and U = (u1, u2, ..., un) is an array representing the utility function of the agent (ui is the utility of the state si).

Most of the computational models of reinforcement learning are based on the assumption that the environment the agent interacts with can be modeled as a Markov Decision Process, defined as follows:
1. both the agent and the environment may be modeled as finite synchronized automata (with a finite number of states);
2. the agent and the environment interact in discrete time;
3. the agent can perceive the environment's states and use them in order to select its actions;
4. after the agent acts, the environment makes a transition to a new state;
5. the agent receives a reward after each action performed.

We say that the environment is a Markov Decision Process (satisfies the Markov property) if the environment's response (signal) at the moment t+1 depends only on the state and action at the moment t. Markov Decision Processes are very important in reinforcement learning theory (they are estimated to underlie 90% of modern reinforcement learning).

In many situations the environment cannot be "labeled" beforehand with appropriate states, and the environment's states are not completely accessible to the agent; the environment is thus continuous and partially observable. Partially Observable Markov Decision Processes were developed specifically to deal with this problem.

The use of statistical methods is a leading topic in the field of Artificial Intelligence. In the last section of this chapter we present how Hidden Markov Models may be used as a mathematical tool for modeling the environment of intelligent agents [SER01a]. The Hidden Markov Model (HMM) is a generalization of Markov decision processes, in which several transitions (arrows) from a state are possible for a given action. Therefore, in an HMM we may have several paths covered for the same sequence of actions, which implies that P(a1,n) (the probability of having as input a sequence made up of n actions, a1, a2, ..., an, shortly written a1,n) is computed as the sum of the probabilities over all the possible paths. The probability of a given path is computed by multiplying the probabilities on each segment (transition) of that path.

Definition. An HMM is a 4-tuple (s1, S, A, P), where S is a (finite) set of states, s1 ∈ S is the initial state, A is a set of input symbols (actions), and P is a set of transitions (labeled arrows). A transition is defined as a 4-tuple (si, sj, ak, p), representing passing from the state si to the state sj for the input (action) ak, the transition being evaluated as having the probability p.

As for a given input sequence we have several possible paths, the sequence of states that has been passed through is not deducible from the input, but hidden (this gives the name of the model we focus on). In the last section of this chapter, we present the way intelligent agents can be modeled using a Hidden Markov Model and the way they can be trained using Viterbi's algorithm [JUR00] and the Baum-Welch algorithm [JUR00]. In our opinion, Hidden Markov Models are very appropriate for fields such as Natural Language Processing, writing and speech recognition, and also for motion problems for robots in environments with perturbations [SER01a].
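As an illustration of the path-summing computation described above, the following Java sketch stores the labeled transitions of a small HMM and computes P(a1,n) recursively, summing over every transition compatible with the current action. The class and method names are illustrative, not taken from the thesis implementation.

```java
import java.util.*;

// Minimal sketch of the HMM of Chapter II: states and labeled, probability-weighted
// transitions. P(a_1..n) is the sum, over all paths that accept the action sequence,
// of the product of the transition probabilities along the path.
// Class and method names are illustrative, not from the thesis.
final class SimpleHmm {
    // transitions: (state, action) -> list of (nextState, probability)
    private final Map<String, Map<String, List<Map.Entry<String, Double>>>> trans = new HashMap<>();

    void addTransition(String from, String action, String to, double p) {
        trans.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(action, k -> new ArrayList<>())
             .add(Map.entry(to, p));
    }

    // Probability of observing the action sequence, starting from 'state':
    // sum over every transition labeled with the current action.
    double sequenceProbability(String state, List<String> actions) {
        if (actions.isEmpty()) return 1.0;
        double total = 0.0;
        String a = actions.get(0);
        for (Map.Entry<String, Double> t :
                trans.getOrDefault(state, Map.of()).getOrDefault(a, List.of())) {
            total += t.getValue()
                   * sequenceProbability(t.getKey(), actions.subList(1, actions.size()));
        }
        return total;
    }
}
```

A call such as sequenceProbability(initialState, actionSequence) returns the probability of the whole input sequence; Viterbi's algorithm replaces the summation by a maximization in order to recover the single most probable (hidden) state path.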
CHAPTER III
REINFORCEMENT LEARNING

Reinforcement learning is learning a mapping from situations to actions in order to maximize a scalar reward or reward signal (function). The learning agent does not know a priori what actions to perform, as in most learning settings, but has to discover by itself which actions lead to a maximum reward and to try them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but all the subsequent situations and, consequently, all the subsequent rewards. These two characteristics - trial-and-error search and delayed rewards - are the most important particularities of reinforcement learning.

Reinforcement Learning can be considered, at the same time, a new and a very old topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and, independently, in control theory by Waltz and Fu (1965) [WAL65]. The first relevant learning research in this field is the checkers player of Samuel (1959) [SAM59], which uses learning based on temporal differences to handle the rewards. Of course, learning and reinforcement have been studied by psychologists for more than a century, and this research had a great impact on AI and engineering activities.

In this chapter we present the way agents can learn in much less generous environments, where the agent receives no examples and starts with no model of the environment and no utility function. Reinforcement Learning (RL) is a way to improve the behavior of an agent, given a certain feedback about its performance.

Reinforcement Learning [HAR00] is an approach to machine intelligence that combines two disciplines to successfully solve problems that neither discipline can address individually. Dynamic Programming is a field of mathematics that has traditionally been used to solve problems of optimization and control; however, traditional dynamic programming is limited in the size and complexity of the problems it can address. Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions; however, supervised learning requires sample input-output pairs from the function to be learned. In other words, supervised learning requires a set of questions with the right answers. Unfortunately, we often do not know the correct answers that supervised learning requires. For these reasons there has recently been much interest in a different approach known as reinforcement learning (RL). Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.

Reinforcement learning appeals to many researchers because of its generality. In RL, the computer is simply given a goal to achieve and then learns how to achieve that goal by trial-and-error interactions with its environment. In a reinforcement learning problem, the agent receives a feedback, known as reward or reinforcement; the reward is received at the end, in a terminal state, or in any other state in which the agent has exact information about what it did well or wrong.

A reinforcement learning problem has three fundamental parts [HAR00]:
- the environment - represented by "states". Every RL system learns a mapping from situations to actions by trial-and-error interactions with a dynamic environment. This environment must be at least partially observable by the reinforcement learning system;
- the reinforcement function - the "goal" of the RL system is defined using the concept of a reinforcement function, which is the exact function of future reinforcements the agent seeks to maximize. In other words, there exists a mapping from state/action pairs to reinforcements; after performing an action in a given state, the RL agent receives some reinforcement (reward) in the form of a scalar value. The RL agent learns to perform actions that maximize the sum of the reinforcements received when starting from some initial state and proceeding to a terminal state;
- the value (utility) function - explains how the agent learns to choose "good" actions, or even how we might measure the utility of an action. Two terms are defined: a policy determines which action should be performed in each state (a policy is a mapping from states to actions); the value of a state is defined as the sum of the reinforcements received when starting in that state and following some fixed policy to a terminal state. The value (utility) function would therefore be the mapping from states to actions that maximizes the sum of the reinforcements when starting in an arbitrary state and performing actions until a terminal state is reached.
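To make the notions of value function and reward concrete, the following Java sketch maintains a tabular utility estimate and applies a standard TD(0)-style update after each observed transition. It is a generic illustration of temporal-difference learning, not a verbatim reproduction of any algorithm from the thesis; the learning rate alpha and the reward (discount) factor gamma are the usual parameters of such methods.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a tabular value function updated by the standard TD(0)-style rule:
// U(s) <- U(s) + alpha * (r + gamma * U(s') - U(s)).
// This is the generic temporal-difference update; the thesis' own URU rule
// (section 4.3) is a variant of this idea and is not reproduced verbatim here.
final class ValueTable {
    private final Map<String, Double> utility = new HashMap<>();
    private final double alpha;   // learning rate
    private final double gamma;   // reward (discount) factor

    ValueTable(double alpha, double gamma) {
        this.alpha = alpha;
        this.gamma = gamma;
    }

    double value(String state) {
        return utility.getOrDefault(state, 0.0);
    }

    // One learning step after observing: state --(reward r)--> nextState
    void update(String state, double reward, String nextState) {
        double target = reward + gamma * value(nextState);
        utility.put(state, value(state) + alpha * (target - value(state)));
    }
}
```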
CHAPTER IV
NEW APPROACHES IN INTELLIGENT SYSTEMS

In this chapter we present some original approaches to Intelligent Systems: new learning algorithms (for reinforcement learning, and learning algorithms with uses in Natural Language Processing) and a new logic architecture for Intelligent Agents (showing the importance of logic based approaches in this field).

4.1 A new real time learning algorithm (RTL)

All Artificial Intelligence problems require some sort of search [WEI99], and thus search is an important issue in Artificial Intelligence. Search algorithms are useful for problem solving by intelligent (single or multiple) agents. In this section we propose an original algorithm (RTL) [SER02a], which extends the Learning Real-Time A* (LRTA*) algorithm [KOR90] used for solving path-finding problems. The RTL algorithm preserves the completeness and the real-time character of LRTA*, while providing a better exploration of the search space.

LRTA* is a learning version of the A* algorithm [KOR92], a standard search algorithm in Artificial Intelligence. As is well known, the major problem of A* is its large computational complexity. LRTA* has two major problems:
1. What happens if the search space is a cyclic graph?
2. What happens in plateau situations - states with several successor (neighbor) states having the same minimum h' value (the estimated distance from the current state to the goal state)? The choice of the next action is nondeterministic.

RTL proposes alternatives for solving these problems. The proposed solutions are:
1. We maintain a record of the visited nodes, but we do not save the h' values for each node.
2. In order to choose the next action in a given state, the agent determines the set S of states not yet visited by the agent and having a minimum h' value. If S is empty, the training fails; otherwise, the agent chooses a state from S using an action selection mechanism (this allows a better exploration of the search space).

The idea of the algorithm is the following:
- through repeated training episodes, the agent tries some (possibly optimal) paths to a goal state and memorizes the shortest one; the number of trials is selected by the user;
- after a training trial there are two possibilities: the agent reaches a goal state, in which case it saves the path and its cost, or the learning process fails (the agent does not reach the final state, because it was blocked).

Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h' values) are admissible (they do not overestimate the real values), then the RTL algorithm is convergent.

At the end of this section we give some directions for future research concerning the RTL algorithm.

4.2 A new Logic Architecture for Intelligent Agents (ALSO)

The logic approach is a topic of Symbolic Artificial Intelligence and has its own importance for intelligent agents, even if the controversy between the traditional approach and computational intelligence in Artificial Intelligence is well known. Moreover, the only intelligence requirement we generally make for agents is that [WOO97] they can make an acceptable decision about what action to perform next in their environment, in time for this decision to be useful. Other requirements for intelligence will be determined by the domain in which the agent is applied: not all agents will need to be capable of learning, for example. In such situations, a logic architecture is very appropriate and offers, in our opinion, a simple and elegant representation of the agent's environment and desired behavior. The most important disadvantage of the traditional logic architecture is the large complexity of the computational process.

In this section we present a new logic architecture for intelligent agents (ALSO - a logic architecture based on stacks of goals) [SER02c]. This architecture combines the traditional logic architecture with a planning architecture [RIC91]. We consider the case of an agent whose goal is to solve a given problem (to pass from an initial to a final state), based on a set of operators (rules) that may be applied at a given moment, as sketched below.
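A minimal Java sketch of this problem setting is given below: operators that are applicable in certain states and transform one state into another, and a simple depth-first solver that chains them from the initial state to a final state. All names are illustrative assumptions; the ALSO-specific ingredients (the PRECONDITION, ADD and DELETE lists and the stack of goals) are described in the paragraphs that follow.

```java
import java.util.*;

// Minimal sketch of the problem setting described above: an agent must transform
// an initial state into a final state by applying operators (rules) that are
// applicable at a given moment. Names are illustrative, not from the thesis.
interface Operator<S> {
    boolean applicable(S state);   // may the operator be applied in this state?
    S apply(S state);              // the state obtained after applying it
}

final class OperatorProblemSolver {
    // Depth-first search for a sequence of operators leading to a final state;
    // 'visited' prevents cycling through already explored states.
    static <S> Optional<List<Operator<S>>> solve(S state,
                                                 List<Operator<S>> operators,
                                                 java.util.function.Predicate<S> isFinal,
                                                 Set<S> visited) {
        if (isFinal.test(state)) return Optional.of(new ArrayList<>());
        if (!visited.add(state)) return Optional.empty();   // already explored
        for (Operator<S> op : operators) {
            if (!op.applicable(state)) continue;
            Optional<List<Operator<S>>> rest =
                solve(op.apply(state), operators, isFinal, visited);
            if (rest.isPresent()) {
                rest.get().add(0, op);   // prepend the operator used first
                return rest;
            }
        }
        return Optional.empty();
    }
}
```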
In an ALSO architecture, we use a declarative representation of knowledge. Each operator is described by a list of new predicates that are made true by the operator and a list of old predicates that are made false by the operator; the two lists are ADD and DELETE, respectively. Moreover, for each operator a third list is specified, PRECONDITION, which contains all the predicates that must be true in order to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate not included in the ADD or DELETE lists of a certain operator is not affected by that operator.

The idea of the algorithm is to use a stack of goals (a single stack that contains both goals and operators proposed for solving those goals). The problem solver is also based on a database that describes the current situation (state) and on the set of operators described by the PRECONDITION, ADD and DELETE lists. As a case study we illustrate the use of the ALSO algorithm on a maze searching problem.

The logic architecture based on a stack of goals improves the traditional logic architecture for intelligent agents in the following directions:
- compared to the traditional logic approach, where the number of frame axioms that must be stored in order to make a correct inference is large, the proposed architecture reduces this number; in other words, the space complexity is reduced;
- because of a limited Depth First search, the time complexity is reduced, too;
- the representation is very simple and elegant.

At the end of this section we give some extension possibilities of the ALSO architecture.

4.3 A new reinforcement learning algorithm (URU)

In this section we propose an original algorithm (URU - Utility-Reward-Utility) [SER03a], namely a temporal difference reinforcement learning algorithm. This algorithm may be used for path-finding problems in a rectangular environment. We also prove the convergence of this algorithm.

An important problem in robotics is to generate a path between an initial and a final configuration such that the robot avoids some obstacles. This problem is called the robot path-finding problem. Many approaches for solving this problem were proposed in Artificial Intelligence and Computational Geometry [BRA82], [TOR90]. All these approaches are based on planning, and their time complexity grows exponentially with the number of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches (section 4.1) or learning approaches (section 4.3) are important, because they contribute to reducing the time complexity of the problem.

We consider the environment represented as a space of states (each state is characterized by its position in the environment - two coordinates specifying the X-coordinate and the Y-coordinate of the current position). The goal of the agent is to learn to move from an initial to a final state on a shortest path (measured as the number of transitions between states). The notations used in what follows are:
- M = {s1, s2, ..., sn}: the environment, represented as a space of states;
- si ∈ M, sf ∈ M: the initial and the final state, respectively (the problem can be generalized for a set of final states);
- h : M → ℘(M): the transition function between the states, with the following meaning: h(i) = {j1, j2, ..., jk} if, at a given moment, the agent can move from the state i to one of the states j1, j2, ..., jk; a state j accessible from the state i, in other words j ∈ h(i), is called a neighbor state of the state i;
- the transition probability between a state i and any neighbor state j is P(i, j) = 1 / |h(i)|.

To solve this problem, we propose a reinforcement learning algorithm based on learning the states' utilities (values), in which the agent receives rewards from its interactions with the environment. The algorithm, based on Temporal Differences, works on the following idea: the agent starts with some initial estimates of the states' utilities; during certain training episodes, the agent experiments with some possibly optimal paths from si to sf, properly updating the estimates of the states' utilities; during the training process the estimates converge to their exact values, and thus, at the end of the training process, the estimates will be in the neighborhood of the exact values.

We use the following notations: U(i) - the estimated utility of the state i; R(i) - the reward received by the agent in the state i. We make the following specifications:
- the training process during an episode has the worst case complexity O(n^2), where n is the number of environment states;
- in a training sequence, the agent updates the utility of the current state using only the selected successor state, not all the successors (the temporal difference characteristic).

Case study. If we consider the reward function defined as r(s) = -1 if s ≠ sf and r(s) = 0 otherwise, it is obvious that the goal of the learning process is to minimize the number of transitions from the initial to the final state (the agent learns to move on the shortest path). We consider that the environment has the following characteristics:
- the environment has a rectangular form;
- at any given moment, from any given state, the agent can move in only four directions: North, South, East, West.

In what follows, we state an original theorem that gives the convergence conditions of the URU algorithm.

Theorem 4.3.1. [SER03a] Let us consider the learning problem described above, satisfying the conditions:
- the initial values of the states' utilities are computed as U(s) = -d(s, sf)/2, for all s ∈ M, where d(s1, s2) represents the Manhattan distance between the states s1 and s2;
- the reward factor satisfies γ ≤ 1/3.
In this case, the URU algorithm is convergent (the states' utilities converge after the training sequence).

We denote by U_n(i) the estimated utility of the state i at the n-th training episode. The proof of Theorem 4.3.1 is based on the following lemmas:

Lemma 1. [SER03a] At the n-th training episode of the agent the following inequality holds: U_n(i) ≤ 2, for all i ∈ M.

Lemma 2. [SER03a] At the n-th training episode of the agent the following inequality holds: |U_n(i) - U_n(j)| ≤ 1 for each transition from i (i ≠ sf) to j made by the agent in the current training sequence.
Lemma 3. [SER03a] The inequality U_{n+1}(i) ≥ U_n(i) holds for every i ∈ M and n ∈ N; in other words, the states' utilities increase from one training episode to another.

Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU algorithm has been run.

Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of the states' utilities is given by the following equation:

U*_URU(i) = r_i + (γ / card(h(i))) · Σ_{j successor of i} U*_URU(j), for all i ∈ M,

where U*_URU(i) represents the exact utility of the state i, obtained after the URU algorithm has been run. We denote by card(M) the number of elements of a set M.

Compared to the classical TD algorithm, we make the following remarks (which follow naturally from the theoretical results given above):
- the states' utilities grow faster in the URU algorithm than in the TD algorithm; in other words, U_URU(i) ≥ U_TD(i) for all i ∈ M;
- in the case of our learning problem, as we proved in Theorem 4.3.1, for γ = 1 (the TD algorithm) we cannot prove the convergence of the states' utilities.

Further work is planned in the following directions:
- to analyze what happens if the transitions between states are nondeterministic (the environment is a Hidden Markov Model [SER01a]);
- to analyze what happens if the reward factor γ is not a fixed parameter, but a function whose values depend on the current state of the agent;
- to develop the algorithm for solving path-finding problems with multiple agents.

4.4 A new algorithm for word sense disambiguation

In this section we introduce an original algorithm (BA) [SER01c] which combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] and of the Naive Bayes Classifier (NBC). This algorithm preserves the advantage of Yarowsky's principles (one sense per discourse and one sense per collocation) together with the known high performance of NBC algorithms. The algorithm learns to make predictions based on local contexts, starting from a reduced set of contexts labeled with meanings and from a larger set of unlabeled contexts.

The two important characteristics of the BA algorithm are:
- the set of words used as contextual features for the disambiguation - as the number of elements of this set grows, the disambiguation's precision grows (this is experimentally shown in section 5.6);
- the relation included in W × ℘(W), where W is the set of words and ℘(W) is the set of subsets of W (particularly, a set of contexts) - using this relation also increases the disambiguation's precision.
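Since the BA algorithm builds on the Naive Bayes Classifier, the following Java sketch shows the NBC component only: counting sense and feature-word occurrences in labeled contexts, then scoring the senses of a new context. The smoothing, the data structures and all names are illustrative assumptions; the Yarowsky-style bootstrapping and the relation used by BA are not reproduced here.

```java
import java.util.*;

// Sketch of the Naive Bayes sense scoring that the BA algorithm builds on:
// score(sense | context) ~ log P(sense) + sum over context words of log P(word | sense).
// Counts, smoothing and data structures are illustrative assumptions, not the
// exact implementation used in the thesis.
final class NaiveBayesWsd {
    private final Map<String, Integer> senseCount = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCountPerSense = new HashMap<>();
    private int totalContexts = 0;
    private static final double SMOOTHING = 1.0;   // additive (Laplace) smoothing

    // Learn from one sense-labeled context (a bag of feature words).
    void train(String sense, Collection<String> contextWords) {
        totalContexts++;
        senseCount.merge(sense, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCountPerSense.computeIfAbsent(sense, k -> new HashMap<>());
        for (String w : contextWords) counts.merge(w, 1, Integer::sum);
    }

    // Return the most probable sense for an unlabeled context.
    String disambiguate(Collection<String> contextWords) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String sense : senseCount.keySet()) {
            Map<String, Integer> counts = wordCountPerSense.get(sense);
            int senseTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
            double score = Math.log((double) senseCount.get(sense) / totalContexts);
            for (String w : contextWords) {
                double p = (counts.getOrDefault(w, 0) + SMOOTHING)
                         / (senseTotal + SMOOTHING * counts.size() + 1);
                score += Math.log(p);
            }
            if (score > bestScore) { bestScore = score; best = sense; }
        }
        return best;
    }
}
```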
CHAPTER V
IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming

In this section we present an Intelligent Agent [SER00b] that learns by reinforcement using an Adaptive Dynamic Programming method (section 3.2). We consider a simple Markov decision process with n^2 states. The state space can be visualized using an n x n grid, each square representing a state. The reinforcement function is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are four actions possible in each state: North, South, East and West. The goal states are the upper left corner and the lower right corner. The program that implements the agent with the functionality described above has been realized in two versions: in Borland Pascal and in Visual Prolog.

5.2 An Intelligent Agent based on training Hidden Markov Models

In this section we want to test a system represented as a Hidden Markov Model (section 2.3). We propose an agent for character recognition [SER01a]. Let us consider the example of an agent that has to recognize two characters, "I" and "U". We propose the model described in Figure 5.1.

Figure 5.1. The Hidden Markov Model
In this section we present how the Hidden Markov Model can be trained to recognize the two characters and how the recognition of other characters works. The agent implementation has been written in Microsoft Visual C++ 6.0. For the proposed experiment, some directions for future work are:
- how the proposed model could be generalized to as many characters as possible;
- how the structure of the HMM could be dynamically generated;
- when such probabilistic methods are more suitable than other character recognition methods;
- how probabilistic methods could be combined with others (heuristic or otherwise) in order to obtain a higher performance of the model;
- what would happen in certain "plateau" situations (where a given entry would have the same probability in several environments).

This section aimed to emphasize another way of approaching problems of training intelligent agents.

5.3 A game playing Reinforcement Learning Intelligent Agent ("Agent 21")

In this section we design an Intelligent Agent [SER01b] that learns (by reinforcement) to play "Game 21", a simplified version of the game Black-Jack. The algorithm used for training the agent is the SARSA algorithm (section 3.6). We propose a game with two players: a dealer (who distributes the cards) and an agent, in fact the computer (which uses the SARSA algorithm for learning the game's rules and which plays against the dealer). The agent tries to win by obtaining a score (the sum of the cards' values) less than or equal to 21, but greater than the dealer's score.

The probabilistic nature of this game makes it an interesting problem for testing learning algorithms, even if learning a good playing strategy is not obvious. Learning from a teacher (supervised learning) is not useful in this game, as the output data for a given state are unknown. The agent has to explore different actions and to develop a strategy by selecting and maintaining the actions that maximize its performance. The purpose of this application is to study the behavior of an agent that initially knows neither the rules nor the goal of the game, and that gradually assimilates them. After learning, the agent (the computer) wins about 54% of the rounds (Figure 5.2); this is due to the fact that "Game 21" is a game of chance. The agent tries to manage, as well as it is able to, the cards received from the dealer.
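The SARSA update used for training can be sketched as follows. The Java code below is a generic illustration of SARSA with epsilon-greedy action selection, not the thesis implementation; the state and action encodings and the parameter values are assumptions.

```java
import java.util.*;

// Sketch of the SARSA update used to train a game-playing agent:
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)),
// where a' is the action actually chosen (epsilon-greedily) in s'.
// State/action encodings and parameter values are illustrative assumptions.
final class SarsaLearner {
    private final Map<String, Double> q = new HashMap<>();
    private final Random rnd = new Random();
    private final double alpha = 0.1, gamma = 0.9, epsilon = 0.1;

    private String key(String state, String action) { return state + "|" + action; }

    double q(String state, String action) {
        return q.getOrDefault(key(state, action), 0.0);
    }

    // Epsilon-greedy action selection among the (non-empty) list of available actions.
    String chooseAction(String state, List<String> actions) {
        if (rnd.nextDouble() < epsilon) return actions.get(rnd.nextInt(actions.size()));
        return actions.stream()
                      .max(Comparator.comparingDouble(a -> q(state, a)))
                      .orElseThrow();
    }

    // One SARSA step for the observed tuple (s, a, r, s', a').
    void update(String s, String a, double r, String s2, String a2) {
        double target = r + gamma * q(s2, a2);
        q.put(key(s, a), q(s, a) + alpha * (target - q(s, a)));
    }
}
```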
As a remark, even if the game has a strong probabilistic character, after the training rounds the agent plays approximately as well as, or maybe even better than, a human player with a random playing strategy.

Figure 5.2. Percentage of games won by the trained agent

5.4 "Know-How" Agent

This section refers to the implementation of the "know-how" concept (section 1.3.4) in a Multi-Agent System (a system with several non-cooperative agents) [SER02b]. Because of the strong logical formalism of the "know-how" concept, the application was written in Visual Prolog.

5.5 A real-time learning Intelligent Agent

In this section we describe an Intelligent Agent [SER02a] that learns to find the minimum number of transitions needed for moving (in a space of states) between an initial and a final configuration, using the RTL algorithm (section 4.1). The use of the A* algorithm for solving such problems is well known; however, A*, an informed search algorithm, has the major disadvantage of an exponential execution time.
Thus, a learning version is very suitable, since it reduces the algorithm's complexity. We propose an experiment in order to illustrate the advantages of the RTL algorithm. The application is written in Borland C and implements the behavior of an Intelligent Agent whose aim is to find the way out of a maze on a shortest path. For comparison, we have also used the LRTA* algorithm, and we noticed that, because of its insufficient exploration of the environment, the agent fails.

5.6 An agent for word disambiguation

In this section we describe an Intelligent Agent [SER03e] that learns to disambiguate a polysemous word (to determine the meaning of the word in a set of contexts). The application is written in JDK 1.4 and uses the disambiguation algorithm BA described in section 4.4. We made two experiments: one for the English language and one for the Romanian language. The second experiment is very significant, because for the Romanian language no resource similar to WordNet exists, and the BA algorithm only requires information that can be extracted from an untagged corpus.

Our aim is to use the BA algorithm for the Romanian language, to disambiguate the word poarta in some contexts obtained with an on-line corpus tool (we use a Romanian corpus). We make the following statements:
- the target word poarta has, in Romanian, four possible meanings (two nouns and two verbs);
- we test our algorithm starting with 38 contexts for the target word;
- we start with four words as contextual features for the disambiguation (a single feature for each sense).

As an evaluation measure of the algorithm we use the accuracy (5.1):

A = (number of contexts correctly solved) / (total number of contexts that have to be solved)        (5.1)

The accuracy of the BA algorithm in the proposed experiment is 60%. We carried out an experimental comparison between the Naive Bayes Classifier and the BA algorithm with and without the relation (section 4.4).
The comparative experimental results are shown in Figure 5.3, which gives a graphical representation of the accuracy per context for each algorithm: for any given algorithm, for the i-th context we represent the accuracy (5.1) of the algorithm over the first i contexts. From Figure 5.3 it is obvious that the most efficient is the BA algorithm with the relation (the algorithm's accuracy grows with the number of contexts).

Figure 5.3. Experimental comparison between the NBC and BA algorithms (accuracy (%) versus the number of contexts, for the BA algorithm with the relation, the NBC algorithm and the BA algorithm without the relation)

We make the following remarks:
- in the above experiment we increase the number of occurrences for each meaning of the target word, and we remark that with two occurrences for each meaning the algorithm's accuracy grows by 10%, while with three occurrences for each meaning the accuracy grows by 15%;
- we increase the number of input contexts (to 100) and we remark that the algorithm's accuracy grows by 15%.

As a conclusion, if the number of words used as contextual features for the disambiguation and the number of contexts grow, the accuracy of the BA algorithm grows, too.

5.7 A Web Agent to display a tree structure

In this section we present an Intelligent Agent [SER01e] aimed at browsing a structure of remote files. It is an agent whose environment is a software environment: its sensors are software functions and its actions are software actions.
5.8 A Logic-based Agent (ALSO)

In this section we describe the implementation of a Logic Agent based on a stack of goals [SER02c], whose architecture was presented in section 4.2. Because the architecture described above is based on logic, and because the algorithm described in section 4.2 needs backtracking in order to find all the solutions, the implementation was written in Visual Prolog. Declarative programming languages (such as Prolog) have a built-in control mechanism which allows finding all the solutions of a problem. We have to note that the stack (stacks) of goals that must be created for applying the algorithm are implicitly memorized by the control strategy of Prolog (a mechanism which allows backtracking).

5.9 An URU reinforcement learning Intelligent Agent

In this section we describe an Intelligent Agent [SER03a] that learns to find the minimum number of transitions (steps) for moving between an initial and a final configuration, using the URU algorithm (described in section 4.3). The application is written in JDK 1.4. We develop an experiment for an agent that moves in a rectangular maze, illustrating the theoretical results obtained in section 4.3.

5.10 An Agent for words clustering

In this section we describe an Intelligent Agent [SER03b] that learns to cluster [MAN99] a set of words, starting from a set of contexts and from a set of words having maximum frequency in the given contexts. We propose an experiment for the Romanian language; we ran the algorithm on 10000 contexts and 200 words, with very good results. We mention that this application is used as a part of a Question Answering (QA) system for the Romanian language [SER03f].
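Several of the agents above (sections 5.1, 5.5 and 5.9) operate on grid-shaped mazes with the four moves North, South, East and West and a reward of -1 per transition. The following Java sketch shows one way such an environment might be represented; the class name, the single goal cell and the obstacle handling are illustrative assumptions, not the thesis implementation.

```java
// Sketch of the kind of grid environment used by several of the agents above
// (sections 5.1, 5.5 and 5.9): an n x n grid, four actions (North, South, East,
// West), a reward of -1 on every transition and a designated goal cell.
// Names, the single goal and the obstacle handling are illustrative assumptions.
final class GridEnvironment {
    enum Move { NORTH, SOUTH, EAST, WEST }

    private final int n;                 // the grid is n x n
    private final boolean[][] obstacle;  // true = blocked cell
    private final int goalRow, goalCol;

    GridEnvironment(int n, boolean[][] obstacle, int goalRow, int goalCol) {
        this.n = n; this.obstacle = obstacle;
        this.goalRow = goalRow; this.goalCol = goalCol;
    }

    boolean isGoal(int row, int col) { return row == goalRow && col == goalCol; }

    // Apply a move; if it would leave the grid or enter an obstacle, stay in place.
    int[] next(int row, int col, Move m) {
        int r = row, c = col;
        switch (m) {
            case NORTH: r--; break;
            case SOUTH: r++; break;
            case EAST:  c++; break;
            case WEST:  c--; break;
        }
        if (r < 0 || r >= n || c < 0 || c >= n || obstacle[r][c]) return new int[] { row, col };
        return new int[] { r, c };
    }

    // Constant reward of -1 for every non-goal state, as in sections 4.3 and 5.1.
    double reward(int row, int col) { return isGoal(row, col) ? 0.0 : -1.0; }
}
```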
CHAPTER VI
A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS

In this chapter we propose an original programming interface for reinforcement learning simulations in known environments [SER03c]. Using this interface, simulations are possible both for reinforcement learning based on the states' utilities and for learning based on actions' values (Q-learning). The interface is realized in JDK 1.4 and is meant to facilitate software development for reinforcement learning in known environments.

There are three basic objects: agents, environments and simulations. The agent is the learning agent and the environment is the task it interacts with. The simulation manages the interaction between agent and environment, collects data and manages the display, if any. Generally, the inputs of the agent are perceptions of the environment (in our case, states of the environment), the outputs are actions, and the environment offers rewards after the agent interacts with it. Figure 6.1 illustrates the interaction between the agent and the environment in a reinforcement learning task: the agent sends actions to the environment, and the environment returns perceptions and rewards to the agent.

Figure 6.1. Interaction between the Agent and the Environment

The reward is a number; the environment, the actions and the perceptions are instances of classes derived from the IEnvironment, IAction and IState interfaces, respectively. The implementation of actions and perceptions may be arbitrary as long as they are properly understood by the agent and the environment; the agent and the environment obviously have to be chosen to be compatible with each other. The interaction between agent and environment is handled in discrete time.

We assume we are working with simulations; in other words, there are no real-time constraints enforced by the interface: the environment waits for the agent while the agent is selecting its action, and the agent waits for the environment while the environment is computing its next state. We assume that the agent's environment is a finite Markov Decision Process.

To use the interface, the user has to define the specialized object classes HisState, HisEnvironment and HisAgent, creating instances of each. The agent and the environment are then passed to a simulation object (CSimulation), which initializes and interconnects them. Then, CSimulation::init() will initialize and run the simulation. If the agent learns the states' utilities, it has to be derived from the AgentUtility class; otherwise, if it learns the actions' values (Q-learning), it has to be derived from the AgentQValues class.

RLAgent is an abstract class, the base class for all the agents; specific agents are instances of subclasses derived from RLAgent. AgentUtility is an abstract subclass of RLAgent, defining the behavior of an agent that learns by reinforcement based on the states' utilities (the algorithm used for learning is the URU algorithm described in section 4.3 [SER03a]). AgentQValues is an abstract subclass of RLAgent, defining the behavior of an agent that learns by reinforcement based on actions' values (the algorithm used for learning is the SARSA algorithm [SUT98]). IEnvironment is an interface, the base class for all environments; specific environments are instances of classes derived from IEnvironment. CSimulation is the basic object of the interface, which manages the interaction between the agent and the environment; it defines the heart of the interface, the uniform usage that all agents and environments are meant to conform to. An instance of the simulation class is associated with an instance of an agent and an environment at creation time; this is done in the constructor of the class CSimulation.

We illustrate the use of the interface on a concrete example: the motion problem of an embodied agent whose goal is to find the way out of a maze with obstacles. The UML diagram for the interface is described below.
The classes of the interface, with their attributes and methods, are the following:

CSimulation
- a: RLAgent
- m: IEnvironment
+ CSimulation(a: RLAgent, m: IEnvironment)
+ init(alpha: Real, gamma: Real, epsilon: Real, episoade: Integer): void
+ policy(ps: PrintStream)

Interface IEnvironment
+ isValid(s: IState): Boolean
+ initial(): IState
+ final(s: IState): Boolean
+ next(s: IState, a: IAction, r: Integer): IState
+ value(s: IState): Real

MyEnvironment (example environment, implements IEnvironment)
# l: IList

Interface IList
+ add(o: Object): void
+ delete(i: Integer): void
+ dim(): Integer
+ elem(i: Integer): Object

CList (implements IList)
- list: ArrayList

Abstract RLAgent
# a: IList
+ abstract actions(): void
+ abstract choose(e: IElement, r: Real, epsilon: Real, m: IEnvironment): IElement
+ abstract next(s: IState, m: IEnvironment): QValues
+ abstract initial(s: IState): IElement
+ learning(alpha: Real, gamma: Real, epsilon: Real, episodes: Integer): void
# notVisited(s: IState): Boolean
# value(e: IElement, m: IEnvironment): Real
appear(e: IElement): Boolean
add(e: IElement, m: IEnvironment): void
init(s: IState): void
update(e: IElement, eurm: IElement, r: Integer, alpha: Real, gamma: Real): void

AgentUtility, AgentQValues (subclasses of RLAgent)
next(s: IState, r: Integer, epsilon: Real, m: IEnvironment): IElement

MyAgent (example agent, derived from AgentUtility or AgentQValues)

Abstract IElement
# o: Boolean
# s: IState
# a: IAction
+ IElement(s: IState)
+ IElement(s: IState, a: IAction)
+ state(): IState
+ action(): IAction
+ value(): Real
+ retO(): Boolean
+ setValue(v: Real): void
+ setAction(a: IAction): void
+ setO(o: Boolean): void
+ toString(): String
+ abstract equal(e: IElement): Boolean

Utility (derived from IElement)
+ Utility(s: IState)
+ Utility(s: IState, a: IAction)

QValues (derived from IElement)
+ QValues(s: IState)
+ QValues(s: IState, a: IAction)

Interface IState
+ toString(): String
+ equal(s: IState): Boolean

Interface IAction
+ toString(): String
+ equal(a: IAction): Boolean

MyState (example state, implements IState); MyAction (example action, implements IAction)
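Based on the class and method signatures listed above, a client of the interface might be wired up roughly as follows. This is only a sketch: it assumes that the interface classes (CSimulation, RLAgent, IEnvironment, AgentUtility, AgentQValues) are available on the classpath, that HisEnvironment and HisAgent are the user-defined classes the text asks for (their bodies are omitted here), and that the numeric parameter values are illustrative, not taken from the thesis.

```java
import java.io.PrintStream;

// Sketch of the intended usage of the interface (names follow the listing above):
// create the user-defined environment and agent, pass them to CSimulation,
// run the training with init() and finally print the learned policy.
public class MazeSimulationDemo {
    public static void main(String[] args) {
        // HisEnvironment implements IEnvironment; HisAgent extends AgentUtility
        // (utility-based learning) or AgentQValues (Q-learning). Their bodies
        // are the user's responsibility and are not shown here.
        IEnvironment environment = new HisEnvironment();
        RLAgent agent = new HisAgent();

        // The simulation interconnects the agent and the environment.
        CSimulation simulation = new CSimulation(agent, environment);

        // alpha (learning rate), gamma (reward factor), epsilon (exploration
        // rate) and the number of training episodes -- illustrative values.
        simulation.init(0.1, 0.3, 0.1, 1000);

        // Write the learned policy to an output stream.
        PrintStream out = System.out;
        simulation.policy(out);
    }
}
```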
CONCLUSIONS

The aim of this work is to support the idea that the field of agents, and especially of intelligent agents, is an important research and development area within Computer Science and Artificial Intelligence. On the other hand, learning problems represent an important research direction in Artificial Intelligence, machine learning being the study of system models that, based on an input data set (a training set), improve their performance through experience and by learning some specific knowledge.

Reinforcement Learning, seen as a method for solving machine learning problems based on the acquisition of new knowledge [BER91], is one of the approaches that may form the basis for improving machines' performance, being at the same time a field open to future research. Because of its potential to eliminate the manual encoding of control strategies, reinforcement learning is one of the most active research domains in machine learning; reinforcement learning in unknown environments is a current research topic in this field.

The aim of the current work is to present the main results related to reinforcement learning, as well as the design and implementation of intelligent agents, in order to offer support for understanding:
- why agents are considered an important new way of conceptualizing and implementing a different kind of applications;
- what intelligent agents are (and what they are not) and how they are related to other programming paradigms, such as object oriented programming;
- why learning, and reinforcement learning in particular, represents an extremely novel and dynamic field in Artificial Intelligence.

The work presents some new approaches in Intelligent Systems (new learning algorithms), together with possible generalizations and open problems, opening in this way new research directions in Machine Learning. The study of learning was focused in this thesis on the model of systems with a single agent, in known environments. As future research we will consider the case of unknown environments and of systems with several agents (multi-agent systems).
REFERENCES

[ALE95] Allen, J., Natural Language Understanding, Benjamin/Cummings Publ., 2nd ed., 1995
[ANT96] Antos, A., Lugosi, A., Strong minimax lower bounds for learning, Proceedings of the 9th Annual Conference on Computational Learning Theory, 1996, pp.303-309
[BAR91] Barto, A.G., Bradtke, S.J., Singh, S.P., Real-time learning and control using asynchronous dynamic programming, Technical Report 91-57, Amherst, MA: University of Massachusetts, Computer Science Department
[BAR99] Barber, K.S., Kim, J., Constructing and dynamically maintaining perspective-based agent models in a MA environment, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.416-417
[BER91] Bergadano, F., Giordana, A., Machine Learning, Ellis Horwood Limited, Chichester, England, 1991
[BET93] Betke, M., Rivest, R., Singh, M., Piecemeal Learning of an Unknown Environment, AI Memo No. 1474, CBCL Memo No. 93, 1993
[BOO88] Booker, L.B., Classifier systems that learn world models, Machine Learning, Number 3, 1988, pp.161-192
[BOO94] Booch, G., Object-Oriented Analysis and Design, Addison-Wesley: Reading, MA, 1994
[BRA82] Brady, M., Hollerbach, J.M., Robot Motion: Planning and Control, Cambridge, MA, MIT Press, 1982
[BRO86] Brooks, R.A., A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, 2(1), pp.14-23, 1986
[CAN88] Canny, J.F., The Complexity of Robot Motion Planning, Cambridge, MA, MIT Press, 1988
[CAR79] McCarthy, J., Ascribing mental qualities to machines, Philosophical Perspectives in Artificial Intelligence, Harvester Press, 1979
[CÂR94] Cârstoiu, D., Sisteme expert, Editura ALL, Bucureşti, 1994
[CAS94] Cassandra, A.R., Kaelbling, L.P., Littman, M.L., Acting optimally in partially observable stochastic domains, Proceedings of AAAI-94, Seattle, Washington, 1994
[CHA95] Chaib-Draa, B., Industrial applications of Distributed AI, Communications of the ACM, Vol. 38, No. 11, 1995
[CHA96] Charniak, E., Statistical Language Learning, The MIT Press, 1996
[CHE80] Chellas, B., Modal Logic: An Introduction, Cambridge University Press: Cambridge, England, 1980
[CHI99] Ciampolini, A., Lamma, E., Mello, P., Abductive coordination for logic agents, Proceedings of the 1999 ACM Symposium on Applied Computing, 1999, pp.134-140
[COB02] http://titania.cobuild.collins.co.uk/form.html
[DAE00] Daelemans, W., Zavrel, J., Tilburg Memory Based Learner - version 3.0, University of Antwerp, http://ilk.kub.nl, 2000
[DAY92] Dayan, P., The convergence of TD(λ) for general λ, Machine Learning, Number 8, 1992, pp.341-362
[DEN87] Dennett, D.C., The Intentional Stance, The MIT Press: Cambridge, MA, 1987
[DUM95] Dumitrescu, D., Inteligenţă Artificială, Litografie, Universitatea "Babeş-Bolyai" Cluj-Napoca, 1995
[DUM99] Dumitrescu, D., Principiile Inteligenţei Artificiale, Ed. Albastră, Cluj-Napoca, 1999
[ESC00] Escudero, G., Marquez, L., Rigau, G., Boosting applied to word sense disambiguation, ECML 2000, Barcelona, Spain, 2000
[ETZ96] Etzioni, O., Moving up the information food chain: Deploying softbots on the world-wide web, Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, OR, 1996
[FIE94] Fiechter, C.N., Efficient reinforcement learning, Proceedings of the 7th Annual ACM Conference on Computational Learning Theory, 1994, pp.88-97
[FIS93] Fisher, M., Concurrent MetateM, a language for modeling reactive systems, Proceedings of Parallel Architectures and Languages Europe (PARLE), Springer Verlag, 1993
[FIS93] Fisher, M., Wooldridge, M., Specifying and verifying distributed intelligent systems, in Progress in Artificial Intelligence, Proceedings of the Portuguese Conference on Artificial Intelligence (EPIA), Porto, Portugal, 1993, Springer Verlag, Lecture Notes in Computer Science, vol. 727
[FIS95/1] Fisher, M., Towards a semantics for Concurrent MetateM, in Executable Modal and Temporal Logics, Springer Verlag, Lecture Notes in Artificial Intelligence, vol. 897, 1995
[FIS95/2] Fisher, M., Representing and executing agent-based systems, Springer Verlag, Lecture Notes in Computer Science, vol. 890, 1995
[FLA94] Flach, P., Simply Logical: Intelligent Reasoning by Example, John Wiley & Sons, Chichester, England, 1994
[FLO94] Florea, A., Boangiu, A., Inteligenţă Artificială, Editura Universităţii "Politehnica", Bucureşti, 1994
[FRE94] Frenţiu, M., Pârv, B., Elaborarea Programelor. Metode şi tehnici moderne, Editura Pro-Media, Cluj-Napoca, 1994
[GAR95] Gârbacea, I., Andonie, R., Algoritmi fundamentali - O perspectivă C++, Ed. Libris, Cluj-Napoca, 1995
[GEN94] Genesereth, M.R., Ketchpel, S.P., Software agents, Communications of the ACM, 37(7), 1994, pp.48-53
[GRE90] Grefenstette, J.J., Ramsey, C.L., Schultz, A.C., Learning sequential decision rules using simulation models and competition, Machine Learning, Number 5, 1990, pp.355-382
[GRE93] Grefenstette, J., Genetic algorithms and machine learning, Proceedings of the 6th Annual ACM Conference on Computational Learning Theory, 1993, pp.3-4
[HAD96] Haddadi, A., Communication and Cooperation in Agent Systems, LNAI Volume 1056, Springer-Verlag: Berlin, Germany, 1996
[HAY83] Hayes-Roth, F., Waterman, A., Building Expert Systems, Addison-Wesley: Reading, MA, 1983
[HAR00] Harmon, M., Harmon, S., Reinforcement Learning - A Tutorial, Wright State University, www-anw.cs.umass.edu/~mharmon/rltutorial/frames.html, 2000
[HAR85] Harmon, P., King, D., Expert Systems, John Wiley & Sons Inc, 1985
[HAR88] Harmon, P., Maus, R., Expert Systems. Tools and Applications, John Wiley & Sons Inc, 1988
[HAS88] Hassler, D., Artificial Intelligence, Vol. 106, Number 2, 1988
[HOL75] Holland, J.H., Adaptation in Natural and Artificial Systems, Ann Arbor, MI, University of Michigan Press, 1975
[HOL86] Holland, J.H., Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems, in Machine Learning, An Artificial Intelligence Approach, Volume 2, Los Altos, CA: Morgan Kaufmann, R.S. Michalski, J.G. Carbonell, T.M. Mitchell (Eds.), 1986, pp.593-623
[ISH95] Ishida, T., Korf, R.E., A moving target search: a real-time search for changing goals, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995
[JAC86] Jackson, P., Introduction to Expert Systems, Addison-Wesley: Reading, MA, 1986
[JUN97] Junling, H., Wellman, M., Learning about other agents in a dynamic multi-agent system, Artificial Intelligence Laboratory, Michigan, October 26, 1997
[JUN98] Junling, H., Wellman, M., Online learning about other agents in a dynamic MAS, Proceedings of the 2nd International Conference on Autonomous Agents, 1998, pp.239-246
[JUR00] Jurafsky, D., Martin, J.H., Speech and Language Processing, Prentice Hall, 2000
[KEL93] Kelly, J., Artificial Intelligence - A Modern Myth, Ellis Horwood, 1993
[KIL01] Kilgarriff, A., Gold standard datasets for evaluating word sense disambiguation programs, http://citeseer.nj.nec.com
[KOR90] Korf, R.E., Real-time heuristic search, Artificial Intelligence, 1990
[KOR92] Korf, R.E., Search, Encyclopedia of Artificial Intelligence, Wiley-Interscience Publication, New York, 1992
[LEE88] Lee, K.F., Majahan, S., A pattern classification approach to evaluation function learning, Artificial Intelligence, 36, 1988, pp.1-25
[LES96] Lesperance, Y., Levesque, J., Foundations of a logical approach to agent programming, LNAI Volume 1037, Springer Verlag: Berlin, Germany, 1996, pp.331-346
[LES99] Lesser, V., Atighetchi, M., The UMASS intelligent home project, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.291-298
[LET99] Leţia, I.A., Crăciun, F., et al., Multiagent Systems, Casa Cărţii de Ştiinţă, Cluj-Napoca, 1999
[LIN91/1] Lin, L.J., Self-improving reactive agents: case studies of reinforcement learning frameworks, Proceedings of the 1st International Conference on Simulation of Adaptive Behavior: From Animals to Animats, 1991, pp.297-305
[LIN91/2] Lin, L.J., Self-improvement based on reinforcement learning, planning and teaching, Proceedings of the 8th International Workshop on Machine Learning, 1991, pp.323-327
[LIN91/3] Lin, L.J., Programming robots using reinforcement learning and teaching, Proceedings of AAAI-91, 1991, pp.781-786
[MAE90] Maes, P., Situated agents can have goals, in Designing Autonomous Agents, The MIT Press: Cambridge, MA, 1990, pp.49-70
[MAH90] Mahadevan, S., Connell, J., Automatic programming of behavior-based robots using reinforcement learning, IBM Technical Report, 1990
[MAN99] Manning, C., Schutze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999
[MAT94] Mataric, M.J., Interaction and Intelligent Behavior, PhD Thesis, Massachusetts Institute of Technology, 1994
[MIN61] Minsky, M.L., Steps toward Artificial Intelligence, in Computers and Thought, McGraw-Hill, New York, 1963, E.A. Feigenbaum & J. Feldman (Eds.), pp.406-450
[MIT98] Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, Cambridge, 1998
[MOR99] Morley, D., Semantics of Actions, Agents and Environments, Doctoral Thesis, University of Melbourne, 1999
[NAR98] Nardi, B.A., Miller, J., Wright, D., Collaborative programmable intelligent agents, Communications of the ACM, 41(3), March 1998, pp.96-104
[NEW82] Newell, A., The knowledge level, Artificial Intelligence, 18(1), 1982, pp.87-127
[NIL80] Nilsson, N.J., Principles of Artificial Intelligence, Palo Alto, CA, Morgan Kaufmann, 1980
[NIL98] Nilsson, N.J., Artificial Intelligence - A New Synthesis, Morgan Kaufmann Publishers Inc., San Francisco, California, 1998
[PER98] Perez-Uribe, A., Sanchez, E., Blackjack as a test bed for learning strategies in neural networks, Proceedings of the IEEE International Joint Conference on Neural Networks, 1998
[POO98] Poole, D., Mackworth, A., Goebel, R., Computational Intelligence - A Logical Approach, New York, Oxford University Press, 1998
[PYE99] Pyeatt, L.D., Howe, A.E., Integrated POMDP and reinforcement learning for a two-layer simulated robot architecture, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.168-174
[RAO92] Rao, A.S., Georgeff, M., An abstract architecture for rational agents, Proceedings of Knowledge Representation and Reasoning (KR&R-92), 1992, pp.439-449
[RAO95] Rao, A.S., Georgeff, M., BDI agents: from theory to practice, Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp.312-319, San Francisco, CA, 1995
[RES98] Resnik, P., Yarowsky, D., Distinguishing systems and distinguishing senses: new evaluation methods for WSD, Natural Language Engineering, 1, No. 1, 1998
[RIC91] Rich, E., Knight, K., Artificial Intelligence, McGraw-Hill, New York, 1991, 2nd ed.
[RUS95] Russell, S.J., Norvig, P., Artificial Intelligence - A Modern Approach, Prentice-Hall, Inc., New Jersey, 1995
[RUS98] Russell, S.J., Learning agents for uncertain environments, Proceedings of the 11th Annual Conference on Computational Learning Theory, 1998, pp.101-103
[SAM59] Samuel, A.L., Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, Number 3, 1959, pp.210-229
[SEB01] Sebastiani, F., A tutorial on automated text categorization, pp.1-25, ESSLLI 2001
[SEO00] Seo, Y.W., Byoung, T.Z., A reinforcement learning agent for personalized information filtering, Proceedings of the 2000 International Conference on Intelligent User Interfaces, 2000, pp.248-251
[SER99] Şerban, G., Searching graphs in logical programming, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 1999, pp.67-72
[SER00a] Tătar, D., Şerban, G., Training probabilistic context-free grammars as hidden Markov models, Studia Universitatis "Babeş-Bolyai", Informatica, XLV(1), 2000, pp.17-30
[SER00b] Şerban, G., Intelligent agents and reinforcement learning, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 2000, pp.13-20
[SER00c] Şerban, G., A method for training Intelligent Agents using Hidden Markov Models, Studia Universitatis "Babeş-Bolyai", Informatica, XLV(2), 2000, pp.41-50
[SER01a] Şerban, G., Training Hidden Markov Models - a Method for Training Intelligent Agents, Proceedings of the Second International Workshop of Central and Eastern Europe on Multi-Agent Systems, Krakow, Poland, 2001, pp.267-276
[SER01b] Şerban, G., A Reinforcement Learning Intelligent Agent, Studia Universitatis "Babeş-Bolyai", Informatica, XLVI(2), 2001, pp.9-18
[SER01c] Tătar, D., Şerban, G., A new algorithm for word sense disambiguation, Studia Universitatis "Babeş-Bolyai", Informatica, XLVI(2), 2001, pp.99-108
[SER01d] Şerban, G., Training intelligent agents as Hidden Markov Models, Proceedings of Abstracts of the 4th Joint Conference on Mathematics and Computer Science, Felix, Romania, 2001, p.91
[SER01e] Şerban, G., A Simple Web Agent for displaying a tree structure, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 2001, pp.41-48
[SER02a] Şerban, G., A New Real Time Learning Algorithm, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(1), 2002, pp.3-14
[SER02b] Şerban, G., Formal Methods in Agent-Based Systems, Proceedings of the Symposium "Zilele Academice Clujene", Cluj-Napoca, 2002, pp.59-68
[SER02c] Şerban, G., LASG - A New Logic Architecture for Intelligent Agents, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(2), 2002, pp.13-22
[SER02d] Şerban, G., Real Time Learning in Agent Based Systems, Proceedings of the 4th International Workshop on Symbolic and Numeric Algorithms for Scientific Computing, Timişoara, Romania, 2002, pp.337-347
[SER02e] Şerban, G., Tătar, D., A Word Sense Disambiguation Experiment for Romanian Language, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(2), 2002, pp.37-42
[SER03a] Şerban, G., A New Reinforcement Learning Algorithm, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(1), 2003, pp.3-14
[SER03b] Tătar, D., Şerban, G., Words Clustering in Question Answering Systems, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(2), 2003, in press
[SER03c] Şerban, G., A New Interface for Reinforcement Learning Software, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(2), 2003, in press
[SER03d] Şerban, G., Tătar, D., Word Sense Disambiguation for Untagged Corpus: Application to Romanian Language, Proceedings of CICLing 2003 (Intelligent Text Processing and Computational Linguistics), Mexico City, Mexico, Lecture Notes in Computer Science No. 2588, Springer-Verlag, 2003, pp.270-275
[SER03e] Şerban, G., Tătar, D., An Improved Algorithm on Word Sense Disambiguation, Proceedings of IIS 2003 (Intelligent Information Systems - New Trends in Intelligent Information Processing and Web Mining), Zakopane, Poland, 2003, in press
[SER03f] Orăşan, C., Tătar, D., Şerban, G., Oneţ, A., Avram, D., How to build a QA system in your back-garden: application to Romanian, Proceedings of EACL 2003, Budapest, Hungary, 2003, in press
[SHI98] Shimbo, M., Ishida, T., On the convergence of real-time search, Journal of the Japanese Society for Artificial Intelligence, 1998
[SHO93] Shoham, Y., Agent-oriented programming, Artificial Intelligence, 60(1), pp.51-92, 1993
[SUN96] Sung, K.K., Niyogi, P., A formulation for active learning with application to object detection, AI Memo No. 1438, CBCL Paper No. 116, 1996
[SUN99] Sun, R., Session, C., Bidding in reinforcement learning - a paradigm for MAS, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.344-345
[SUT84] Sutton, R., Temporal Credit Assignment in Reinforcement Learning, Doctoral Dissertation, Dept. of Computer and Information Science, Univ. of Massachusetts, Amherst, 1984
[SUT88] Sutton, R., Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, pp.9-44
[SUT92] Sutton, R., A Special Issue of Machine Learning on Reinforcement Learning, Kluwer Academic Publishers, Boston, London, 1992
[SUT98] Sutton, R.S., Barto, A.G., Reinforcement Learning, The MIT Press, Cambridge, Massachusetts, London, England, 1998
[TAT01] Tătar, D., Inteligenţa artificială: demonstrare automată de teoreme, prelucrarea limbajului natural, Editura Albastră, 2000
[TOR90] Torras, C., Illari, J., 2D path planning: a configuration space heuristic approach, The International Journal of Robotics Research, Number 2, 1990, pp.75-91
[WAL65] Waltz, M.D., Fu, K.S., A heuristic approach to reinforcement learning control systems, IEEE Transactions on Automatic Control, AC-10, 1965, pp.390-398
[WAN99] Wang, F., McKenzie, E., A multi-agent based evolutionary artificial neural network for general navigation in unknown environments, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.154-159
[WAT97] Watson, M., Intelligent Java Applications for the Internet and Intranets, 1997; Romanian translation, Ed. ALL, 1999
[WAT89] Watkins, C.J., Learning from Delayed Rewards, Ph.D. Thesis, King's College, Cambridge, 1989
[WEI99] Weiss, G., Multiagent Systems - A Modern Approach to Distributed Artificial Intelligence, The MIT Press, Cambridge, Massachusetts, London, 1999
[WER87] Werbos, P.J., Building and understanding adaptive systems: a statistical/numerical approach to factory automation and brain research, IEEE Transactions on Systems, Man and Cybernetics, Jan-Feb 1987
[WHI89] Whitehead, S.D., Ballard, D.H., A role for anticipation in reactive systems that learn, Proceedings of the 6th International Workshop on Machine Learning, 1989, pp.354-357
[WHI90] Whitehead, S.D., Ballard, D.H., Active perception and reinforcement learning, Proceedings of the 7th International Conference on Machine Learning, Austin, Texas, 1990
[WHI91] Whitehead, S.D., Ballard, D.H., Learning to perceive and act by trial and error, Machine Learning, Number 7, 1991, pp.45-84
[WIN84] Winston, P.H., Artificial Intelligence, Addison Wesley, Reading, MA, 1984, 2nd ed.
[WOL99] Wolpert, D., Wheeler, K., Turner, K., General principles of learning-based MAS, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.77-83
[WOO95] Wooldridge, M., Jennings, N.R., Intelligent Agents: Theory and Practice, The Knowledge Engineering Review, 10(2), pp.115-152, 1995
[WOO97] Wooldridge, M., Agent-Based Software Engineering, Mitsubishi Electric Digital Library Group, London, 1997
[WOR01] http://www.cogsci.princeton.edu/~wn/
[YAR95] Yarowsky, D., Unsupervised word sense disambiguation rivaling supervised methods, Proceedings of ACL'95, pp.189-196
[YAR99] Yarowsky, D., Hierarchical decision lists for word sense disambiguation, Kluwer Academic Publishers, 1999
[YAR01] Yarowsky, D., Word-sense disambiguation using statistical models of Roget's categories trained on large corpora, http://citeseer.nj.nec.com
[YIS99] Mansour, Y., Reinforcement learning and mistake bounded algorithms, Proceedings of the 12th Annual Conference on Computational Learning Theory, 1999, pp.183-192