“BABEŞ-BOLYAI” UNIVERSITY OF CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE




             GABRIELA ŞERBAN

   Development methods for Intelligent Systems


            Abstract of the Ph.D. Thesis




                      Supervisor
         Prof. univ. dr. MILITON FRENŢIU




                       2003
Table of Contents                                              (page: abstract / thesis)

INTRODUCTION ..........................................................    2 / 5
CHAPTER I      INTELLIGENT SYSTEMS ....................................    5 / 9
CHAPTER II     MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS ....    7 / 65
CHAPTER III    REINFORCEMENT LEARNING .................................    9 / 75
CHAPTER IV     NEW APPROACHES IN INTELLIGENT SYSTEMS ..................   11 / 107
               4.1 A new real time learning algorithm (RTL) ...........   11 / 107
               4.2 A new Logic Architecture for Intelligent Agents
                   (ALSO) .............................................   12 / 115
               4.3 A new reinforcement learning algorithm (URU) .......   13 / 123
               4.4 A new algorithm for word sense disambiguation ......   16 / 134
CHAPTER V      IMPLEMENTATIONS OF INTELLIGENT SYSTEMS .................   18 / 141
               5.1 A reinforcement learning Intelligent Agent based on
                   Adaptive Dynamic Programming .......................   18 / 142
               5.2 An Intelligent Agent based on training Hidden
                   Markov Models ......................................   18 / 147
               5.3 A game playing Reinforcement Learning Intelligent
                   Agent ("Agent 21") .................................   19 / 152
               5.4 "Know-How" Agent ...................................   20 / 152
               5.5 A real-time learning Intelligent Agent .............   20 / 153
               5.6 An agent for word disambiguation ...................   21 / 159
               5.7 A Web Agent to display a tree structure ............   22 / 168
               5.8 A Logic-based Agent (ALSO) .........................   23 / 172
               5.9 An URU reinforcement learning Intelligent Agent ....   23 / 174
               5.10 An Agent for word clustering ......................   23 / 181
CHAPTER VI     A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING
               SIMULATIONS ............................................   24 / 184
CONCLUSIONS ...........................................................   28 / 199
REFERENCES ............................................................   29 / 200




INTRODUCTION

         This Ph.D. thesis is the result of my research in the field of Artificial
Intelligence, particularly in Machine Learning, started in 1998.
         The purpose of this abstract is to present an overview of the most important
aspects of the thesis, including the original results.
         Artificial Intelligence, one of the most recent disciplines to appear in the field of
computer science, tries to understand intelligent entities. But, unlike philosophy and
psychology, which also study intelligence, the aim of Artificial Intelligence is building
intelligent entities as well as understanding them.
         Although nobody can predict the future in detail, it is clear that computers endowed
with a degree of human intelligence (or more than that) would have a great impact on
everyday life and on the future course of civilization.
         The field of machine learning represents one of the most recent challenges of
Artificial Intelligence, opening new perspectives for research in the field and, of course,
a way of achieving intelligent systems.
         In our opinion, the most challenging field of Machine Learning is
Reinforcement Learning, which deals with the problem of improving the behavior of an
artificial agent based on feedback about its performance.
         On the other hand, we consider that one of the most suitable fields for exploiting
intelligence is Natural Language Processing. This is why our research is directed
especially toward the fields of Reinforcement Learning and Natural Language
Processing.
         The thesis contains 133 references and is structured in six chapters, as follows.
         Chapter I, Intelligent Systems, discusses general aspects related to Artificial
Intelligence. The field of Intelligent Agents is presented, a domain of Artificial
Intelligence that offers a new problem solving method and a new way of interaction
between user and computer. Some of the most important (in our opinion) abstract and
concrete architectures for intelligent agents are presented, as well as the new
programming paradigm, agent oriented programming. The next to last section of the
first chapter discusses the logic-based representation and the reasoning in agent based
systems; aspects related to Software Engineering and to the formal treatment of agent
based systems are also presented. The last section of the first chapter tackles the
problem of machine learning, an extremely important research direction in Artificial
Intelligence. The general model of a learning agent, aspects related to learning a
domain, as well as the most important learning strategies are presented.
         Chapter II, Mathematical Models for Intelligent Learning Agents, presents three
mathematical models for agents that learn: Markov Decision Processes, Partially
Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the
mathematical models described in this chapter have a very important role in modeling
and designing agent based systems, especially systems that learn by reinforcement (by
interaction with the environment).


Chapter III, Reinforcement Learning, tackles the extremely complex
problem of reinforcement learning, a learning approach that combines two disciplines in
order to solve problems that neither one could solve independently: Dynamic
Programming and Supervised Learning. This chapter presents the particularities of
Reinforcement Learning. The following topics are discussed: learning based on a utility
function, passive and active learning in known or unknown environments, learning
based on temporal differences (the most interesting variant of reinforcement learning),
and Q-learning. The most important action selection mechanisms, as well as the SARSA
algorithm, the most efficient variant of Q-learning, are comparatively discussed.
        Chapter IV, New Approaches in Intelligent Systems, presents a few new learning
algorithms (approaches to Reinforcement Learning and learning algorithms that may be
used for Natural Language Processing) as well as a new logic architecture for intelligent
agents (showing the importance of logical approaches for intelligent agents). The learning
algorithms described in this chapter are: an algorithm for word sense disambiguation, a
real time learning algorithm, and a reinforcement learning algorithm with application to
motion problems for robots in environments with obstacles. Each of the original
approaches in this chapter is compared with already existing approaches, the advantages
of the new approaches being illustrated.
        Chapter V, Implementations of Intelligent Systems, presents practical applications
(concrete architectures for intelligent agents that improve their behavior by learning), i.e.
implementations of the theoretical aspects presented in chapters II and IV. For each such
application we developed experiments that illustrate the efficiency of the approaches. We
also suggested directions for generalization and for future research.
        Chapter VI, A Programming Interface for Reinforcement Learning Simulations,
presents a standard programming interface for reinforcement learning simulations in a
known environment; both the simulation of reinforcement learning based on a utility
function and the simulation of Q-learning are possible. The use of the interface is also
illustrated on an example: the movement of an embodied robotic agent in an environment
with obstacles.
        Among the original elements presented throughout this thesis we mention the
following:
             the use of Hidden Markov Models as mathematical models for Intelligent
             Learning Agents [SER01a] (section 2.3);
             an algorithm for word sense disambiguation [SER03e] (section 4.4);
             a real time learning algorithm - RTL [SER02a] (section 4.1);
             the convergence theorem for the RTL algorithm [SER02a] (Theorem 4.1.1,
             section 4.1);
             a new Logic Architecture for Intelligent Agents - ALSO [SER02c] (section
             4.2);
             the reinforcement learning algorithm URU [SER03a] (section 4.3);



the convergence theorem for the URU learning algorithm [SER03a]
           (Theorem 4.3.1, section 4.3);
           Theorems 4.3.2 and 4.3.3: the equilibrium equation of the utilities in the URU
           algorithm and, respectively, the error estimation in the URU algorithm
           [SER03a] (section 4.3);
            a reinforcement learning agent based on Adaptive Dynamic Programming
            [SER00b] (section 5.1);
           intelligent agent based on training Hidden Markov Models [SER00c] (section
           5.2);
           a game playing reinforcement learning intelligent agent [SER01b] (section
           5.3);
           "Know-How" agent based on logic [SER02b] (section 5.4);
           real time learning agent [SER02d] (section 5.5);
            an intelligent agent for word disambiguation [SER03d] (section 5.6);
           Web agent for displaying a tree structure [SER01e] (section 5.7);
           agent based on logic (ALSO) [SER02c] (section 5.8);
           an URU reinforcement learning intelligent agent [SER03a] (section 5.9);
            an agent for word clustering [SER03b] (section 5.10);
           a programming interface for reinforcement learning simulations [SER03c]
           (chapter VI).

                                               *
                                            *      *
        I would like to express my gratitude to my supervisor, Prof. Dr. Militon
FRENŢIU, for his competent guidance, suggestions and encouragement.
        I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the
Department of Computer Science of the Faculty of Mathematics and Computer Science
in Cluj-Napoca for her support throughout the elaboration of the thesis and for our
extremely good cooperation.
        I also want to give my thanks to Assoc. Prof. Dr. Horia POP from the same
department for all his competent recommendations.
        I am grateful to all the members of Computer Science Department for their
helpful suggestions.




CHAPTER I
                         INTELLIGENT SYSTEMS

        Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, one
reason for studying this field is to learn about ourselves.
        One question that AI has to answer is the following: how is it possible for a small
and slow brain, biological or electronic, to perceive, understand, predict and manipulate a
world much larger and more complicated than itself? Researchers need only look in the
mirror in order to see an example of an intelligent system.
        There are two fundamental research directions in Artificial Intelligence [DUM99].
On the one hand, the achievement of machines capable of reproducing the cognitive
functions of the human mind, the so-called cognitive modeling. In this respect, cognitive
psychology is one of the cognitive sciences whose methods are taken over by Artificial
Intelligence, in fields such as Natural Language Processing, learning, and perception. Such
intelligent machines were named in the literature "artificial brains", the basic idea being
to build intelligent machines starting from models of neuronal activity. Such models are
known as artificial neural networks. This research direction in Artificial Intelligence
defines the field of Computational Intelligence (intelligent computing), its main
characteristic being the numerical representation of knowledge.
        On the other hand, the achievement of programs that allow the symbolic
processing of information (knowledge) using logical methods, the so-called logical-
symbolic approach (rational thinking). The logical-symbolic tradition of Artificial
Intelligence tries to achieve intelligent systems using deduction mechanisms in order to
obtain a solution of the problem, based on the problem's description in a logical
formalism. This field of traditional Artificial Intelligence is based on a symbolic
representation of knowledge and on logical inferences on knowledge.
        All AI problems can be gathered under the concept of intelligent agent. The goal
of AI is to describe and build agents that receive perceptions from the environment and
execute actions.
        An agent is an entity that perceives its environment through sensors and acts on
this environment by actions.
        An agent is described by:
    the architecture part, the so-called behavior of the agent - the action accomplished
    after a perceptual sequence;
    the program part, the so-called internal part of the agent - how the agent works on
    the inside.
        The goal of AI is to achieve the program part of the agent: a function that
implements the mapping from perceptions to actions.
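
        Purely as an illustration (this is not code from the thesis), the program part of an
agent can be sketched in Java as a mapping from percepts to actions; the Percept, Action,
AgentProgram and ReflexAgent names below are hypothetical.

    // Illustrative sketch only: the "program part" of an agent as a mapping
    // from percepts to actions. Percept and Action are hypothetical types.
    class Percept {
        String description;
        Percept(String description) { this.description = description; }
    }

    class Action {
        String name;
        Action(String name) { this.name = name; }
    }

    // The agent program: given the current percept, decide the next action.
    abstract class AgentProgram {
        abstract Action nextAction(Percept percept);
    }

    // A purely reactive agent: its decision depends only on the current percept.
    class ReflexAgent extends AgentProgram {
        Action nextAction(Percept percept) {
            if (percept.description.equals("obstacle ahead")) {
                return new Action("turn");
            }
            return new Action("move forward");
        }
    }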


The abstract architectures of intelligent agents described in this chapter are purely
reactive agents, and agents with states. The concrete architectures presented are logic
based architectures, reactive architectures, Belief-Desire-Intention architectures and
layered architectures.
        For many AI researchers, the idea of programming computer systems in terms of
mental notions such as belief, desire, and intention is the key component of agent-based
computing. The concept was articulated by Yoav Shoham in his agent-oriented
programming (AOP) proposal [SHO93]. Thus, Yoav Shoham proposes "a new
programming paradigm, based on a new point of view on the computational process".
        AOP may be regarded as a kind of 'post-declarative' programming. In AOP, the
idea is that, as in declarative programming, we state our goals and let the built-in control
mechanism figure out what to do in order to achieve them. So, it is obvious that, when
specifying such systems, we have to use logical statements that may be formalized using
a logic of knowledge.
        “Intelligent agents are ninety-nine percent computer science and one percent AI”
[ETZ96].
        The technology of intelligent agents and multi-agent systems seems set to
radically alter the way complex, distributed, open systems are conceptualized and
implemented. The logic based representation and reasoning in agent based systems is an
important topic of the field, with formal methods having an important role [SER02b].
        There are two problems related to the formalization of the agent based systems:
    how the formal methods could be used for describing and reasoning about agents and
    their behavior, and
    how formal methods provide a way to show that the implemented agent systems are
    correct with respect to their specifications.
        The field of intelligent agents is related to another field of AI, the field of
machine learning. Machine learning represents the study of system models that, based on
a set of training data, improve their performance through experience and by learning
some specific experimental knowledge.
        The attempt to model human reasoning leads to the concept of intelligent
reasoning [FLA94]. Reasoning is the process of deducing conclusions; intelligent
reasoning is the kind of reasoning accomplished by humans. Most AI systems are
deductive, able to make inferences (draw conclusions) given their initial or supplied
knowledge, without being able to acquire or generate new knowledge.




CHAPTER II
         MATHEMATICAL MODELS FOR INTELLIGENT
                  LEARNING AGENTS

         Formally, an agent based system is represented as a 4-tuple (S, A, h, U), where
S = {s_i} is the space of the agent's states (the environment), A = {a_i} is the space of
the agent's actions (the agent's behavior), h is the transition function between the states,
and U = (u1, u2, ..., un) is an array representing the utility function of the agent (ui is the
utility of the state si).
         Most of the computational models of reinforcement learning are based on the
assumption that the environment the agent interacts with can be modeled as a Markov
Decision Process, defined as follows:
1. both the agent and the environment may be modeled as synchronized finite automata
     (with a finite number of states);
2. the agent and the environment interact at discrete time steps;
3. the agent can perceive the environment's states and use them in order to select its
     actions;
4. after the agent acts, the environment makes a transition to a new state;
5. the agent receives a reward after each action performed.
         We say that the environment is a Markov Decision Process (it satisfies the Markov
property) if the environment's response (signal) at moment t+1 depends only on the
state and action at moment t.
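
         In standard notation (assumed here, not reproduced from the thesis), the Markov
property can be written as

    P(s_{t+1} = s', r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0)
        = P(s_{t+1} = s', r_{t+1} = r \mid s_t, a_t),

i.e., the rest of the history brings no additional information beyond the current state and
the current action.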
         It is well known that the Markov Decision Processes are very important in the
reinforcement learning theory (they represent 90% of modern reinforcement learning).
         In many situations the environment cannot be "labeled" beforehand with
appropriate states, and the environment's states are not completely accessible to the agent.
Thus, the environment is continuous and partially observable. Partially Observable Markov
Decision Processes were developed specifically to deal with this problem. The use of
statistical methods represents a leading topic in the field of Artificial Intelligence.
         In the last section of this chapter we present how Hidden Markov Models may be
used as a mathematical tool for modeling the environment of intelligent agents
[SER01a].
         The Hidden Markov Model (HMM) is a generalization of Markov decision
processes in which several transitions (arrows) from a state are possible for a given action.
Therefore, in an HMM we may have several paths covered for the same sequence of
actions, which implies that P(a1,n) (the probability of having as input a sequence made up
of n actions a1, a2, ..., an, written shortly as a1,n) is computed as the sum of the
probabilities over all the possible paths. The probability of a given path is computed by
multiplying the probabilities of each segment (transition) of that path.
         Definition. An HMM is a 4-tuple (s1, S, A, P), where S is a (finite) set of states,
s1 ∈ S is the initial state, A is a set of input symbols (actions), and P is a set of transitions
(labeled arrows).
         A transition is defined as a 4-tuple (si, sj, ak, p), representing the passage from
state si to state sj for the input (action) ak, the transition having probability p.
Since for a given input sequence there are several possible paths, the sequence of states
passed through is not deducible from the input, but hidden (this gives the name of
the model we focus on).
            In the same section, we present the way intelligent agents can be modeled using
a Hidden Markov Model and the way they can be trained using Viterbi's algorithm
[JUR00] and the Baum-Welch algorithm [JUR00].
            In our opinion, Hidden Markov Models are very appropriate for fields such as
Natural Language Processing, writing and speech recognition, and also for motion
problems for robots in environments with perturbations [SER01a].




CHAPTER III
                    REINFORCEMENT LEARNING
        Reinforcement learning is learning a mapping from situations to actions in order
to maximize a scalar reward or reinforcement signal. The learning agent is not told a
priori what actions to perform, as in most learning settings, but has to discover by itself
which actions lead to the maximum reward by trying them.
        In the most interesting and challenging cases, an action may affect not only the
immediate reward, but also all the subsequent situations and, consequently, all the
subsequent rewards. These two characteristics - trial-and-error search and delayed
rewards - are the most important particularities of reinforcement learning.
        Reinforcement Learning can be considered at the same time a new and a very old
topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and
independently in control theory by Waltz and Fu (1965) [WAL65].
        The first relevant learning research in this field is the checkers player of Samuel
(1959) [SAM59], which uses learning based on temporal differences to handle the
rewards. Of course, learning and reinforcement were studied by psychologists for more
than a century, and these researches had a great impact on AI and engineering activities.
        In this chapter, we present the way agents can learn in much less generous
environments, where the agent receives no examples, and starts with no model of the
environment and no utility function.
        Reinforcement Learning (RL) is the way to improve the behavior of an agent,
given a certain feedback about its performance.
        Reinforcement Learning [HAR00] is an approach to machine intelligence that
combines two disciplines to successfully solve problems that neither discipline can
address individually. Dynamic Programming is a field of mathematics that has
traditionally been used to solve problems of optimization and control. However,
traditional dynamic programming is limited in the size and complexity of the problems it
can address.
        Supervised learning is a general method for training a parameterized function
approximator, such as a neural network, to represent functions. However, supervised
learning requires sample input-output pairs from the function to be learned. In other
words, supervised learning requires a set of questions with the right answers.
        Unfortunately, we often do not know the correct answers that supervised learning
requires. For this reason there has recently been much interest in a different approach
known as reinforcement learning (RL). Reinforcement learning is not a type of neural
network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach
that addresses a different, more difficult question. Reinforcement learning combines the
fields of dynamic programming and supervised learning to yield powerful machine-
learning systems. Reinforcement learning appeals to many researchers because of its
generality. In RL, the computer is simply given a goal to achieve. The computer then
learns how to achieve that goal by trial-and-error interactions with its environment.
        In a reinforcement learning problem, the agent receives a feedback, known as
reward or reinforcement; the reward is received at the end, in a terminal state, or in any
other state where the agent has exact information about what it did well or wrong.
        A reinforcement learning problem has three fundamental parts [HAR00]:
    the environment – represented by “states”. Every RL system learns a mapping from
    situations to actions by trial-and-error interactions with a dynamic environment. This
    environment must at least be partially observable by the reinforcement learning
    system;
    the reinforcement function – the “goal” of the RL system is defined using the concept
    of a reinforcement function, which is the exact function of future reinforcements the
    agent seeks to maximize. In other words, there exists a mapping from state/action
    pairs to reinforcements; after performing an action in a given state the RL agent will
    receive some reinforcement (reward) in the form of a scalar value. The RL agent
    learns to perform actions that will maximize the sum of the reinforcements received
    when starting from some initial state and proceeding to a terminal state;
    the value (utility) function – explains how the agent learns to choose "good" actions and
    how we might measure the utility of an action. Two terms are defined: a policy
    determines which action should be performed in each state; a policy is a mapping
    from states to actions. The value of a state is defined as the sum of the reinforcements
    received when starting in that state and following some fixed policy to a terminal
    state. The value (utility) function would therefore be the mapping from states to
    actions that maximizes the sum of the reinforcements when starting in an arbitrary
    state and performing actions until a terminal state is reached.
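
        In symbols (a standard formulation, with the notation assumed here rather than
taken from the thesis), the value of a state s under a policy π is

    V^{\pi}(s) = E\big[ \, r_0 + r_1 + \dots + r_T \;\big|\; s_0 = s, \; a_t = \pi(s_t) \, \big],

where T is the step at which a terminal state is reached; when an infinite horizon is
considered, a discount factor γ ∈ (0, 1] usually weights the rewards as γ^t r_t.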




CHAPTER IV
      NEW APPROACHES IN INTELLIGENT SYSTEMS

       In this chapter we present some original approaches in Intelligent Systems: new
learning algorithms (for reinforcement learning, as well as learning algorithms useful in
Natural Language Processing) and a new logic architecture for Intelligent Agents
(showing the importance of logic based approaches in this field).


       4.1 A new real time learning algorithm (RTL)

        All Artificial Intelligence problems require some sort of search [WEI99], and thus
search represents an important issue in Artificial Intelligence. Search algorithms are
useful for problem solving by intelligent (single or multiple) agents.
        In this section we propose an original algorithm (RTL) [SER02a], which extends
the Learning Real-Time A* (LRTA*) algorithm [KOR90], used for solving path-finding
problems.
        This algorithm preserves the completeness and the real-time character of LRTA*
(a real-time search algorithm), while providing a better exploration of the search space.
        LRTA* is a learning version of the A* algorithm [KOR92], a standard search
algorithm in Artificial Intelligence. As is well known, the major problem of A* is its
large computational complexity.
        LRTA* has two major problems:
        1. What happens if the search space is a cyclic graph?
        2. What happens in plateau situations - states with several successor
            (neighbor) states having the same minimum h' value (the estimated distance
            from the current state to the goal state)? The choice of the next action is
            nondeterministic.
        RTL proposes alternatives for solving these problems.
        The proposed solutions are:
        1. We maintain a record of the visited nodes, but we do not save the h' values for
            each node.
        2. In order to choose the next action in a given state, the agent determines the set
            S of states not visited by the agent and with a minimum h’ value. If S is
            empty, the training fails, otherwise, the agent chooses a state from S using an
            action selection mechanism (this allows a better exploration of the search
            space).
        The idea of our algorithm is the following:
           through repeated training episodes, the agent tries some (possibly optimal)
           paths to a goal state, and memorizes the shortest one;
           the number of trials is selected by the user;
           after a training trial there are two possibilities:
                    the agent reaches a goal state; in this case the agent saves the path and
                    its cost;
                    the learning process fails (the agent does not reach the final state,
                    because it was blocked).
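
        The following Java sketch is only a rough illustration of a training episode
consistent with the description above (a record of the visited states, a choice among the
unvisited successors with minimal h' value, failure when no such successor exists); it is
not the RTL algorithm itself, whose exact formulation is given in the thesis, and the state
representation and helper structures are hypothetical.

    // Rough illustration of one training episode in the spirit of the description above.
    import java.util.*;

    class RealTimeEpisode {
        // returns the path found, or null if the episode fails (the agent gets blocked)
        static List<Integer> runEpisode(int start, int goal,
                                        Map<Integer, List<Integer>> successors,
                                        Map<Integer, Double> hPrime,
                                        Random random) {
            Set<Integer> visited = new HashSet<Integer>();
            List<Integer> path = new ArrayList<Integer>();
            int current = start;
            visited.add(current);
            path.add(current);
            while (current != goal) {
                // candidate states: unvisited successors with minimal h' value
                double best = Double.POSITIVE_INFINITY;
                List<Integer> candidates = new ArrayList<Integer>();
                for (int next : successors.get(current)) {
                    if (visited.contains(next)) continue;
                    double h = hPrime.get(next);
                    if (h < best) { best = h; candidates.clear(); candidates.add(next); }
                    else if (h == best) { candidates.add(next); }
                }
                if (candidates.isEmpty()) return null;  // blocked: the training fails
                // action selection mechanism: here, a random choice among the best candidates
                current = candidates.get(random.nextInt(candidates.size()));
                visited.add(current);
                path.add(current);
            }
            return path;  // the caller keeps the shortest path over all episodes
        }
    }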

       Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h'
values) are admissible (they do not overestimate the real values), then the RTL algorithm
is convergent.
       At the end of this section we give some directions for future research concerning
the RTL algorithm.



       4.2 A new Logic Architecture for Intelligent Agents (ALSO)

         The logic approach is a topic of Symbolic Artificial Intelligence and has its own
importance for intelligent agents, even if the controversy between the traditional
approach and computational intelligence in Artificial Intelligence is well known.
        Moreover, the only intelligence requirement we generally make for the agents is
that [WOO97] they can make an acceptable decision about what action to perform next in
their environment, in time for this decision to be useful. Other requirements for
intelligence will be determined by the domain in which the agent is applied: not all agents
will need to be capable of learning, for example.
        In such situations, a logic architecture is very appropriate, and offers, in our
opinion, a simple and elegant representation for the agent's environment and desired
behavior.
        The most important disadvantage of the traditional logic architecture is the large
complexity of the computational process.
        In this section we present a new logic architecture for intelligent agents (ALSO - a
logic architecture based on stacks of goals) [SER02c]. This architecture combines the
traditional logic architecture with a planning architecture [RIC91].
         We consider the case of an agent whose goal is to solve a given problem (to pass
from an initial to a final state), based on a set of operators (rules) that may be applied at a
given moment.



In an ALSO architecture, we will use the declarative representation of
knowledge.
        Each operator will be described by a list of new predicates that are made true by
the operator and a list of old predicates that are made false by the operator. These two
lists are ADD and DELETE, respectively. Moreover, for each operator a third list is
specified, PRECONDITION, which contains all the predicates that must be true in order
to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate
not included in the ADD or DELETE lists of a certain operator is not affected by that
operator.
        The idea of the algorithm is to use a stack of goals (a unique stack that contains
both goals and operators proposed for solving those goals). The problem solver is also
based on a database that describes the current situation (state) and a set of operators
described by the PRECONDITION, ADD and DELETE lists.
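
        As a small illustration (a sketch in Java rather than the thesis' Prolog
implementation, with purely illustrative names), an operator with its PRECONDITION,
ADD and DELETE lists and its effect on a state described by a set of true predicates
could look as follows.

    // Sketch of a STRIPS-like operator as described above; names are illustrative only.
    import java.util.*;

    class Operator {
        String name;
        Set<String> precondition; // predicates that must be true to apply the operator
        Set<String> add;          // predicates made true by the operator
        Set<String> delete;       // predicates made false by the operator

        Operator(String name, Set<String> precondition, Set<String> add, Set<String> delete) {
            this.name = name;
            this.precondition = precondition;
            this.add = add;
            this.delete = delete;
        }

        boolean applicable(Set<String> state) {
            return state.containsAll(precondition);
        }

        // frame axioms are implicit: predicates not in ADD or DELETE are left untouched
        Set<String> apply(Set<String> state) {
            Set<String> next = new HashSet<String>(state);
            next.removeAll(delete);
            next.addAll(add);
            return next;
        }
    }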
        As a case study we illustrate the use of the ALSO algorithm on a maze searching
problem.
        The Logic Architecture based on a stack of goals improves the traditional logic
architecture for intelligent agents in the following directions:
            Compared to the traditional logic approach, where the number of frame
            axioms that should be stored in order to make a correct inference is large,
            the proposed architecture reduces this number. In other words, the space
            complexity is reduced.
            Because of a limited Depth First search, the time complexity is reduced, too.
            The representation is very simple and elegant.
         At the end of this section we give some extension possibilities of the ALSO
architecture.

       4.3 A new reinforcement learning algorithm (URU)

        In this section we propose an original algorithm (URU - Utility-Reward-Utility)
[SER03a], namely a temporal difference reinforcement learning algorithm. This
algorithm may be used for path finding problems in a rectangular environment. We also
prove the convergence of this algorithm.
        An important problem in robotics is to generate a path between an initial and a
final configuration, so that the robot could avoid some obstacles. This problem is called a
robot path-finding problem.
        Many approaches to solve this problem were proposed in Artificial Intelligence
and Computational Geometry [BRA82], [TOR90]. All these approaches are based on
planning. The time complexity of these approaches grows exponentially with the number
of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches
(section 4.1) and learning approaches (section 4.3) are important, because they contribute
to reducing the time complexity of the problem.
        We consider the environment represented as a space of states (each state is
characterized by its position in the environment - two coordinates specifying the X-
coordinate, respectively the Y-coordinate, of the current position). The goal of the agent is
to learn to move from an initial to a final state on a shortest path (measured as the
number of transitions between states).
        The notations used in what follows are:
        the environment, represented as a space of states: M = {s1, s2, ..., sn};
        si ∈ M, sf ∈ M: the initial, respectively the final state (the problem could be
        generalized for a set of final states);
        h : M → P(M) (P(M) being the set of subsets of M): the transition function
        between the states, having the following meaning: h(i) = {j1, j2, ..., jk} if, at a
        given moment, from the state i the agent could move to one of the states
        j1, j2, ..., jk; a state j accessible from the state i, in other words j ∈ h(i), is called
        a neighbor state of the state i;
        the transition probabilities between a state i and any neighbor state j are
        P(i, j) = 1 / |h(i)|.
        To solve this problem, we propose a reinforcement learning algorithm based on
learning the states' utilities (values), in which the agent receives rewards from its
interactions with the environment.
        The algorithm, based on Temporal Differences, works on the following idea:
    the agent starts with some initial estimates of the states' utilities;
    during certain training episodes, the agent experiments with some possibly optimal
    paths from si to sf, properly updating the estimations of the states' utilities;
    during the training process the estimations of the states' utilities converge to their
    exact values, and, thus, at the end of the training process, the estimations will be in
    the neighborhood of the exact values.

       We make the following notations:
         U(i) - the estimated utility of the state i;
         R(i) - the reward received by the agent in the state i.

        We make the following specifications:
          the training process during an episode has the worst case complexity O(n^2),
          where n is the number of environment states;
          in a training sequence, the agent updates the utility of the current state using
          only the selected successor state, not all the successors (the temporal
          difference characteristic).
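
        For reference, the classical temporal difference (TD) utility update (a standard
formula, not necessarily the exact URU update rule, which is given in the thesis) after a
transition from the state i to the selected successor j has the form

    U(i) \leftarrow U(i) + \alpha \, \big( R(i) + \gamma \, U(j) - U(i) \big),

where α is the learning rate and γ is the reward (discount) factor; only the selected
successor j appears in the update, which is precisely the temporal difference
characteristic mentioned above.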



       Case study

       If we consider the reward function defined as r(s) = -1 if s ≠ sf and r(s) = 0
otherwise, it is obvious that the goal of the learning process is to minimize the number of
transitions from the initial to the final state (the agent learns to move on a shortest path).
       We consider that the environment has the following characteristics:
           the environment has a rectangular form;
           at any given moment, from any given state, the agent can move in only four
           directions: North, South, East, West.
       In what follows, we state an original theorem that gives the convergence
conditions of the URU algorithm.

      Theorem 4.3.1. [SER03a] Let us consider the learning problem described above,
which satisfies the following conditions:
          the initial values of the states' utilities are computed as
               U(s) = -d(s, sf) - 2,   ∀ s ∈ M,
        where d(s1, s2) represents the Manhattan distance between the states s1 and s2;
          the reward factor is γ = 1/3.
        In this case, the URU algorithm is convergent (the states' utilities converge
after the training sequence).

       We denote by Un(i) the estimated utility of the state i at the n-th training episode.
       The proof of Theorem 4.3.1 is based on the following lemmas:

       Lemma 1. [SER03a] At the n-th training episode of the agent the following
inequality holds:
               Un(i) ≤ 2,   ∀ i ∈ M.
       Lemma 2. [SER03a] At the n-th training episode of the agent the following
inequality holds:
               | Un(i) - Un(j) | ≤ 1
for each transition from i (i ≠ sf) to j made by the agent in the current training sequence.

       Lemma 3. [SER03a] The inequality Un+1(i) ≥ Un(i) holds for every i ∈ M and
n ∈ N; in other words, the states' utilities increase from one training episode to another.

        Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU
algorithm has been run.

        Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of
the states' utilities is:

        U*URU(i) = ri + (γ / card(h(i))) · Σ (over j successor of i) U*URU(j),   ∀ i ∈ M,

where U*URU(i) represents the exact utility of the state i, obtained after the URU algorithm
has been run, and card(X) denotes the number of elements of the set X.

       Compared to the classical TD algorithm, we make the following remarks (which
follow naturally from the theoretical results given):
           the states' utilities grow faster in the URU algorithm than in the TD algorithm;
           in other words, UURU(i) ≥ UTD(i), ∀ i ∈ M;
           in the case of our learning problem, as we proved in Theorem 4.3.1, for γ = 1
           (the TD algorithm) we cannot prove the convergence of the states' utilities.

       Further work is planned in the following directions:
          to analyze what happens if the transitions between states are nondeterministic
          (the environment is a Hidden Markov Model [SER01a]);
          to analyze what happens if the reward factor (γ) is not a fixed parameter, but
          a function whose values depend on the current state of the agent;
          to develop the algorithm for solving path-finding problems with multiple
          agents.



       4.4 A new algorithm for word sense disambiguation

       We introduce in this section an original algorithm (BA) [SER01c] which
combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] with
elements of the Naive Bayes Classifier (NBC).
       This algorithm combines the advantages of Yarowsky's principles (one sense per
discourse and one sense per collocation) with the known high performance of NBC
algorithms. The algorithm learns to make predictions based on local contexts, starting
from a reduced set of labeled contexts (with meanings) and from a bigger set of unlabeled
contexts.
        The two important characteristics of the BA algorithm are:
    the set of words used as contextual features for the disambiguation - as the number of
    elements of this set grows, the precision of the disambiguation grows (this is
    experimentally shown in section 5.6);
    the relation between words and sets of contexts, included in W × P(W) (W being the
    set of words, and P(W) being the set of parts of W, in particular a set of contexts) -
    using this relation also increases the precision of the disambiguation.
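
        For reference, the standard Naive Bayes decision rule for word sense
disambiguation (the NBC component combined in BA; the notation below is assumed
here, not taken from the thesis) selects, for a context c = {w1, ..., wm}, the sense

    s^{*} = \arg\max_{s} \; P(s) \prod_{k=1}^{m} P(w_k \mid s),

where the probabilities are estimated from the (initially small) set of sense-labeled
contexts.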




CHAPTER V
     IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

       5.1 A reinforcement learning Intelligent Agent based on Adaptive
           Dynamic Programming

        In this section, we present an Intelligent Agent [SER00b] that learns by
reinforcement using an Adaptive Dynamic Programming method (section 3.2).
        We consider a simple Markov decision process with n^2 states. The state space can
be visualized using an n x n grid. Each square represents a state. The reinforcement function
is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are
4 actions possible in each state: North, South, East, and West. The goal states are the
upper left corner and the lower right corner.
        The program that implements the agent with the functionality described above has
been realized in two versions: in Borland Pascal and in Visual Prolog.
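
        The original programs are in Borland Pascal and Visual Prolog; purely as an
illustration of the environment just described (n x n grid, constant reinforcement of -1,
four actions, goal states in the upper left and lower right corners), a Java sketch could be:

    // Illustrative Java sketch of the grid environment described above (not the thesis code).
    class GridEnvironment {
        final int n;                        // the grid is n x n, so there are n*n states
        GridEnvironment(int n) { this.n = n; }

        boolean isGoal(int row, int col) {  // goals: upper left and lower right corners
            return (row == 0 && col == 0) || (row == n - 1 && col == n - 1);
        }

        int reward() { return -1; }         // constant reinforcement on each transition

        // actions: 0 = North, 1 = South, 2 = East, 3 = West; moves off the grid are ignored
        int[] next(int row, int col, int action) {
            int r = row, c = col;
            if (action == 0) r = Math.max(0, row - 1);
            if (action == 1) r = Math.min(n - 1, row + 1);
            if (action == 2) c = Math.min(n - 1, col + 1);
            if (action == 3) c = Math.max(0, col - 1);
            return new int[] { r, c };
        }
    }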


       5.2 An Intelligent Agent based on training Hidden Markov
           Models

         In this section we want to test a system represented as a Hidden Markov Model
(section 2.3). We propose an agent for character recognition [SER01a].
         Let us consider the example of an agent that has to recognize two characters,
“I” and “U”. We propose the model described in Figure 5.1.




                            Figure 5.1. The Hidden Markov Model




In this section we present how the Hidden Markov Model can be trained to
recognize the two characters and how the recognition of other characters works.
        The agent implementation has been written in Microsoft Visual C++ 6.0.
        Starting from the proposed experiment, some directions for future work are:
            how the proposed model could be generalized for as many characters as
            possible;
            how the structure of the HMM could be dynamically generated;
            when such probabilistic methods are more suitable than other character
            recognition methods;
            how probabilistic methods could be combined with others (heuristic or not)
            in order to obtain a higher performance of the model;
            what would happen in certain "plateau" situations (where a given input would
            have the same probability in several models).
        This section aimed to emphasize another way of working on problems of training
intelligent agents.


       5.3 A game playing Reinforcement Learning Intelligent Agent
           ("Agent 21")

        In this section we design an Intelligent Agent [SER01b] that learns (by
reinforcement) to play "Game 21", a simplified version of the game "Black-Jack".
The algorithm used for training the agent is the SARSA algorithm (section 3.6).
        We propose a game with two players: a dealer (which distributes the cards) and an
agent, in fact the computer (which uses the SARSA algorithm for learning the game's
rules and which plays against the dealer). The agent tries to win by obtaining a score (the
sum of the cards' values) less than or equal to 21, but greater than the dealer's score.
        The probabilistic nature of this game makes it an interesting problem for testing
learning algorithms, even if learning a good playing strategy is not obvious.
        Learning from a teacher (supervised learning) is not useful in this game, as the
output data for a given state are unknown. The agent has to explore different actions and
to develop a strategy by selecting and maintaining the actions that maximize its
performance.
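
        For reference, the SARSA update rule (a standard formula; the state and action
encoding used for "Game 21" is specific to the thesis) is

    Q(s, a) \leftarrow Q(s, a) + \alpha \, \big( r + \gamma \, Q(s', a') - Q(s, a) \big),

where (s, a, r, s', a') are the current state, the chosen action, the received reward, the next
state and the next action, α is the learning rate and γ is the discount factor.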
        The purpose of this application is to study the behavior of an agent that initially
knows neither the rules nor the goal of the game, and that gradually assimilates them.
        After learning, the agent (the computer) wins about 54% of the rounds (Figure
5.2). This is due to the fact that “Game 21” is a random game. The agent tries to manage,
as well as it is able to, the cards received from the dealer.


As a remark, even if the game has a strong probabilistic character, after the
training rounds the agent plays approximately as well as, or maybe even better than, a
human player (with a random playing strategy).




               Figure 5.2. Percentage of the won games for the trained agent




       5.4 "Know-How" Agent

        This section refers to the implementation of the “Know-how” concept (section
1.3.4) in a Multi-Agent System (system with several non-cooperative agents) [SER02b].
        Because of the strong logical formalism of the “Know-How” concept, the
application was written in Visual Prolog.


       5.5 A real-time learning Intelligent Agent

         In this section we describe an Intelligent Agent [SER02a] that learns to find the
minimum number of transitions needed for moving (in a space of states) between an
initial and a final configuration, using the RTL algorithm (section 4.1).
         The use of the A* algorithm for solving such problems is well known. The A*
algorithm, an informed search algorithm, has the major disadvantage of the exponential
complexity of its execution time.


Thus, a learning version is very suitable, since it reduces the algorithm's
complexity.
        We propose an experiment in order to illustrate the advantages of the RTL
algorithm.
        The application is written in Borland C and implements the behavior of an
Intelligent Agent whose aim is to find the way out of a maze on a shortest path.
        For comparison, we also used the LRTA* algorithm and noticed that, because of
the insufficient exploration of the environment, the agent fails.


       5.6 An agent for word disambiguation

         In this section we describe an Intelligent Agent [SER03e] that learns to
disambiguate a polysemic word (to determine the meaning of the word in a set of
contexts).
         The application is written in JDK 1.4 and uses the disambiguation algorithm BA
described in section 4.4.
         We made two experiments: one for the English language and one for the
Romanian language. The second experiment is very significant, because for the
Romanian language a resource similar to WordNet does not exist, and the BA
algorithm only requires information that can be extracted from an untagged corpus.
          Our aim is to use the BA algorithm for the Romanian language, to disambiguate
the word poarta in some contexts obtained with an on-line corpus tool (we use a
Romanian corpus).
        We make the following statements:
           the target word poarta has, in Romanian, four possible meanings
           (two nouns and two verbs);
           we test our algorithm starting with 38 contexts for the target word;
           we start with four words as contextual features for the disambiguation (a
           single feature for each sense).
        As an evaluation measure of the algorithm we use the accuracy (5.1).

               A = (number of contexts correctly solved) /
                   (total number of contexts that have to be solved)              (5.1)

       The accuracy of the BA algorithm in the proposed experiment is 60%.
       We also carried out an experimental comparison between the Naive Bayes Classifier
and the BA algorithm with and without the relation (section 4.4).


The comparative experimental results are shown in Figure 5.3, which gives a
graphical representation of the accuracy per context for each algorithm. More exactly,
for any given algorithm, for the i-th context we represent the accuracy (5.1) of the
algorithm on the first i contexts.
        From Figure 5.3, it is obvious that the most efficient algorithm is BA with the
relation (its accuracy grows with the number of contexts).

          [Figure 5.3. Experimental comparison between the NBC and BA algorithms:
           accuracy (%) versus the number of contexts, for the BA algorithm with the
           relation, the NBC algorithm, and the BA algorithm without the relation]

       We make the following remarks:
            in the above experiment we increased the number of occurrences for each
            meaning of the target word and we noticed that with two occurrences for each
            meaning the algorithm's accuracy grows by 10%, and with three occurrences
            for each meaning the accuracy grows by 15%;
            we increased the number of input contexts (to 100) and we noticed that the
            algorithm's accuracy grows by 15%.
       As a conclusion, if the number of words used as contextual features for the
disambiguation and the number of contexts grow, the accuracy of the BA algorithm
grows, too.


                  5.7 A Web Agent to display a tree structure

         In this section we present an Intelligent Agent [SER01e] aimed at browsing a
structure of remote files. It is an agent whose environment is a software environment,
where the sensors are software functions and the actions are software actions.




5.8 A Logic-based Agent (ALSO)

        In this section we describe the implementation part of a Logic Agent based on a
stack of goals [SER02c], whose architecture has been presented in section 4.2.
        Because the above described architecture is based on logic and because the
algorithm described in section 4.2 needs backtracking to find all the solutions, the
implementation was written in Visual Prolog. The declarative programming languages
(as Prolog) have a built-in control mechanism which allows finding all the solutions of a
problem.
        We have to say that the stack (stacks) of goals that we have to create for applying
the algorithm are implicitly memorised by the control strategy of Prolog (a mechanism
which allows backtracking).

       5.9 An URU reinforcement learning Intelligent Agent

       In this section we describe an Intelligent Agent [SER03a] that learns to find the
minimum number of transitions (steps) for moving between an initial and a final
configuration, using the URU algorithm (described in section 4.3).
       The application is written in JDK 1.4. We develop an experiment for an agent
that moves in a rectangular maze, illustrating the theoretical results obtained in section
4.3.

       5.10 An Agent for word clustering

       In this section we describe an Intelligent Agent [SER03b] that learns to cluster
[MAN99] a set of words, starting from a set of contexts and from a set of words having
maximum frequency in the given contexts.
       We propose an experiment for the Romanian language; we ran the algorithm on
10000 contexts and 200 words, with very good results.
       We mention that this application is used as a part of a Question Answering (QA)
system for the Romanian language [SER03f].




CHAPTER VI
A PROGRAMMING INTERFACE FOR REINFORCEMENT
          LEARNING SIMULATIONS

         In this chapter we propose an original programming interface for reinforcement
learning simulations in known environments [SER03c]. Using this interface, simulations
are possible both for reinforcement learning based on the states' utilities and for learning
based on actions' values (Q-learning).
        The interface is realized in JDK 1.4, and is meant to facilitate software
development for reinforcement learning in known environments.
        There are three basic objects: agents, environments and simulations.
        The agent is the learning agent and the environment is the task it interacts with.
The simulation manages the interaction between agent and environment, collects data and
manages the display, if any.
        Generally, the inputs of the agent are perceptions about the environment (in our
case states from the environment), the outputs are actions, and the environment offers
rewards after interacting with it.
        Figure 6.1 illustrates the interaction between the agent and the environment in a
reinforcement learning task.



              [Figure 6.1. Interaction between the Agent and the Environment: the agent
               receives perceptions and rewards from the environment and sends actions
               to the environment]

        The reward is a number; the environment, the actions and perceptions are
instances of classes derived from the IEnvironment, IAction and IState interfaces
respectively. The implementation of actions and perception may be arbitrary as long as
they are properly understood by agent and environment. The agent and the environment
have, obviously, to be chosen to be compatible with each other.
        The interaction between agent and environment is handled in discrete time. We
assume we are working with simulations. In other words, there are no real-time
constraints enforced by the interface: the environment waits for the agent while the agent
is selecting its action and the agent waits for the environment while the environment is
computing its next state.
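        Schematically, one simulated episode unfolds as in the fragment below. This is
only a conceptual sketch of what happens at each discrete time step; agentChoose,
rewardFor and agentLearn are placeholder names, not methods of the interface, and the
reward handling is a simplifying assumption.

    // Conceptual sketch of the discrete-time interaction (placeholder helper
    // names; the reward handling is an assumption, not the thesis code).
    IState s = env.initial();
    while (!env.isFinal(s)) {
        IAction a = agentChoose(s);   // the environment waits while the agent selects an action
        int r = rewardFor(s, a);      // placeholder for the reward returned by the environment
        s = env.next(s, a, r);        // the agent waits while the environment computes its next state
        agentLearn(s, r);             // the agent updates its estimates from the observed reward
    }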
        We assume that the agent's environment is a finite Markov Decision Process.
        To use the interface, the user has to define the specialized classes HisState,
HisEnvironment and HisAgent and to create an instance of each. The agent and the
environment are then passed to a simulation object (CSimulation), which initializes and
interconnects them. Then, CSimulation.init() initializes and runs the simulation.
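        A hypothetical usage sketch is given below; HisAgent and HisEnvironment stand
for the user-defined classes described above, and the numeric arguments of init() (the
learning rate, the discount factor, the exploration rate and the number of episodes,
following the UML signature) are arbitrary example values.

    // Hypothetical usage sketch; HisAgent / HisEnvironment are the user-defined
    // specializations described above, and the parameter values are examples only.
    HisEnvironment env = new HisEnvironment();
    HisAgent agent = new HisAgent();

    CSimulation sim = new CSimulation(agent, env);  // the constructor interconnects agent and environment
    sim.init(0.1, 0.9, 0.1, 1000);                  // init(alpha, gamma, epsilon, episodes) runs the simulation
    sim.policy(System.out);                         // prints the learned policy (signature from the UML diagram)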
        If the agent learns the states' utilities, it has to be derived from the AgentUtility
class; otherwise, if it learns the actions' values (Q-learning), it has to be derived from the
AgentQValues class.
        RLAgent is an abstract class, the base class for all agents. Specific agents are
instances of subclasses derived from RLAgent.
        AgentUtility is an abstract subclass of RLAgent that defines the behavior of an
agent learning by reinforcement based on the states' utilities (the learning algorithm is the
URU algorithm described in section 4.3 [SER03a]).
        AgentQValues is an abstract subclass of RLAgent that defines the behavior of an
agent learning by reinforcement based on the actions' values (the learning algorithm is the
SARSA algorithm [SUT98]).
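        For reference, the SARSA rule from [SUT98] updates the estimate Q(s, a) after
each transition (s, a, r, s', a'). The small helper below states the textbook update in Java;
it is only an illustration of the rule, not necessarily the code behind AgentQValues.

    // Textbook SARSA update from [SUT98]; an illustration of the rule only,
    // not necessarily the exact code behind AgentQValues.
    // q     : current estimate Q(s, a)
    // r     : reward received after executing action a in state s
    // qNext : Q(s', a') for the action a' actually chosen in the next state s'
    // alpha : learning rate, gamma : discount factor
    static double sarsaUpdate(double q, double r, double qNext, double alpha, double gamma) {
        return q + alpha * (r + gamma * qNext - q);
    }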
        IEnvironment is the base interface for all environments. Specific environments
are instances of classes implementing IEnvironment.
        CSimulation is the central class of the interface: it manages the interaction
between the agent and the environment and defines the uniform usage that all agents and
environments are meant to conform to. An instance of the simulation class is associated
with an agent instance and an environment instance at creation time, in the constructor of
the class CSimulation.
        We illustrate the use of the interface on a concrete example: the motion problem
of an embodied agent whose goal is to find the way out of a maze with obstacles.
        The UML class diagram of the interface is summarized below.

        [UML class diagram of the interface. Its main elements are: the class CSimulation
(fields a: RLAgent and m: IEnvironment; operations CSimulation(a, m), init(alpha, gamma,
epsilon, episodes) and policy(ps: PrintStream)); the interfaces IEnvironment (isValid,
initial, final, next, value), IState and IAction (toString, equal) and IList (add, delete, dim,
elem), the latter implemented by CList; the abstract class RLAgent (operations actions,
choose, next, initial, learning, notVisited, value, appear, add, init, update) with its abstract
subclasses AgentUtility and AgentQValues and the user-defined agent MyAgent; the
abstract class IElement (state, action, value, retO, setValue, setAction, setO, toString,
equal) with its subclasses Utility and QValues; and the user-defined classes MyState,
MyAction and MyEnvironment.]
CONCLUSIONS


        The aim of this work is to support the idea that the field of agents, and especially
of intelligent agents, is an important research and development area of Computer Science
and Artificial Intelligence.
        On the other hand, learning problems represent an important research direction in
Artificial Intelligence; machine learning is the study of system models that, based on an
input data set (the training set), improve their performance through experience and by
learning some specific knowledge.
        Reinforcement Learning, seen as a method for solving machine learning problems
based on new knowledge acquisition [BER91], is one of the ways that may form the basis
for improving machines' performance, while being, at the same time, a field open to
future research.
        Because of the potential to eliminate manual encoding of control strategies,
reinforcement learning is one of the most active research domains in machine learning.
Reinforcement learning in unknown environments is a current research topic in this field.
        The aim of the current work is to present the main results related to
reinforcement learning, as well as the design and implementation of intelligent agents, in
order to offer support for understanding:
     why agents are considered an important new way of conceptualizing and
     implementing various kinds of applications;
     what intelligent agents are (and what they are not) and how they are related to other
     programming paradigms, in particular object oriented programming;
     why learning, and reinforcement learning in particular, represents an extremely novel
     and dynamic field in Artificial Intelligence.
        The work presents some new approaches in Intelligent Systems (new learning
algorithms), together with possible generalizations and open problems, opening in this
way new research directions in Machine Learning.
        The study of learning was focused in this thesis on the model of single-agent
systems in known environments. As future research we will consider the case of unknown
environments and of systems with several agents (multi-agent systems).
REFERENCES

[ALE95]   Allen, J., Natural language understanding, Benjamin/Cummings Publ.,
          2nd ed., 1995
[ANT96]   Antos, A., Lugosi, A., Strong MiniMax lower bounds for learning,
          Proceedings of the 9th annual Conference on Computational Learning
          theory, 1996, pp.303-309
[BAR91]   Barto, A.G., Bradtke, S.J., Singh, S.P., Real time learning and control
          using asynchronous dynamic programming, (Technical Report 91-57),
          Amherst, MA: University of Massachusetts, Computer Science
          Department
[BAR99]   Barber, K.S., Kim, J., Constructing and dynamically maintaining
          perspective-based agent models in a MA Environment, Proceedings of the
          3rd annual Conference on Autonomous Agents, 1999, pp.416-417
[BER91]   Bergadano, F., Giordana, A., Machine Learning, Ellis Horwood
          Limited, Chichester, England, 1991
[BET93]   Betke, M., Rivest, R., Singh, M., Piecemeal Learning of an Unknown
          Environment, AI Memo, No. 1474, CBCL Memo No. 93, 1993
[BOO88]   Booker, L.B., Classifier systems that learn world models, Machine
          Learning, Number 3, 1988, pp.161-192
[BOO94]   Booch, G., Object-Oriented Analysis and Design, Addison-Wesley:
          Reading, MA, 1994
[BRA82]   Brady, M., Hollerbach, J.M., Robot motion: planning and control,
          Cambridge MA, MIT Press, 1982
[BRO86]   Brooks, R.A., A robot layered control system for a mobile robot, IEEE
          Journal of Robotics and Automation, 2(1), pp.14-23, 1986
[CAN88]   Canny, J.F., The complexity of robot motion planning, Cambridge MA,
          MIT Press, 1988
[CAR79]   McCarthy, J., Ascribing mental qualities to machines, Philosophical
          Perspectives in Artificial Intelligence, Harvester Press, 1979
[CÂR94]   Cârstoiu, D. Sisteme expert, Editura ALL, Bucureşti, 1994
[CAS94]   Cassandra, A.R., Kaelbling, L.P., Littman, M.L., Acting Optimally in
          Partial Observable Stochastic Domains, Proceedings, 7th AAAI-94,
          Seattle, Washington, 1994
[CHA95]   Chaib-Draa, B., Industrial Applications of Distributed AI,
          Communications of the ACM, Vol. 38, No. 11, 1995
[CHA96]   Charniak, E. Statistical language learning, The MIT Press, 1996
[CHE80]   Chellas, B., Modal Logic: An Introduction, Cambridge University Press:
          Cambridge, England, 1980
[CHI99]   Ciampolini, A., Lamma, E., Mello, P., Abductive Coordination for
          logic agents, Proceedings of the 1999 ACM Symposium on Applied
          Computing, 1999, pp.134-140
[COB02]   http://titania.cobuild.collins.co.uk/form.html
[DAE00]   Daelemans, W., Zavrel, J., Tilburg Memory Learner - version 3.0,
          University of Antwerp, http://ilk.kub.nl, 2000
[DAY92]     Dayan, P., The convergence of TD(λ) for general λ, Machine Learning,
            Number 8, 1992, pp.341-362
[DEN87]     Dennett, D.C., The Intentional Stance, The MIT Press: Cambridge. MA,
            1987
[DUM95]     Dumitrescu, D., Inteligenţă Artificială, Litografie, Universitatea "Babeş-
            Bolyai" Cluj-Napoca, 1995
[DUM99]     Dumitrescu, D., Principiile Inteligenţei Artificiale, Ed. Albastră, Cluj-
            Napoca, 1999
[ESC00]     Escudero, G., Marquez, L., Rigau, G., Boosting applied to WSD,
            ECML, Barcelona, Spain, 2000
[ETZ96]     Etzioni, O., Moving up the information food chain: Deploying softbots on
            the world-wide web, Proceedings of the Thirteenth National Conference
            on Artificial Intelligence (AAAI-96), Portland. OR, 1996
[FIE94]     Fiechter, C.N., Efficient Reinforcement Learning, Proceedings of the 7th
            annual ACM Conference on Computer Learning theory, 1994, pp.88-97
[FIS93]     Fisher, M., Concurrent MetateM, A language for modeling reactive
            systems, Proceedings of Parallel Architectures and languages Europe
            (PARLE), Springer Verlag, 1993
[FIS93]     Fisher, M., Wooldrige, M., Specifying and Verifying Distributed
            Intelligent Systems, in Progress in Artificial Intelligence, Proceedings of
            Portuguese Conference on Artificial Intelligence (EPIA), Porto, Portugal,
            1993, Springer Verlag as "Lecture notes in Computer Science", vol. 727
[FIS95/1]   Fisher, M., Towards a semantics for Concurrent MetateM, in Executable
            Modal and Temporal Logics, Springer Verlag, Lecture Notes in Artificial
            Intelligence, vol. 897, 1995
[FIS95/2]   Fisher, M., Representing and Executing Agent-based Systems, Springer
            Verlag, Lecture Notes in Computer Science, vol. 890, 1995
[FLA94]     Flach, P., Simply Logical Intelligent reasoning by Example, John Wiley &
            Sons, Chichester, England, 1994
[FLO94]     Florea, A., Boangiu, A., Inteligenţă Artificială, Editura Universităţii
            "Politehnica", Bucureşti, 1994
[FRE94]     Frenţiu, M., Pârv, B., Elaborarea Programelor. Metode şi tehnici
            moderne, Editura Pro-Media, Cluj-Napoca, 1994
[GAR95]     Gârbacea, I., Andonie, R., Algoritmi fundamentali – O perspectiva C++,
            Ed. Libris, Cluj-Napoca, 1995
[GEN94]     Genesereth, M.R., Ketchpel, S.P., Software agents, Communications of
            the ACM, 37(7), 1994 , pp.48-53
[GRE90]     Grefenstette, J.J., Ramsey, C.L., Schultz, A.C., Learning sequential
            decision rules using simulation models and competition, Machine
            Learning, Number 5, 1990, pp.355-382
[GRE93]     Grefenstette, J., Genetic Algorithms and Machine Learning, Proceedings
            of the 6th annual ACM Conference on Computational Learning theory,
            1993, pp.3-4
[HAD96]     Haddadi, A., Communication and Cooperation in Agent Systems, LNAI
            Volume 1056, Springer-Verlag: Berlin, Germany, 1996
[HAY83]    Hayes-Roth, F., Waterman, A., Building Expert Systems, Addison-
           Wesley: Reading, MA, 1983
[HAR00]    Harmon, M., Harmon, S., Reinforcement Learning – A Tutorial, Wright
           State University, www-anw.cs.umass.edu/~mharmon/rltutorial/frames.html,
           2000
[HAR85]    Harmon, P., King, D., Expert systems, John Wiley&Sons Inc, 1985
[HAR88]    Harmon, P., Maus, R., Expert systems. Tools and applications, John
           Wiley&Sons Inc, 1988
[HAS88]    Hassler, D., Artificial Intelligence, Vol. 106, Number 2, 1988
 [HOL75]   Holland, J.H., Adaptation in natural and artificial systems, Ann Arbor,
           MI, University of Michigan Press, 1975
[HOL86]    Holland, J.H., Escaping brittleness: The possibilities of general-purpose
           learning algorithms applied to parallel rule-based systems, in Machine
           Learning, An Artificial Intelligence Approach, Volume 2, Los Altos, CA:
           Morgan Kaufman, R.S. Michalski, J.G. Carbonelli, T.M. Mitchell (Eds.)
           1986, pp.593-623
[ISH95]    Ishida, T., Korf, R.E., Moving target search: a real-time search for
           changing goals, IEEE Transactions on Pattern Analysis and Machine
           Intelligence, 1995
[JAC86]    Jackson, P., Introduction to Expert Systems, Addison-Wesley: Reading,
           MA, 1986
[JUN97]    Junling, H., Wellman, M., Learning about other agents in a dynamic
           multi-agent system, Artificial Intelligence Laboratory, Michigan,
           October 26, 1997
[JUN98]    Junling, H., Wellman, M., Online learning about other agents in a
           dynamic MAS, Proceedings of the 2nd International Conference on
           Autonomous Agents, 1998, pp.239-246
[JUR00]    Jurafsky, D., Martin, J.H., Speech and language processing, Prentice
           Hall, 2000
[KEL93]    Kelly, J., Artificial Intelligence - a modern myth, Ellis Horwood, 1993
[KIL01]    Kilgarriff, A., Gold Standard Datasets for Evaluating Word Sense
           Disambiguation Programs, http:// citeseer.nj.nec.com
[KOR90]    Korf, R.E., Real-time heuristic search, Artificial Intelligence, 1990
[KOR92]    Korf, R.E., Search, Encyclopedia of Artificial Intelligence, Wiley-
           Interscience Publication, New York, 1992
[LEE88]    Lee, K.F., Majahan, S., A pattern classification approach to evaluation
           function learning, in Artificial Intelligence, 36, 1988, pp.1-25
[LES96]    Lesperance, Y., Levesque, J., Foundation of a logical approach to agent
           programming, LNAI Volume 1037, Springer Verlag: Berlin, Germany,
           1996, pp.331-346
[LES99]    Lesser, V., Atighetchi, M., The UMASS Intelligent home project,
           Proceedings of the 3rd annual Conference on Autonomous Agents, 1999,
           pp.291-298
[LET99]    Leţia, I.A., Crăciun, F., et al., Multiagent Systems, Casa Cărţii de
           Ştiinţă, Cluj-Napoca, 1999
[LIN91/1]   Lin, L.J., Self-improving reactive agents: Case studies of reinforcement
            learning frameworks, Proceedings of the 1st International Conference on
            Simulation of Adaptive Behavior: From Animals to Animats, 1991,
            pp.297-305
[LIN91/2]   Lin, L.J., Self-improvement based on reinforcement learning, planning
            and teaching, Proceedings of the 8th International Workshop on Machine
            Learning, 1991, pp.323-327
[LIN91/3]   Lin, L.J., Programming robots using reinforcement learning and
            teaching, Proceedings of AAAI-91, 1991, pp.781-786
[MAE90]     Maes, P., Situated agents can have goals, in Designing Autonomous
            Agents, the MIT Press: Cambridge, MA, 1990, pp.49-70
[MAH90]     Mahadevan, S., Connell, J., Automatic programming of behavior-based
            robots using reinforcement learning, IBM Technical Report, 1990
[MAN99]     Manning, C., Schütze, H., Foundations of statistical natural language
            processing, MIT Press, 1999
[MAT94]     Mataric, J.M., Interaction and Intelligent Behavior, PhD Thesis,
            Massachusetts Institute of Technology, 1994
[MIN61]     Minsky, M.L., Steps Toward Artificial Intelligence, in Computers and
            Thought, E.A. Feigenbaum & J. Feldman (Eds.), McGraw-Hill, New
            York, 1963, pp.406-450
[MIT98]     Mitchell, M., An introduction to Genetic Algorithms, MIT Press,
            Cambridge 1998
[MOR99]     Morley, D., Semantics of Actions, Agents and Environments, Doctoral
            Thesis, University of Melbourne, 1999
[NAR98]     Nardi, B.A., Miller, J., Wright, D., Collaborative programmable
            Intelligent Agents, Communications ACM 41, 3, March, 1998, pp.96-104
[NEW82]     Newell, A., The Knowledge Level, in Artificial Intelligence 18(1), 1982,
            pp.87-127
[NIL80]     Nilsson, N.J., Principles of Artificial Intelligence, Palo Alto, C.A.
            Morgan-Kaufmann, 1980
[NIL98]     Nilsson, N.J., Artificial Intelligence - A new Synthesis, Morgan-
            Kaufmann Publishers Inc., San Francisco, California, 1998
[PER98]     Perez-Uribe, A., Sanchez, E., BlackJack as a TestBed for Learning
            Strategies in Neural Networks, Proceedings of the IEEE International Joint
            Conference on Neural Networks, 1998
[POO98]     Poole, D., Mackworth, A., Goebel, R., Computational Intelligence - A
            Logical Approach, New York, Oxford University Press, 1998
[PYE99]     Pyeatt, D.L, Howe, A.E., Integrated POMDP and Reinforcement
            Learning for a two Layer simulated robot architecture, Proceedings of the
            3rd annual Conference on Autonomous Agents, 1999, pp.168-174
[RAO92]     Rao, A.S., Georgeff, M., An abstract architecture for rational agents,
            Proceedings of Knowledge Representation and Reasoning (KR&R-92),
            1992, pp.439-449
[RAO95]     Rao, A.S., Georgeff, M., BDI Agents: from theory to practice,
            Proceedings of the First International Conference on Multi-Agent Systems
            (ICMAS-95), pp.312-319, San Francisco, CA, 1995
[RES98]    Resnik, P., Yarowsky, D., Distinguishing Systems and Distinguishing
            sense: new evaluation methods for WSD, Natural Language Engineering,
           1, nr. 1, 1998
[RIC91]    Rich, E., Knight, K., Artificial Intelligence, Mc Graw Hill, New York,
           1991, 2nd ed.
[RUS95]    Russell, S.J., Norvig, P., Artificial Intelligence - A Modern Approach,
           Prentice-Hall, Inc., New Jersey, 1995
[RUS98]    Russell, S.J., Learning agents for uncertain environments, Proceedings of
           the 11th annual Conference on Computational Learning theory, 1998,
           pp.101-103
[SAM59]    Samuel, A.L., Some studies in Machine Learning using the game of
           checkers, IBM Journal on research and Development, Number 3, 1959,
           pp.210-229
[SEB01]    Sebastiani, F., A tutorial on Automated Text Categorization, pp.1-25,
           ESSLLI 2001
[SEO00]    Seo, Y.W., Byoung, T.Z., A Reinforcement Learning agent for
           personalized information filtering, Proceedings of the 2000 International
           Conference on Intelligent user Interfaces, 2000, pp.248-251
[SER99]    Şerban, G., Searching graphs in logical programming, Universitatea
           “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi Informatică,
           Research Seminars, Seminar on Computer Science, 1999, pp.67-72
[SER00a]   Tătar, D., Şerban, G., Training probabilistic context-free grammars as
           hidden Markov models, Studia Universitatis "Babes-Bolyai", Informatica,
           XLV(1), 2000, pp.17-30
[SER00b]   Şerban, G., Intelligent agents and reinforcement learning, Universitatea
           “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi Informatică,
           Research Seminars, Seminar on Computer Science, 2000, pp.13-20
[SER00c]   Şerban, G., A method for training Intelligent Agents using Hidden
           Markov Models, Studia Universitatis "Babes-Bolyai", Informatica,
           XLV(2), 2000, pp.41-50
[SER01a]   Şerban, G., Training Hidden Markov Models - a Method for Training
           Intelligent Agents, Proceedings of the Second International Workshop of
           Central and Eastern Europe on Multi-Agent Systems, Krakow, Poland,
           2001, pp.267-276
[SER01b]   Şerban, G., A Reinforcement Learning Intelligent Agent, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVI(2), 2001, pp.9-18
[SER01c]   Tătar, D., Şerban, G., A new algorithm for word sense disambiguation,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVI(2), 2001, pp.99-
           108
[SER01d]   Şerban, G., Training intelligent agents as Hidden Markov Models,
           Proceedings of Abstracts of the 4th Joint Conference on Mathematics and
           Computer Science, Felix, Romania, 2001, pp.91
[SER01e]   Şerban, G., A Simple Web Agent for displaying a tree structure,
           Universitatea “Babeş-Bolyai” Cluj-Napoca, Facultatea de Matematică şi
           Informatică, Research Seminars, Seminar on Computer Science, 2001,
           pp.41-48
[SER02a]   Şerban, G., A New Real Time Learning Algorithm, Studia Universitatis
           "Babes-Bolyai", Informatica, XLVII(1), 2002, pp.3-14
[SER02b]   Şerban, G., Formal Methods in Agent-Based Systems, Proceedings of the
           Symposium "Zilele Academice Clujene", Cluj-Napoca, 2002, pp.59-68
[SER02c]   Şerban, G., LASG - A New Logic Architecture for Intelligent Agents,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVII(2), 2002, pp.13-
           22
[SER02d]   Şerban, G., Real Time Learning in Agent Based Systems, Proceedings of
           the 4th International Workshop on Symbolic and Numeric Algorithms for
           Scientific Computing, Timişoara, Romania, 2002, pp.337-347
[SER02e]   Şerban, G., Tătar, D., A Word Sense Disambiguation Experiment for
           Romanian Language, Studia Universitatis "Babes-Bolyai", Informatica,
           XLVII(2), 2002, pp.37-42
[SER03a]   Şerban, G., A New Reinforcement Learning Algorithm, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVIII(1), 2003, pp.3-14
[SER03b]   Tătar, D., Şerban, G., Words Clustering in Question Answering Systems,
           Studia Universitatis "Babes-Bolyai", Informatica, XLVIII(2), 2003, to
           print
[SER03c]   Şerban, G., A New Interface for Reinforcement Learning Software, Studia
           Universitatis "Babes-Bolyai", Informatica, XLVIII(2), 2003, to print
[SER03d]   Şerban, G., Tătar, D., Word Sense Disambiguation for Untagged
           Corpus: Application to Romanian Language, Proceedings of CICLing
           2003 (Intelligent Text Processing and Computational Linguistics), Mexico
           City, Mexico, Lecture Notes in Computer Science No. 2588, Springer-
           Verlag, 2003, pp.270-275
[SER03e]   Şerban, G., Tătar, D., An Improved Algorithm on Word Sense
           Disambiguation, Proceedings of IIS 2003 (Intelligent Information Systems
           - New Trends in Intelligent Information Processing and Web Mining),
           Zakopane, Poland, 2003, to print
[SER03f]   Orăşan, C., Tătar, D., Şerban, G., Oneţ, A., Avram, D., How to build a
           QA system in your back-garden: application to Romanian, Proceedings of
           EACL 2003, Budapest, Hungary, 2003, to print
[SHI98]    Shimbo, M., Ishida, T., On the convergence of real-time search, Journal
           of Japanese Society of Artificial Intelligence, 1998
[SHO93]    Shoham, Y., Agent-oriented programming, Artificial Intelligence, 60(1),
           pp.51-92, 1993
[SUN96]    Sung, K.K., Niyogi, P., A formulation for active learning with application
           to Object Detection, AI Memo No. 1438, CBCL Paper No. 116, 1996
[SUN99]    Sun, R., Session, C., Bidding in Reinforcement Learning - a paradigm for
           MAS, Proceedings of the 3rd annual Conference on Autonomous Agents,
           1999, pp.344-345
[SUT84]    Sutton, R., Temporal credit assignment in Reinforcement Learning,
           Doctoral Dissertation, Dept. of Computer and Information Science, Univ.
           of Massachusetts, Amherst, 1984
[SUT88]    Sutton, R., Learning to predict by the methods of temporal differences, in
           Machine Learning, 3, 1988, pp.9-44
[SUT92]   Sutton, R., A Special Issue of Machine Learning on Reinforcement
          learning, Kluwer Academy Publishers, Boston, London, 1992
[SUT98]   Sutton, R.S., Barto, A.G., Reinforcement learning, The MIT Press
          Cambridge, Massachusetts, London, England, 1998
[TAT01]   Tătar, D., Inteligenţa artificială: demonstrare automată de teoreme,
          prelucrarea limbajului natural, Editura Albastră, 2000
[TOR90]   Torras, C., Illari, J., 2D path planning: A configuration space heuristic
          approach, The International Journal of Robotics Research, Number 2,
          1990, pp.75-91
[WAL65]   Waltz, M.D., Fu, K.S., A heuristic approach to reinforcement learning
          control systems, IEEE Transactions on Automatic Control, AC-10, 1965,
          pp.390-398
[WAN99]   Wang, F., McKenzie, E., A multi-agent based evolutionary artificial
          Neural Network for general navigation in unknown environments,
          Proceedings of the 3rd annual Conference on Autonomous Agents, 1999,
          pp.154-159
[WAT97]   Watson, M., Intelligent Java applications for Internet and intranets,
           1997, Romanian translation, Ed. ALL, 1999
[WAT89]   Watkins, C.J., Learning from delayed rewards, Ph.D. Thesis, King's
          College, Cambridge, 1989
[WEI99]   Weiss, G., Multiagent systems – A Modern Approach to Distributed
          Artificial Intelligence, The MIT Press, Cambridge, Massachusetts,
          London, 1999
[WER87]   Werbos, P.J., Building and understanding adaptive systems: A
          statistical/numerical approach to factory automation and brain research,
          IEEE Transactions on Systems, Man and Cybernetics, Jan-Feb, 1987
[WHI89]   Whitehead, S.D., Ballard, D.H., A role for anticipation in reactive
          systems that learn, Proceedings of the 6th International Workshop on
          Machine Learning, 1989, pp.354-357
[WHI90]   Whitehead, S.D., Ballard, D.H., Active perception and Reinforcement
          Learning, Proceedings, 7th International Conference on machine Learning,
          Austin, Texas, 1990
[WHI91]   Whitehead, S.D., Ballard, D.H., Learning to perceive and act by trial
          and error, Machine Learning, Number 7, 1991, pp.45-84
[WIN84]   Winston, P.H., Artificial Intelligence, Addison Wesley, Reading, MA,
          1984, 2nd ed.
[WOL99]   Wolpert, D., Wheeler, K., Turner, K., General Principles of learning-
          based MAS, Proceedings of the 3rd annual Conference on Autonomous
          Agents, 1999, pp.77-83
[WOO95]   Wooldridge, M., Jennings, N.R., Intelligent Agents: Theory and
          Practice, The Knowledge Engineering Review, 10(2), pp.115-152, 1995
[WOO97]   Wooldridge, M., Agent-Based Software Engineering, Mitsubishi Electric
          Digital Library Group, London, 1997
[WOR01]   http://www.cogsci.princeton.edu/~wn/
[YAR95]   Yarowsky, D., Unsupervised Word Sense Disambiguation Rivaling
          Supervised Methods, Proceedings of ACL'95, pp.189-196
[YAR99]   Yarowsky, D., Hierarchical Decision Lists for WSD, Kluwer Academic
          Publishers, 1999
[YAR01]   Yarowsky, D., Word-Sense Disambiguation Using Statistical Models of
          Roget's Categories Trained on Large Corpora, http:// citeseer.nj.nec.com
[YIS99]   Yishay, M., Reinforcement Learning and Mistake bounded algorithms,
          Proceedings of the 12th annual Conference on Computational Learning
          theory, 1999, pp.183-192




                                      9

Contenu connexe

Similaire à "Development Methods for Intelligent Systems"

An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
Mohammad Salah uddin
 
Embeded microcontroler
Embeded microcontrolerEmbeded microcontroler
Embeded microcontroler
Yugo Sulistyo
 
Hibernate Reference
Hibernate ReferenceHibernate Reference
Hibernate Reference
Syed Shahul
 
Flash as3 programming
Flash as3 programmingFlash as3 programming
Flash as3 programming
subhash85
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
Dario Bonino
 
Linux_kernelmodule
Linux_kernelmodule Linux_kernelmodule
Linux_kernelmodule
sudhir1223
 

Similaire à "Development Methods for Intelligent Systems" (20)

U M Lvs I D E F
U M Lvs I D E FU M Lvs I D E F
U M Lvs I D E F
 
An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
 
eclipse.pdf
eclipse.pdfeclipse.pdf
eclipse.pdf
 
Embeded microcontroler
Embeded microcontrolerEmbeded microcontroler
Embeded microcontroler
 
Hibernate Reference
Hibernate ReferenceHibernate Reference
Hibernate Reference
 
Plesk Modules
Plesk ModulesPlesk Modules
Plesk Modules
 
Flash as3 programming
Flash as3 programmingFlash as3 programming
Flash as3 programming
 
bonino_thesis_final
bonino_thesis_finalbonino_thesis_final
bonino_thesis_final
 
Software architecture for developers
Software architecture for developersSoftware architecture for developers
Software architecture for developers
 
PythonIntro
PythonIntroPythonIntro
PythonIntro
 
Whitepaper on distributed ledger technology
Whitepaper on distributed ledger technologyWhitepaper on distributed ledger technology
Whitepaper on distributed ledger technology
 
B035-2447-220K.pdf
B035-2447-220K.pdfB035-2447-220K.pdf
B035-2447-220K.pdf
 
Market Research Report
Market Research ReportMarket Research Report
Market Research Report
 
Market research report
Market research reportMarket research report
Market research report
 
thesis-2005-029
thesis-2005-029thesis-2005-029
thesis-2005-029
 
Linux_kernelmodule
Linux_kernelmodule Linux_kernelmodule
Linux_kernelmodule
 
Learn python the right way
Learn python the right wayLearn python the right way
Learn python the right way
 
A Real Time Application Integration Solution
A Real Time Application Integration SolutionA Real Time Application Integration Solution
A Real Time Application Integration Solution
 
Uni leicester
Uni leicesterUni leicester
Uni leicester
 
Neural Networks on Steroids
Neural Networks on SteroidsNeural Networks on Steroids
Neural Networks on Steroids
 

Plus de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Plus de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

"Development Methods for Intelligent Systems"

  • 1. “BABEŞ-BOLYAI” UNIVERSITY OF CLUJ-NAPOCA FACULTY OF MATHEMATICS AND COMPUTER SCIENCE GABRIELA ŞERBAN Development methods for Intelligent Systems Abstract of the Ph.D. Thesis Supervisor Prof. univ. dr. MILITON FRENŢIU 2003
  • 2. Table of Contents page / page abstract / thesis INTRODUCTION ........................................................................................................... 2/ 5 CHAPTER I INTELLIGENT SYSTEMS.............................................................. 5/ 9 CHAPTER II MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS.......................................................................... 7 / 65 CHAPTER III REINFORCEMENT LEARNING.............................................. 9 / 75 CHAPTER IV NEW APPROACHES IN INTELLIGENT SYSTEMS...................................................................................................... 11 / 107 4.1 A new real time learning algorithm (RTL) .................. 11 / 107 4.2 A new Logic Architecture for Intelligent Agents (ALSO) .................................................................................................. 12 / 115 4.3 A new reinforcement learning algorithm (URU) ..... 13 / 123 4.4 A new algorithm for words sense disambiguation... 16 / 134 CHAPTER V IMPLEMENTATIONS OF INTELLIGENT SYSTEMS...................................................................................................... 18 / 141 5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming................................. 18 / 142 5.2 An Intelligent Agent based on training Hidden Markov Models.................................................................................. 18 / 147 5.3 A game playing Reinforcement Learning Intelligent Agent ("Agent 21")............................................. 19 / 152 5.4 "Know-How" Agent....................................................................... 20 / 152 5.5 A real-time learning Intelligent Agent.............................. 20 / 153 5.6 An agent for words disambiguation................................... 21 / 159 5.7 A Web Agent to display a tree structure......................... 22 / 168 5.8 A Logic-based Agent (ALSO) ................................................. 23 / 172 5.9 An URU reinforcement learning Intelligent Agent. 23 / 174 5.10 An Agent for words clustering............................................. 23 / 181 CHAPTER VI A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS..... 24 / 184 CONCLUSIONS ................................................................................................................................ 28 / 199 REFERENCES ................................................................................................................................ 29 / 200 1
  • 3. INTRODUCTION This PhD. thesis is the result of my researches in the field of Artificial Intelligence, particularly in Machine Learning, started in 1998. The purpose of this abstract is to present an overview of the most important aspects of the thesis, including the original results. Artificial Intelligence, one of the most recent disciplines appeared in the field of computer science, tries to understand the intelligent entities. But, unlike philosophy and psychology, which deal with the study of intelligence, too, the aim of Artificial Intelligence is building and as well understanding intelligent entities. As nobody can predict the future in detail, computers endowed with a degree of human intelligence (or more than that) would have a great impact in everyday life and in the future course of civilization. So, the field of machine learning represents one of the most recent challenges of the Artificial Intelligence, opening new perspectives for research in the field, and, of course, a way for achieving intelligent systems. In our opinion, the most challenging field of Machine Learning is Reinforcement Learning, dealing with the problem of improving the behavior of an artificial agent based on feedback of its performance. On the other hand, we consider that one of the most suitable fields for exploiting the intelligence is Natural Language Processing. This is why our researches are directed, especially, to the fields of Reinforcement Learning and Natural Language Processing. The thesis contains 133 references and is structured on six chapters, as follows. Chapter I, Intelligent Systems, discusses general aspects related to Artificial Intelligence. The field of Intelligent Agents is presented, a domain of Artificial Intelligence that offers a new problem solving method and a new way for interaction between user and computer. Some of the most important (in our opinion) abstract and concrete architectures for intelligent agents are presented, as well as the new programming paradigm, agent oriented programming. In the last but one section of the first chapter the representation based on logic and the reasoning in agent based systems is discussed. There are presented aspects related to the Software Engineering and the formal aspects of the agent based systems. The last section of the first chapter tackles with the problem of machine learning, as an extremely important research direction in Artificial Intelligence. The general model of a learning agent, aspects related to the learning of a field, as well as the most important learning strategies are presented. Chapter II, Mathematical Models for Learning Intelligent Agents, presents three mathematical models for agents that learn: Markov Decision Processes, Partial Observable Markov Decision Processes and Hidden Markov Models. In our opinion, the mathematical models described in this chapter have a very important role in modeling and designing agent based systems, especially systems that learn by reinforcement (by interaction with the environment). 2
  • 4. Chapter III, Reinforcement Learning, tackles with the extremely complex problem of reinforcement learning, a learning approach that combines two disciplines in order to solve problems that neither one could solve independently: Dynamic Programming and Supervised Learning. This chapter presents the particularities of Reinforcement Learning. There are discussed: the problem of learning based on a utility function, the passive or active learning in known or unknown environments, the problem of learning based on temporal differences (the most interesting variant of reinforcement learning), the problem of Q-learning. The most important action selection mechanisms, as well as the SARSA algorithm, the most efficient variant for Q-learning, are comparatively discussed. Chapter IV, New approaches in Intelligent Systems, presents a few new learning algorithms (approaches of Reinforcement Learning and learning algorithms that may be used for Natural Language Processing) as well as a new logic architecture for intelligent agents (showing the importance of logical approaches for intelligent agents). The learning algorithms described in this section are: an algorithm for words sense disambiguation, a real time learning algorithm, a reinforcement learning algorithm with application to problems of motion for robots in environments with obstacles. Each of the original approaches from this chapter are comparatively discussed with already existent approaches, the advantages of the new approaches being illustrated. Chapter V, Examples of Intelligent Systems, presents practical applications (concrete architectures for intelligent agents that improve their behavior by learning), i.e. implementations of the theoretical aspects presented in chapters II and IV. For each such application we developed experiments that illustrate the efficiency of the approaches. We also suggested directions for generalization and for future researches. Chapter VI, A programming interface for reinforcement learning simulations, presents a standard programming interface for reinforcement learning simulations in a known environment, both the simulation of reinforcement learning based on utility function and the simulation of Q-learning being possible. The use of the interface on an example of movement of an embodied robotic agent in an environment with obstacles, is, as well, illustrated. From the original elements presented through out this thesis we mention the following: the use of Hidden Markov Models as mathematical models for Intelligent Learning Agents [SER01a] (section 2.3); algorithm for words sense disambiguation [SER03e] (section 4.4); a real time learning algorithm - RTL [SER02a] (section 4.1); the convergence theorem for the RTL algorithm [SER02a] (Theorem 4.1.1, section 4.1); a new Logic Architecture for Intelligent Agents - ALSO [SER02c] (section 4.2); the reinforcement learning algorithm URU [SER03a] (section 4.3); 3
  • 5. the convergence theorem for the URU learning algorithm [SER03a] (Theorem 4.3.1, section 4.3); Theorems 4.3.2 and 4.3.3: the equilibrium equation of the utilities in the URU algorithm and, respectively, the error estimation in the URU algorithm [SER03a] (section 4.3); a reinforcement learning agent based on dynamic adaptive programming [SER00b] (section 5.1); intelligent agent based on training Hidden Markov Models [SER00c] (section 5.2); a game playing reinforcement learning intelligent agent [SER01b] (section 5.3); "Know-How" agent based on logic [SER02b] (section 5.4); real time learning agent [SER02d] (section 5.5); intelligent agent for words disambiguation [SER03d] (section 5.6); Web agent for displaying a tree structure [SER01e] (section 5.7); agent based on logic (ALSO) [SER02c] (section 5.8); an URU reinforcement learning intelligent agent [SER03a] (section 5.9); agent for words clustering [SER03b] (section 5.10); a programming interface for reinforcement learning simulations [SER03c] (chapter VI). * * * I would like to express my gratitude to my supervisor, Prof. Dr. Militon FRENŢIU, for his competent guidance, suggestions and encouragement. I would like to express my thanks to Assoc. Prof. Dr. Doina TĂTAR from the Department of Computer Science of the Faculty of Mathematics and Computer Science in Cluj-Napoca for her support along all the elaboration period of the thesis and for our extremely good cooperation. I also want to give my thanks to Assoc. Prof. Dr. Horia POP from the same department for all his competent recommendations. I am grateful to all the members of Computer Science Department for their helpful suggestions. 4
  • 6. CHAPTER I INTELLIGENT SYSTEMS Artificial Intelligence (AI) tries to understand intelligent entities. Therefore, a reason for studying this field is to learn about ourselves. One question that AI has to answer is the following: how is it possible that a small and slow, biological or electronic brain, to perceive, understand, predict and manipulate a world much larger and more complicated than itself. All the researchers should look themselves in the mirror in order to see an example of an intelligent system. There are two fundamental research directions in Artificial Intelligence [DUM99]. On one hand, the achievement of machines capable to reproduce the cognitive functions of the human mind, the so-called cognitive modeling. In this way, the cognitive psychology is one of the cognitive sciences whose method are taken over by Artificial Intelligence, in fields as Natural Language Processing, learning, perception. Such intelligent machines were named in the specific literature as “artificial brains”, the basic idea being building intelligent machines starting from the models of the neuronal activity. Such models are known as artificial neuronal networks. These research directions in Artificial Intelligence define the field of Computational Intelligence (Intelligent Calculus), its main characteristic being the numerical representation of knowledge. On the other hand, the achievement of programs that allows the symbolical processing of information (knowledge) using logical methods, the so-called logical- symbolical approach (rational thinking). The logical-symbolical tradition of Artificial Intelligence tries to achieve intelligent systems using deduction mechanisms in order to obtain a solution of the problem, based on the problem's description in the logical formalism. This field of the traditional Artificial Intelligence is based on a symbolic representation of the knowledge and on logical inferences on knowledge. All the AI problems could be gathered under the concept of intelligent agent. The AI goal is to describe and build agents that receive perception from the environment and executes actions. An agent is an entity that perceives its environment by sensors and acts on this environment by actions. An agent is described by: the architecture part, the so-called behavior of the agent - the action accomplished after a perceptual sequence; the program part, the so-called internal part of the agent - how the agent's inside works. The goal of AI is to achieve the program part of the agent: a function that implements the mapping from perception to action. 5
  • 7. The abstract architectures of intelligent agents described in this chapter are purely reactive agents, and agents with states. The concrete architectures presented are logic based architectures, reactive architectures, Belief-Desire-Intention architectures and layered architectures. For many AI researchers, the idea of programming computer systems in terms of mental notions such as belief, desire, and intention is the key component of agent-based computing. The concept was articulated by Yoav Shoham, in his agent-oriented programming (AOP) proposal [SHO93]. Thus, Yoav Shoham proposes a “a new programming paradigm, based on a new point of view on the computational process”. AOP may be regarded as a kind of „post-declarative‟ programming. In AOP, the idea is that, as in declarative programming, we state our goals, and let the built-in control mechanism figure out what to do in order to achieve them. So, it is obvious that, when specifying such systems, we have to use logical statements that may be formalized using a logic of knowledge. “Intelligent agents are ninety-nine percent computer science and one percent AI” [ETZ96]. The technology of intelligent agents and multi-agent systems seems set to radically alter the way complex, distributed, open systems are conceptualized and implemented. The logic based representation and reasoning in agent based systems is an important topic of the field, with formal methods having an important role [SER02b]. There are two problems related to the formalization of the agent based systems: how the formal methods could be used for describing and reasoning about agents and their behavior, and how formal methods provide a way to show that the implemented agent systems are correct with respect to their specifications. The field of intelligent agents is related to another field of AI, the field of machine learning. Machine learning represents the study of system models that, based on a set of training data, improve their performance by experiences and by learning some specific experimental knowledge. The attempt of modeling the human reasoning leads to the concept of intelligent reasoning [FLA94]. Reasoning is the process of conclusion deduction; intelligent reasoning is a kind of reasoning accomplished by humans. Most of the AI systems are deductive, able to make inferences (draw conclusions), given their initial or supplied knowledge, without being able of new knowledge acquisition or to generate new knowledge. 6
CHAPTER II
MATHEMATICAL MODELS FOR INTELLIGENT LEARNING AGENTS

Formally, an agent based system is represented as a 4-tuple (S, A, h, U), where S = {si} is the space of the agent's states (the environment), A = {ai} is the space of the agent's actions (the agent's behavior), h is the transition function between the states, and U = (u1, u2, ..., un) is an array representing the utility function of the agent (ui is the utility of the state si).

Most of the computational models of reinforcement learning are based on the assumption that the environment the agent interacts with can be modeled as a Markov Decision Process, defined as follows:
1. both the agent and the environment may be modeled as finite synchronized automata (with a finite number of states);
2. the agent and the environment interact in discrete time;
3. the agent can perceive the environment's states and use them in order to select its actions;
4. after the agent acts, the environment makes a transition to a new state;
5. the agent receives a reward after each action performed.

We say that the environment is a Markov Decision Process (satisfies the Markov property) if the environment's response (signal) at the moment t+1 depends only on the state and action at the moment t. Markov Decision Processes are very important in reinforcement learning theory (they are estimated to underlie 90% of modern reinforcement learning).

In many situations the environment cannot be "labeled" beforehand with appropriate states, and the environment's states are not completely accessible to the agent; the environment is thus continuous and partially observable. Partially Observable Markov Decision Processes were developed specifically to deal with this problem.

The use of statistical methods is a leading topic in the field of Artificial Intelligence. In the last section of this chapter we present how Hidden Markov Models may be used as a mathematical tool for modeling the environment of intelligent agents [SER01a]. The Hidden Markov Model (HMM) is a generalization of Markov decision processes, in which several transitions (arrows) from a state are possible for a given action. Therefore, in an HMM we may have several paths covered for the same sequence of actions, which implies that P(a1,n) (the probability of having as input a sequence made up of n actions, a1, a2, ..., an, shortly written a1,n) is computed as the sum of the probabilities over all the possible paths. The probability of a given path is computed by multiplying the probabilities on each segment (transition) of that path.

Definition. An HMM is a 4-tuple (s1, S, A, P), where S is a (finite) set of states, s1 ∈ S is the initial state, A is a set of input symbols (actions), and P is a set of transitions (labeled arrows). A transition is defined as a 4-tuple (si, sj, ak, p), representing passing from the state si to the state sj for the input (action) ak, the transition being evaluated as having the probability p.

As for a given input sequence we have several possible paths, the sequence of states that has been passed through is not deducible from the input, but hidden (this gives the name of the model we focus on). In the last section of this chapter, we present the way intelligent agents can be modeled using a Hidden Markov Model and the way they can be trained using Viterbi's algorithm [JUR00] and the Baum-Welch algorithm [JUR00]. In our opinion, Hidden Markov Models are very appropriate for fields such as Natural Language Processing, writing and speech recognition, and also for motion problems for robots in environments with perturbations [SER01a].
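As an illustration of the path-summing computation described above, the following Java sketch stores the labeled transitions of a small HMM and computes P(a1,n) recursively, summing over every transition compatible with the current action. The class and method names are illustrative, not taken from the thesis implementation.

```java
import java.util.*;

// Minimal sketch of the HMM of Chapter II: states and labeled, probability-weighted
// transitions. P(a_1..n) is the sum, over all paths that accept the action sequence,
// of the product of the transition probabilities along the path.
// Class and method names are illustrative, not from the thesis.
final class SimpleHmm {
    // transitions: (state, action) -> list of (nextState, probability)
    private final Map<String, Map<String, List<Map.Entry<String, Double>>>> trans = new HashMap<>();

    void addTransition(String from, String action, String to, double p) {
        trans.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(action, k -> new ArrayList<>())
             .add(Map.entry(to, p));
    }

    // Probability of observing the action sequence, starting from 'state':
    // sum over every transition labeled with the current action.
    double sequenceProbability(String state, List<String> actions) {
        if (actions.isEmpty()) return 1.0;
        double total = 0.0;
        String a = actions.get(0);
        for (Map.Entry<String, Double> t :
                trans.getOrDefault(state, Map.of()).getOrDefault(a, List.of())) {
            total += t.getValue()
                   * sequenceProbability(t.getKey(), actions.subList(1, actions.size()));
        }
        return total;
    }
}
```

A call such as sequenceProbability(initialState, actionSequence) returns the probability of the whole input sequence; Viterbi's algorithm replaces the summation by a maximization in order to recover the single most probable (hidden) state path.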
CHAPTER III
REINFORCEMENT LEARNING

Reinforcement learning is learning a mapping from situations to actions in order to maximize a scalar reward or reward signal (function). The learning agent does not know a priori what actions to perform, as in most learning settings, but has to discover by itself which actions lead to a maximum reward and to try them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but all the subsequent situations and, consequently, all the subsequent rewards. These two characteristics - trial-and-error search and delayed rewards - are the most important particularities of reinforcement learning.

Reinforcement Learning can be considered, at the same time, a new and a very old topic in Artificial Intelligence. The term was first used by Minsky (1961) [MIN61] and, independently, in control theory by Waltz and Fu (1965) [WAL65]. The first relevant learning research in this field is the checkers player of Samuel (1959) [SAM59], which uses learning based on temporal differences to handle the rewards. Of course, learning and reinforcement have been studied by psychologists for more than a century, and this research had a great impact on AI and engineering activities.

In this chapter we present the way agents can learn in much less generous environments, where the agent receives no examples and starts with no model of the environment and no utility function. Reinforcement Learning (RL) is a way to improve the behavior of an agent, given a certain feedback about its performance.

Reinforcement Learning [HAR00] is an approach to machine intelligence that combines two disciplines to successfully solve problems that neither discipline can address individually. Dynamic Programming is a field of mathematics that has traditionally been used to solve problems of optimization and control; however, traditional dynamic programming is limited in the size and complexity of the problems it can address. Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions; however, supervised learning requires sample input-output pairs from the function to be learned. In other words, supervised learning requires a set of questions with the right answers. Unfortunately, we often do not know the correct answers that supervised learning requires. For these reasons there has recently been much interest in a different approach known as reinforcement learning (RL). Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.

Reinforcement learning appeals to many researchers because of its generality. In RL, the computer is simply given a goal to achieve and then learns how to achieve that goal by trial-and-error interactions with its environment. In a reinforcement learning problem, the agent receives a feedback, known as reward or reinforcement; the reward is received at the end, in a terminal state, or in any other state in which the agent has exact information about what it did well or wrong.

A reinforcement learning problem has three fundamental parts [HAR00]:
- the environment - represented by "states". Every RL system learns a mapping from situations to actions by trial-and-error interactions with a dynamic environment. This environment must be at least partially observable by the reinforcement learning system;
- the reinforcement function - the "goal" of the RL system is defined using the concept of a reinforcement function, which is the exact function of future reinforcements the agent seeks to maximize. In other words, there exists a mapping from state/action pairs to reinforcements; after performing an action in a given state, the RL agent receives some reinforcement (reward) in the form of a scalar value. The RL agent learns to perform actions that maximize the sum of the reinforcements received when starting from some initial state and proceeding to a terminal state;
- the value (utility) function - explains how the agent learns to choose "good" actions, or even how we might measure the utility of an action. Two terms are defined: a policy determines which action should be performed in each state (a policy is a mapping from states to actions); the value of a state is defined as the sum of the reinforcements received when starting in that state and following some fixed policy to a terminal state. The value (utility) function would therefore be the mapping from states to actions that maximizes the sum of the reinforcements when starting in an arbitrary state and performing actions until a terminal state is reached.
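To make the notions of value function and reward concrete, the following Java sketch maintains a tabular utility estimate and applies a standard TD(0)-style update after each observed transition. It is a generic illustration of temporal-difference learning, not a verbatim reproduction of any algorithm from the thesis; the learning rate alpha and the reward (discount) factor gamma are the usual parameters of such methods.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a tabular value function updated by the standard TD(0)-style rule:
// U(s) <- U(s) + alpha * (r + gamma * U(s') - U(s)).
// This is the generic temporal-difference update; the thesis' own URU rule
// (section 4.3) is a variant of this idea and is not reproduced verbatim here.
final class ValueTable {
    private final Map<String, Double> utility = new HashMap<>();
    private final double alpha;   // learning rate
    private final double gamma;   // reward (discount) factor

    ValueTable(double alpha, double gamma) {
        this.alpha = alpha;
        this.gamma = gamma;
    }

    double value(String state) {
        return utility.getOrDefault(state, 0.0);
    }

    // One learning step after observing: state --(reward r)--> nextState
    void update(String state, double reward, String nextState) {
        double target = reward + gamma * value(nextState);
        utility.put(state, value(state) + alpha * (target - value(state)));
    }
}
```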
CHAPTER IV
NEW APPROACHES IN INTELLIGENT SYSTEMS

In this chapter we present some original approaches to Intelligent Systems: new learning algorithms (for reinforcement learning, and learning algorithms with uses in Natural Language Processing) and a new logic architecture for Intelligent Agents (showing the importance of logic based approaches in this field).

4.1 A new real time learning algorithm (RTL)

All Artificial Intelligence problems require some sort of search [WEI99], and thus search is an important issue in Artificial Intelligence. Search algorithms are useful for problem solving by intelligent (single or multiple) agents. In this section we propose an original algorithm (RTL) [SER02a], which extends the Learning Real-Time A* (LRTA*) algorithm [KOR90] used for solving path-finding problems. The RTL algorithm preserves the completeness and the real-time character of LRTA*, while providing a better exploration of the search space.

LRTA* is a learning version of the A* algorithm [KOR92], a standard search algorithm in Artificial Intelligence. As is well known, the major problem of A* is its large computational complexity. LRTA* has two major problems:
1. What happens if the search space is a cyclic graph?
2. What happens in plateau situations - states with several successor (neighbor) states having the same minimum h' value (the estimated distance from the current state to the goal state)? The choice of the next action is nondeterministic.

RTL proposes alternatives for solving these problems. The proposed solutions are:
1. We maintain a record of the visited nodes, but we do not save the h' values for each node.
2. In order to choose the next action in a given state, the agent determines the set S of states not yet visited by the agent and having a minimum h' value. If S is empty, the training fails; otherwise, the agent chooses a state from S using an action selection mechanism (this allows a better exploration of the search space).

The idea of the algorithm is the following:
- through repeated training episodes, the agent tries some (possibly optimal) paths to a goal state and memorizes the shortest one; the number of trials is selected by the user;
- after a training trial there are two possibilities: the agent reaches a goal state, in which case it saves the path and its cost, or the learning process fails (the agent does not reach the final state, because it was blocked).

Theorem 4.1.1. [SER02a] If the values of the heuristic function (the initial h' values) are admissible (they do not overestimate the real values), then the RTL algorithm is convergent.

At the end of this section we give some directions for future research concerning the RTL algorithm.

4.2 A new Logic Architecture for Intelligent Agents (ALSO)

The logic approach is a topic of Symbolic Artificial Intelligence and has its own importance for intelligent agents, even if the controversy between the traditional approach and computational intelligence in Artificial Intelligence is well known. Moreover, the only intelligence requirement we generally make for agents is that [WOO97] they can make an acceptable decision about what action to perform next in their environment, in time for this decision to be useful. Other requirements for intelligence will be determined by the domain in which the agent is applied: not all agents will need to be capable of learning, for example. In such situations, a logic architecture is very appropriate and offers, in our opinion, a simple and elegant representation of the agent's environment and desired behavior. The most important disadvantage of the traditional logic architecture is the large complexity of the computational process.

In this section we present a new logic architecture for intelligent agents (ALSO - a logic architecture based on stacks of goals) [SER02c]. This architecture combines the traditional logic architecture with a planning architecture [RIC91]. We consider the case of an agent whose goal is to solve a given problem (to pass from an initial to a final state), based on a set of operators (rules) that may be applied at a given moment, as sketched below.
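A minimal Java sketch of this problem setting is given below: operators that are applicable in certain states and transform one state into another, and a simple depth-first solver that chains them from the initial state to a final state. All names are illustrative assumptions; the ALSO-specific ingredients (the PRECONDITION, ADD and DELETE lists and the stack of goals) are described in the paragraphs that follow.

```java
import java.util.*;

// Minimal sketch of the problem setting described above: an agent must transform
// an initial state into a final state by applying operators (rules) that are
// applicable at a given moment. Names are illustrative, not from the thesis.
interface Operator<S> {
    boolean applicable(S state);   // may the operator be applied in this state?
    S apply(S state);              // the state obtained after applying it
}

final class OperatorProblemSolver {
    // Depth-first search for a sequence of operators leading to a final state;
    // 'visited' prevents cycling through already explored states.
    static <S> Optional<List<Operator<S>>> solve(S state,
                                                 List<Operator<S>> operators,
                                                 java.util.function.Predicate<S> isFinal,
                                                 Set<S> visited) {
        if (isFinal.test(state)) return Optional.of(new ArrayList<>());
        if (!visited.add(state)) return Optional.empty();   // already explored
        for (Operator<S> op : operators) {
            if (!op.applicable(state)) continue;
            Optional<List<Operator<S>>> rest =
                solve(op.apply(state), operators, isFinal, visited);
            if (rest.isPresent()) {
                rest.get().add(0, op);   // prepend the operator used first
                return rest;
            }
        }
        return Optional.empty();
    }
}
```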
In an ALSO architecture, we use a declarative representation of knowledge. Each operator is described by a list of new predicates that are made true by the operator and a list of old predicates that are made false by the operator; the two lists are ADD and DELETE, respectively. Moreover, for each operator a third list is specified, PRECONDITION, which contains all the predicates that must be true in order to apply the operator. The frame axioms are implicitly specified in ALSO: each predicate not included in the ADD or DELETE lists of a certain operator is not affected by that operator.

The idea of the algorithm is to use a stack of goals (a single stack that contains both goals and operators proposed for solving those goals). The problem solver is also based on a database that describes the current situation (state) and on the set of operators described by the PRECONDITION, ADD and DELETE lists. As a case study we illustrate the use of the ALSO algorithm on a maze searching problem.

The logic architecture based on a stack of goals improves the traditional logic architecture for intelligent agents in the following directions:
- compared to the traditional logic approach, where the number of frame axioms that must be stored in order to make a correct inference is large, the proposed architecture reduces this number; in other words, the space complexity is reduced;
- because of a limited Depth First search, the time complexity is reduced, too;
- the representation is very simple and elegant.

At the end of this section we give some extension possibilities of the ALSO architecture.

4.3 A new reinforcement learning algorithm (URU)

In this section we propose an original algorithm (URU - Utility-Reward-Utility) [SER03a], namely a temporal difference reinforcement learning algorithm. This algorithm may be used for path-finding problems in a rectangular environment. We also prove the convergence of this algorithm.

An important problem in robotics is to generate a path between an initial and a final configuration such that the robot avoids some obstacles. This problem is called the robot path-finding problem. Many approaches for solving this problem were proposed in Artificial Intelligence and Computational Geometry [BRA82], [TOR90]. All these approaches are based on planning, and their time complexity grows exponentially with the number of degrees of freedom of the robot [CAN88]. We consider that heuristic approaches (section 4.1) or learning approaches (section 4.3) are important, because they contribute to reducing the time complexity of the problem.

We consider the environment represented as a space of states (each state is characterized by its position in the environment - two coordinates specifying the X-coordinate and the Y-coordinate of the current position). The goal of the agent is to learn to move from an initial to a final state on a shortest path (measured as the number of transitions between states). The notations used in what follows are:
- M = {s1, s2, ..., sn}: the environment, represented as a space of states;
- si ∈ M, sf ∈ M: the initial and the final state, respectively (the problem can be generalized for a set of final states);
- h : M → ℘(M): the transition function between the states, with the following meaning: h(i) = {j1, j2, ..., jk} if, at a given moment, the agent can move from the state i to one of the states j1, j2, ..., jk; a state j accessible from the state i, in other words j ∈ h(i), is called a neighbor state of the state i;
- the transition probability between a state i and any neighbor state j is P(i, j) = 1 / |h(i)|.

To solve this problem, we propose a reinforcement learning algorithm based on learning the states' utilities (values), in which the agent receives rewards from its interactions with the environment. The algorithm, based on Temporal Differences, works on the following idea: the agent starts with some initial estimates of the states' utilities; during certain training episodes, the agent experiments with some possibly optimal paths from si to sf, properly updating the estimates of the states' utilities; during the training process the estimates converge to their exact values, and thus, at the end of the training process, the estimates will be in the neighborhood of the exact values.

We use the following notations: U(i) - the estimated utility of the state i; R(i) - the reward received by the agent in the state i. We make the following specifications:
- the training process during an episode has the worst case complexity O(n^2), where n is the number of environment states;
- in a training sequence, the agent updates the utility of the current state using only the selected successor state, not all the successors (the temporal difference characteristic).

Case study. If we consider the reward function defined as r(s) = -1 if s ≠ sf and r(s) = 0 otherwise, it is obvious that the goal of the learning process is to minimize the number of transitions from the initial to the final state (the agent learns to move on the shortest path). We consider that the environment has the following characteristics:
- the environment has a rectangular form;
- at any given moment, from any given state, the agent can move in only four directions: North, South, East, West.

In what follows, we state an original theorem that gives the convergence conditions of the URU algorithm.

Theorem 4.3.1. [SER03a] Let us consider the learning problem described above, satisfying the conditions:
- the initial values of the states' utilities are computed as U(s) = -d(s, sf)/2, for all s ∈ M, where d(s1, s2) represents the Manhattan distance between the states s1 and s2;
- the reward factor satisfies γ ≤ 1/3.
In this case, the URU algorithm is convergent (the states' utilities converge after the training sequence).

We denote by U_n(i) the estimated utility of the state i at the n-th training episode. The proof of Theorem 4.3.1 is based on the following lemmas:

Lemma 1. [SER03a] At the n-th training episode of the agent the following inequality holds: U_n(i) ≤ 2, for all i ∈ M.

Lemma 2. [SER03a] At the n-th training episode of the agent the following inequality holds: |U_n(i) - U_n(j)| ≤ 1 for each transition from i (i ≠ sf) to j made by the agent in the current training sequence.
Lemma 3. [SER03a] The inequality U_{n+1}(i) ≥ U_n(i) holds for every i ∈ M and n ∈ N; in other words, the states' utilities increase from one training episode to another.

Theorem 4.3.2 gives the equilibrium equation of the states' utilities after the URU algorithm has been run.

Theorem 4.3.2. [SER03a] In our learning problem, the equilibrium equation of the states' utilities is given by the following equation:

U*_URU(i) = r_i + (γ / card(h(i))) · Σ_{j successor of i} U*_URU(j), for all i ∈ M,

where U*_URU(i) represents the exact utility of the state i, obtained after the URU algorithm has been run. We denote by card(M) the number of elements of a set M.

Compared to the classical TD algorithm, we make the following remarks (which follow naturally from the theoretical results given above):
- the states' utilities grow faster in the URU algorithm than in the TD algorithm; in other words, U_URU(i) ≥ U_TD(i) for all i ∈ M;
- in the case of our learning problem, as we proved in Theorem 4.3.1, for γ = 1 (the TD algorithm) we cannot prove the convergence of the states' utilities.

Further work is planned in the following directions:
- to analyze what happens if the transitions between states are nondeterministic (the environment is a Hidden Markov Model [SER01a]);
- to analyze what happens if the reward factor γ is not a fixed parameter, but a function whose values depend on the current state of the agent;
- to develop the algorithm for solving path-finding problems with multiple agents.

4.4 A new algorithm for word sense disambiguation

In this section we introduce an original algorithm (BA) [SER01c] which combines elements of Yarowsky's algorithm [YAR99, RES98, YAR01, YAR95] and of the Naive Bayes Classifier (NBC). This algorithm preserves the advantage of Yarowsky's principles (one sense per discourse and one sense per collocation) together with the known high performance of NBC algorithms. The algorithm learns to make predictions based on local contexts, starting from a reduced set of contexts labeled with meanings and from a larger set of unlabeled contexts.

The two important characteristics of the BA algorithm are:
- the set of words used as contextual features for the disambiguation - as the number of elements of this set grows, the disambiguation's precision grows (this is experimentally shown in section 5.6);
- the relation included in W × ℘(W), where W is the set of words and ℘(W) is the set of subsets of W (particularly, a set of contexts) - using this relation also increases the disambiguation's precision.
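Since the BA algorithm builds on the Naive Bayes Classifier, the following Java sketch shows the NBC component only: counting sense and feature-word occurrences in labeled contexts, then scoring the senses of a new context. The smoothing, the data structures and all names are illustrative assumptions; the Yarowsky-style bootstrapping and the relation used by BA are not reproduced here.

```java
import java.util.*;

// Sketch of the Naive Bayes sense scoring that the BA algorithm builds on:
// score(sense | context) ~ log P(sense) + sum over context words of log P(word | sense).
// Counts, smoothing and data structures are illustrative assumptions, not the
// exact implementation used in the thesis.
final class NaiveBayesWsd {
    private final Map<String, Integer> senseCount = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCountPerSense = new HashMap<>();
    private int totalContexts = 0;
    private static final double SMOOTHING = 1.0;   // additive (Laplace) smoothing

    // Learn from one sense-labeled context (a bag of feature words).
    void train(String sense, Collection<String> contextWords) {
        totalContexts++;
        senseCount.merge(sense, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCountPerSense.computeIfAbsent(sense, k -> new HashMap<>());
        for (String w : contextWords) counts.merge(w, 1, Integer::sum);
    }

    // Return the most probable sense for an unlabeled context.
    String disambiguate(Collection<String> contextWords) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String sense : senseCount.keySet()) {
            Map<String, Integer> counts = wordCountPerSense.get(sense);
            int senseTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
            double score = Math.log((double) senseCount.get(sense) / totalContexts);
            for (String w : contextWords) {
                double p = (counts.getOrDefault(w, 0) + SMOOTHING)
                         / (senseTotal + SMOOTHING * counts.size() + 1);
                score += Math.log(p);
            }
            if (score > bestScore) { bestScore = score; best = sense; }
        }
        return best;
    }
}
```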
CHAPTER V
IMPLEMENTATIONS OF INTELLIGENT SYSTEMS

5.1 A reinforcement learning Intelligent Agent based on Adaptive Dynamic Programming

In this section we present an Intelligent Agent [SER00b] that learns by reinforcement using an Adaptive Dynamic Programming method (section 3.2). We consider a simple Markov decision process with n^2 states. The state space can be visualized using an n x n grid, each square representing a state. The reinforcement function is constant -1 (i.e., the agent receives a reinforcement of -1 on each transition). There are four actions possible in each state: North, South, East and West. The goal states are the upper left corner and the lower right corner. The program that implements the agent with the functionality described above has been realized in two versions: in Borland Pascal and in Visual Prolog.

5.2 An Intelligent Agent based on training Hidden Markov Models

In this section we want to test a system represented as a Hidden Markov Model (section 2.3). We propose an agent for character recognition [SER01a]. Let us consider the example of an agent that has to recognize two characters, "I" and "U". We propose the model described in Figure 5.1.

Figure 5.1. The Hidden Markov Model
In this section we present how the Hidden Markov Model can be trained to recognize the two characters and how the recognition of other characters works. The agent implementation has been written in Microsoft Visual C++ 6.0. For the proposed experiment, some directions for future work are:
- how the proposed model could be generalized to as many characters as possible;
- how the structure of the HMM could be dynamically generated;
- when such probabilistic methods are more suitable than other character recognition methods;
- how probabilistic methods could be combined with others (heuristic or otherwise) in order to obtain a higher performance of the model;
- what would happen in certain "plateau" situations (where a given entry would have the same probability in several environments).

This section aimed to emphasize another way of approaching problems of training intelligent agents.

5.3 A game playing Reinforcement Learning Intelligent Agent ("Agent 21")

In this section we design an Intelligent Agent [SER01b] that learns (by reinforcement) to play "Game 21", a simplified version of the game Black-Jack. The algorithm used for training the agent is the SARSA algorithm (section 3.6). We propose a game with two players: a dealer (who distributes the cards) and an agent, in fact the computer (which uses the SARSA algorithm for learning the game's rules and which plays against the dealer). The agent tries to win by obtaining a score (the sum of the cards' values) less than or equal to 21, but greater than the dealer's score.

The probabilistic nature of this game makes it an interesting problem for testing learning algorithms, even if learning a good playing strategy is not obvious. Learning from a teacher (supervised learning) is not useful in this game, as the output data for a given state are unknown. The agent has to explore different actions and to develop a strategy by selecting and maintaining the actions that maximize its performance. The purpose of this application is to study the behavior of an agent that initially knows neither the rules nor the goal of the game, and that gradually assimilates them. After learning, the agent (the computer) wins about 54% of the rounds (Figure 5.2); this is due to the fact that "Game 21" is a game of chance. The agent tries to manage, as well as it is able to, the cards received from the dealer.
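The SARSA update used for training can be sketched as follows. The Java code below is a generic illustration of SARSA with epsilon-greedy action selection, not the thesis implementation; the state and action encodings and the parameter values are assumptions.

```java
import java.util.*;

// Sketch of the SARSA update used to train a game-playing agent:
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)),
// where a' is the action actually chosen (epsilon-greedily) in s'.
// State/action encodings and parameter values are illustrative assumptions.
final class SarsaLearner {
    private final Map<String, Double> q = new HashMap<>();
    private final Random rnd = new Random();
    private final double alpha = 0.1, gamma = 0.9, epsilon = 0.1;

    private String key(String state, String action) { return state + "|" + action; }

    double q(String state, String action) {
        return q.getOrDefault(key(state, action), 0.0);
    }

    // Epsilon-greedy action selection among the (non-empty) list of available actions.
    String chooseAction(String state, List<String> actions) {
        if (rnd.nextDouble() < epsilon) return actions.get(rnd.nextInt(actions.size()));
        return actions.stream()
                      .max(Comparator.comparingDouble(a -> q(state, a)))
                      .orElseThrow();
    }

    // One SARSA step for the observed tuple (s, a, r, s', a').
    void update(String s, String a, double r, String s2, String a2) {
        double target = r + gamma * q(s2, a2);
        q.put(key(s, a), q(s, a) + alpha * (target - q(s, a)));
    }
}
```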
As a remark, even if the game has a strong probabilistic character, after the training rounds the agent plays approximately as well as, or maybe even better than, a human player with a random playing strategy.

Figure 5.2. Percentage of games won by the trained agent

5.4 "Know-How" Agent

This section refers to the implementation of the "know-how" concept (section 1.3.4) in a Multi-Agent System (a system with several non-cooperative agents) [SER02b]. Because of the strong logical formalism of the "know-how" concept, the application was written in Visual Prolog.

5.5 A real-time learning Intelligent Agent

In this section we describe an Intelligent Agent [SER02a] that learns to find the minimum number of transitions needed for moving (in a space of states) between an initial and a final configuration, using the RTL algorithm (section 4.1). The use of the A* algorithm for solving such problems is well known; however, A*, an informed search algorithm, has the major disadvantage of an exponential execution time.
Thus, a learning version is very suitable, since it reduces the algorithm's complexity. We propose an experiment in order to illustrate the advantages of the RTL algorithm. The application is written in Borland C and implements the behavior of an Intelligent Agent whose aim is to find the way out of a maze on a shortest path. For comparison, we have also used the LRTA* algorithm, and we noticed that, because of its insufficient exploration of the environment, the agent fails.

5.6 An agent for word disambiguation

In this section we describe an Intelligent Agent [SER03e] that learns to disambiguate a polysemous word (to determine the meaning of the word in a set of contexts). The application is written in JDK 1.4 and uses the disambiguation algorithm BA described in section 4.4. We made two experiments: one for the English language and one for the Romanian language. The second experiment is very significant, because for the Romanian language no resource similar to WordNet exists, and the BA algorithm only requires information that can be extracted from an untagged corpus.

Our aim is to use the BA algorithm for the Romanian language, to disambiguate the word poarta in some contexts obtained with an on-line corpus tool (we use a Romanian corpus). We make the following statements:
- the target word poarta has, in Romanian, four possible meanings (two nouns and two verbs);
- we test our algorithm starting with 38 contexts for the target word;
- we start with four words as contextual features for the disambiguation (a single feature for each sense).

As an evaluation measure of the algorithm we use the accuracy (5.1):

A = (number of contexts correctly solved) / (total number of contexts that have to be solved)        (5.1)

The accuracy of the BA algorithm in the proposed experiment is 60%. We carried out an experimental comparison between the Naive Bayes Classifier and the BA algorithm with and without the relation (section 4.4).
The comparative experimental results are shown in Figure 5.3, which gives a graphical representation of the accuracy per context for each algorithm: for any given algorithm, for the i-th context we represent the accuracy (5.1) of the algorithm over the first i contexts. From Figure 5.3 it is obvious that the most efficient is the BA algorithm with the relation (the algorithm's accuracy grows with the number of contexts).

Figure 5.3. Experimental comparison between the NBC and BA algorithms (accuracy (%) versus the number of contexts, for the BA algorithm with the relation, the NBC algorithm and the BA algorithm without the relation)

We make the following remarks:
- in the above experiment we increase the number of occurrences for each meaning of the target word, and we remark that with two occurrences for each meaning the algorithm's accuracy grows by 10%, while with three occurrences for each meaning the accuracy grows by 15%;
- we increase the number of input contexts (to 100) and we remark that the algorithm's accuracy grows by 15%.

As a conclusion, if the number of words used as contextual features for the disambiguation and the number of contexts grow, the accuracy of the BA algorithm grows, too.

5.7 A Web Agent to display a tree structure

In this section we present an Intelligent Agent [SER01e] aimed at browsing a structure of remote files. It is an agent whose environment is a software environment: its sensors are software functions and its actions are software actions.
5.8 A Logic-based Agent (ALSO)

In this section we describe the implementation of a Logic Agent based on a stack of goals [SER02c], whose architecture was presented in section 4.2. Because the architecture described above is based on logic, and because the algorithm described in section 4.2 needs backtracking in order to find all the solutions, the implementation was written in Visual Prolog. Declarative programming languages (such as Prolog) have a built-in control mechanism which allows finding all the solutions of a problem. We have to note that the stack (stacks) of goals that must be created for applying the algorithm are implicitly memorized by the control strategy of Prolog (a mechanism which allows backtracking).

5.9 An URU reinforcement learning Intelligent Agent

In this section we describe an Intelligent Agent [SER03a] that learns to find the minimum number of transitions (steps) for moving between an initial and a final configuration, using the URU algorithm (described in section 4.3). The application is written in JDK 1.4. We develop an experiment for an agent that moves in a rectangular maze, illustrating the theoretical results obtained in section 4.3.

5.10 An Agent for words clustering

In this section we describe an Intelligent Agent [SER03b] that learns to cluster [MAN99] a set of words, starting from a set of contexts and from a set of words having maximum frequency in the given contexts. We propose an experiment for the Romanian language; we ran the algorithm on 10000 contexts and 200 words, with very good results. We mention that this application is used as a part of a Question Answering (QA) system for the Romanian language [SER03f].
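Several of the agents above (sections 5.1, 5.5 and 5.9) operate on grid-shaped mazes with the four moves North, South, East and West and a reward of -1 per transition. The following Java sketch shows one way such an environment might be represented; the class name, the single goal cell and the obstacle handling are illustrative assumptions, not the thesis implementation.

```java
// Sketch of the kind of grid environment used by several of the agents above
// (sections 5.1, 5.5 and 5.9): an n x n grid, four actions (North, South, East,
// West), a reward of -1 on every transition and a designated goal cell.
// Names, the single goal and the obstacle handling are illustrative assumptions.
final class GridEnvironment {
    enum Move { NORTH, SOUTH, EAST, WEST }

    private final int n;                 // the grid is n x n
    private final boolean[][] obstacle;  // true = blocked cell
    private final int goalRow, goalCol;

    GridEnvironment(int n, boolean[][] obstacle, int goalRow, int goalCol) {
        this.n = n; this.obstacle = obstacle;
        this.goalRow = goalRow; this.goalCol = goalCol;
    }

    boolean isGoal(int row, int col) { return row == goalRow && col == goalCol; }

    // Apply a move; if it would leave the grid or enter an obstacle, stay in place.
    int[] next(int row, int col, Move m) {
        int r = row, c = col;
        switch (m) {
            case NORTH: r--; break;
            case SOUTH: r++; break;
            case EAST:  c++; break;
            case WEST:  c--; break;
        }
        if (r < 0 || r >= n || c < 0 || c >= n || obstacle[r][c]) return new int[] { row, col };
        return new int[] { r, c };
    }

    // Constant reward of -1 for every non-goal state, as in sections 4.3 and 5.1.
    double reward(int row, int col) { return isGoal(row, col) ? 0.0 : -1.0; }
}
```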
CHAPTER VI
A PROGRAMMING INTERFACE FOR REINFORCEMENT LEARNING SIMULATIONS

In this chapter we propose an original programming interface for reinforcement learning simulations in known environments [SER03c]. Using this interface, simulations are possible both for reinforcement learning based on the states' utilities and for learning based on actions' values (Q-learning). The interface is realized in JDK 1.4 and is meant to facilitate software development for reinforcement learning in known environments.

There are three basic objects: agents, environments and simulations. The agent is the learning agent and the environment is the task it interacts with. The simulation manages the interaction between agent and environment, collects data and manages the display, if any. Generally, the inputs of the agent are perceptions of the environment (in our case, states of the environment), the outputs are actions, and the environment offers rewards after the agent interacts with it. Figure 6.1 illustrates the interaction between the agent and the environment in a reinforcement learning task: the agent sends actions to the environment, and the environment returns perceptions and rewards to the agent.

Figure 6.1. Interaction between the Agent and the Environment

The reward is a number; the environment, the actions and the perceptions are instances of classes derived from the IEnvironment, IAction and IState interfaces, respectively. The implementation of actions and perceptions may be arbitrary as long as they are properly understood by the agent and the environment; the agent and the environment obviously have to be chosen to be compatible with each other. The interaction between agent and environment is handled in discrete time.

We assume we are working with simulations; in other words, there are no real-time constraints enforced by the interface: the environment waits for the agent while the agent is selecting its action, and the agent waits for the environment while the environment is computing its next state. We assume that the agent's environment is a finite Markov Decision Process.

To use the interface, the user has to define the specialized object classes HisState, HisEnvironment and HisAgent, creating instances of each. The agent and the environment are then passed to a simulation object (CSimulation), which initializes and interconnects them. Then, CSimulation::init() will initialize and run the simulation. If the agent learns the states' utilities, it has to be derived from the AgentUtility class; otherwise, if it learns the actions' values (Q-learning), it has to be derived from the AgentQValues class.

RLAgent is an abstract class, the base class for all the agents; specific agents are instances of subclasses derived from RLAgent. AgentUtility is an abstract subclass of RLAgent, defining the behavior of an agent that learns by reinforcement based on the states' utilities (the algorithm used for learning is the URU algorithm described in section 4.3 [SER03a]). AgentQValues is an abstract subclass of RLAgent, defining the behavior of an agent that learns by reinforcement based on actions' values (the algorithm used for learning is the SARSA algorithm [SUT98]). IEnvironment is an interface, the base class for all environments; specific environments are instances of classes derived from IEnvironment. CSimulation is the basic object of the interface, which manages the interaction between the agent and the environment; it defines the heart of the interface, the uniform usage that all agents and environments are meant to conform to. An instance of the simulation class is associated with an instance of an agent and an environment at creation time; this is done in the constructor of the class CSimulation.

We illustrate the use of the interface on a concrete example: the motion problem of an embodied agent whose goal is to find the way out of a maze with obstacles. The UML diagram for the interface is described below.
The classes of the interface, with their attributes and methods, are the following:

CSimulation
- a: RLAgent
- m: IEnvironment
+ CSimulation(a: RLAgent, m: IEnvironment)
+ init(alpha: Real, gamma: Real, epsilon: Real, episoade: Integer): void
+ policy(ps: PrintStream)

Interface IEnvironment
+ isValid(s: IState): Boolean
+ initial(): IState
+ final(s: IState): Boolean
+ next(s: IState, a: IAction, r: Integer): IState
+ value(s: IState): Real

MyEnvironment (example environment, implements IEnvironment)
# l: IList

Interface IList
+ add(o: Object): void
+ delete(i: Integer): void
+ dim(): Integer
+ elem(i: Integer): Object

CList (implements IList)
- list: ArrayList

Abstract RLAgent
# a: IList
+ abstract actions(): void
+ abstract choose(e: IElement, r: Real, epsilon: Real, m: IEnvironment): IElement
+ abstract next(s: IState, m: IEnvironment): QValues
+ abstract initial(s: IState): IElement
+ learning(alpha: Real, gamma: Real, epsilon: Real, episodes: Integer): void
# notVisited(s: IState): Boolean
# value(e: IElement, m: IEnvironment): Real
appear(e: IElement): Boolean
add(e: IElement, m: IEnvironment): void
init(s: IState): void
update(e: IElement, eurm: IElement, r: Integer, alpha: Real, gamma: Real): void

AgentUtility, AgentQValues (subclasses of RLAgent)
next(s: IState, r: Integer, epsilon: Real, m: IEnvironment): IElement

MyAgent (example agent, derived from AgentUtility or AgentQValues)

Abstract IElement
# o: Boolean
# s: IState
# a: IAction
+ IElement(s: IState)
+ IElement(s: IState, a: IAction)
+ state(): IState
+ action(): IAction
+ value(): Real
+ retO(): Boolean
+ setValue(v: Real): void
+ setAction(a: IAction): void
+ setO(o: Boolean): void
+ toString(): String
+ abstract equal(e: IElement): Boolean

Utility (derived from IElement)
+ Utility(s: IState)
+ Utility(s: IState, a: IAction)

QValues (derived from IElement)
+ QValues(s: IState)
+ QValues(s: IState, a: IAction)

Interface IState
+ toString(): String
+ equal(s: IState): Boolean

Interface IAction
+ toString(): String
+ equal(a: IAction): Boolean

MyState (example state, implements IState); MyAction (example action, implements IAction)
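Based on the class and method signatures listed above, a client of the interface might be wired up roughly as follows. This is only a sketch: it assumes that the interface classes (CSimulation, RLAgent, IEnvironment, AgentUtility, AgentQValues) are available on the classpath, that HisEnvironment and HisAgent are the user-defined classes the text asks for (their bodies are omitted here), and that the numeric parameter values are illustrative, not taken from the thesis.

```java
import java.io.PrintStream;

// Sketch of the intended usage of the interface (names follow the listing above):
// create the user-defined environment and agent, pass them to CSimulation,
// run the training with init() and finally print the learned policy.
public class MazeSimulationDemo {
    public static void main(String[] args) {
        // HisEnvironment implements IEnvironment; HisAgent extends AgentUtility
        // (utility-based learning) or AgentQValues (Q-learning). Their bodies
        // are the user's responsibility and are not shown here.
        IEnvironment environment = new HisEnvironment();
        RLAgent agent = new HisAgent();

        // The simulation interconnects the agent and the environment.
        CSimulation simulation = new CSimulation(agent, environment);

        // alpha (learning rate), gamma (reward factor), epsilon (exploration
        // rate) and the number of training episodes -- illustrative values.
        simulation.init(0.1, 0.3, 0.1, 1000);

        // Write the learned policy to an output stream.
        PrintStream out = System.out;
        simulation.policy(out);
    }
}
```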
CONCLUSIONS

The aim of this work is to support the idea that the field of agents, and especially of intelligent agents, is an important research and development area within Computer Science and Artificial Intelligence. On the other hand, learning problems represent an important research direction in Artificial Intelligence, machine learning being the study of system models that, based on an input data set (a training set), improve their performance through experience and by learning some specific knowledge.

Reinforcement Learning, seen as a method for solving machine learning problems based on the acquisition of new knowledge [BER91], is one of the approaches that may form the basis for improving machines' performance, being at the same time a field open to future research. Because of its potential to eliminate the manual encoding of control strategies, reinforcement learning is one of the most active research domains in machine learning; reinforcement learning in unknown environments is a current research topic in this field.

The aim of the current work is to present the main results related to reinforcement learning, as well as the design and implementation of intelligent agents, in order to offer support for understanding:
- why agents are considered an important new way of conceptualizing and implementing a different kind of applications;
- what intelligent agents are (and what they are not) and how they are related to other programming paradigms, such as object oriented programming;
- why learning, and reinforcement learning in particular, represents an extremely novel and dynamic field in Artificial Intelligence.

The work presents some new approaches in Intelligent Systems (new learning algorithms), together with possible generalizations and open problems, opening in this way new research directions in Machine Learning. The study of learning was focused in this thesis on the model of systems with a single agent, in known environments. As future research we will consider the case of unknown environments and of systems with several agents (multi-agent systems).
REFERENCES

[ALE95] Allen, J., Natural Language Understanding, Benjamin/Cummings Publ., 2nd ed., 1995
[ANT96] Antos, A., Lugosi, A., Strong minimax lower bounds for learning, Proceedings of the 9th Annual Conference on Computational Learning Theory, 1996, pp.303-309
[BAR91] Barto, A.G., Bradtke, S.J., Singh, S.P., Real-time learning and control using asynchronous dynamic programming, Technical Report 91-57, Amherst, MA: University of Massachusetts, Computer Science Department
[BAR99] Barber, K.S., Kim, J., Constructing and dynamically maintaining perspective-based agent models in a MA environment, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.416-417
[BER91] Bergadano, F., Giordana, A., Machine Learning, Ellis Horwood Limited, Chichester, England, 1991
[BET93] Betke, M., Rivest, R., Singh, M., Piecemeal Learning of an Unknown Environment, AI Memo No. 1474, CBCL Memo No. 93, 1993
[BOO88] Booker, L.B., Classifier systems that learn world models, Machine Learning, Number 3, 1988, pp.161-192
[BOO94] Booch, G., Object-Oriented Analysis and Design, Addison-Wesley: Reading, MA, 1994
[BRA82] Brady, M., Hollerbach, J.M., Robot Motion: Planning and Control, Cambridge, MA, MIT Press, 1982
[BRO86] Brooks, R.A., A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, 2(1), pp.14-23, 1986
[CAN88] Canny, J.F., The Complexity of Robot Motion Planning, Cambridge, MA, MIT Press, 1988
[CAR79] McCarthy, J., Ascribing mental qualities to machines, Philosophical Perspectives in Artificial Intelligence, Harvester Press, 1979
[CÂR94] Cârstoiu, D., Sisteme expert, Editura ALL, Bucureşti, 1994
[CAS94] Cassandra, A.R., Kaelbling, L.P., Littman, M.L., Acting optimally in partially observable stochastic domains, Proceedings of AAAI-94, Seattle, Washington, 1994
[CHA95] Chaib-Draa, B., Industrial applications of Distributed AI, Communications of the ACM, Vol. 38, No. 11, 1995
[CHA96] Charniak, E., Statistical Language Learning, The MIT Press, 1996
[CHE80] Chellas, B., Modal Logic: An Introduction, Cambridge University Press: Cambridge, England, 1980
[CHI99] Ciampolini, A., Lamma, E., Mello, P., Abductive coordination for logic agents, Proceedings of the 1999 ACM Symposium on Applied Computing, 1999, pp.134-140
[COB02] http://titania.cobuild.collins.co.uk/form.html
[DAE00] Daelemans, W., Zavrel, J., Tilburg Memory Based Learner - version 3.0, University of Antwerp, http://ilk.kub.nl, 2000
[DAY92] Dayan, P., The convergence of TD(λ) for general λ, Machine Learning, Number 8, 1992, pp.341-362
[DEN87] Dennett, D.C., The Intentional Stance, The MIT Press: Cambridge, MA, 1987
[DUM95] Dumitrescu, D., Inteligenţă Artificială, Litografie, Universitatea "Babeş-Bolyai" Cluj-Napoca, 1995
[DUM99] Dumitrescu, D., Principiile Inteligenţei Artificiale, Ed. Albastră, Cluj-Napoca, 1999
[ESC00] Escudero, G., Marquez, L., Rigau, G., Boosting applied to word sense disambiguation, ECML 2000, Barcelona, Spain, 2000
[ETZ96] Etzioni, O., Moving up the information food chain: Deploying softbots on the world-wide web, Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, OR, 1996
[FIE94] Fiechter, C.N., Efficient reinforcement learning, Proceedings of the 7th Annual ACM Conference on Computational Learning Theory, 1994, pp.88-97
[FIS93] Fisher, M., Concurrent MetateM, a language for modeling reactive systems, Proceedings of Parallel Architectures and Languages Europe (PARLE), Springer Verlag, 1993
[FIS93] Fisher, M., Wooldridge, M., Specifying and verifying distributed intelligent systems, in Progress in Artificial Intelligence, Proceedings of the Portuguese Conference on Artificial Intelligence (EPIA), Porto, Portugal, 1993, Springer Verlag, Lecture Notes in Computer Science, vol. 727
[FIS95/1] Fisher, M., Towards a semantics for Concurrent MetateM, in Executable Modal and Temporal Logics, Springer Verlag, Lecture Notes in Artificial Intelligence, vol. 897, 1995
[FIS95/2] Fisher, M., Representing and executing agent-based systems, Springer Verlag, Lecture Notes in Computer Science, vol. 890, 1995
[FLA94] Flach, P., Simply Logical: Intelligent Reasoning by Example, John Wiley & Sons, Chichester, England, 1994
[FLO94] Florea, A., Boangiu, A., Inteligenţă Artificială, Editura Universităţii "Politehnica", Bucureşti, 1994
[FRE94] Frenţiu, M., Pârv, B., Elaborarea Programelor. Metode şi tehnici moderne, Editura Pro-Media, Cluj-Napoca, 1994
[GAR95] Gârbacea, I., Andonie, R., Algoritmi fundamentali - O perspectivă C++, Ed. Libris, Cluj-Napoca, 1995
[GEN94] Genesereth, M.R., Ketchpel, S.P., Software agents, Communications of the ACM, 37(7), 1994, pp.48-53
[GRE90] Grefenstette, J.J., Ramsey, C.L., Schultz, A.C., Learning sequential decision rules using simulation models and competition, Machine Learning, Number 5, 1990, pp.355-382
[GRE93] Grefenstette, J., Genetic algorithms and machine learning, Proceedings of the 6th Annual ACM Conference on Computational Learning Theory, 1993, pp.3-4
[HAD96] Haddadi, A., Communication and Cooperation in Agent Systems, LNAI Volume 1056, Springer-Verlag: Berlin, Germany, 1996
[HAY83] Hayes-Roth, F., Waterman, A., Building Expert Systems, Addison-Wesley: Reading, MA, 1983
[HAR00] Harmon, M., Harmon, S., Reinforcement Learning - A Tutorial, Wright State University, www-anw.cs.umass.edu/~mharmon/rltutorial/frames.html, 2000
[HAR85] Harmon, P., King, D., Expert Systems, John Wiley & Sons Inc, 1985
[HAR88] Harmon, P., Maus, R., Expert Systems. Tools and Applications, John Wiley & Sons Inc, 1988
[HAS88] Hassler, D., Artificial Intelligence, Vol. 106, Number 2, 1988
[HOL75] Holland, J.H., Adaptation in Natural and Artificial Systems, Ann Arbor, MI, University of Michigan Press, 1975
[HOL86] Holland, J.H., Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems, in Machine Learning, An Artificial Intelligence Approach, Volume 2, Los Altos, CA: Morgan Kaufmann, R.S. Michalski, J.G. Carbonell, T.M. Mitchell (Eds.), 1986, pp.593-623
[ISH95] Ishida, T., Korf, R.E., A moving target search: a real-time search for changing goals, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995
[JAC86] Jackson, P., Introduction to Expert Systems, Addison-Wesley: Reading, MA, 1986
[JUN97] Junling, H., Wellman, M., Learning about other agents in a dynamic multi-agent system, Artificial Intelligence Laboratory, Michigan, October 26, 1997
[JUN98] Junling, H., Wellman, M., Online learning about other agents in a dynamic MAS, Proceedings of the 2nd International Conference on Autonomous Agents, 1998, pp.239-246
[JUR00] Jurafsky, D., Martin, J.H., Speech and Language Processing, Prentice Hall, 2000
[KEL93] Kelly, J., Artificial Intelligence - A Modern Myth, Ellis Horwood, 1993
[KIL01] Kilgarriff, A., Gold standard datasets for evaluating word sense disambiguation programs, http://citeseer.nj.nec.com
[KOR90] Korf, R.E., Real-time heuristic search, Artificial Intelligence, 1990
[KOR92] Korf, R.E., Search, Encyclopedia of Artificial Intelligence, Wiley-Interscience Publication, New York, 1992
[LEE88] Lee, K.F., Majahan, S., A pattern classification approach to evaluation function learning, Artificial Intelligence, 36, 1988, pp.1-25
[LES96] Lesperance, Y., Levesque, J., Foundations of a logical approach to agent programming, LNAI Volume 1037, Springer Verlag: Berlin, Germany, 1996, pp.331-346
[LES99] Lesser, V., Atighetchi, M., The UMASS intelligent home project, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.291-298
[LET99] Leţia, I.A., Crăciun, F., et al., Multiagent Systems, Casa Cărţii de Ştiinţă, Cluj-Napoca, 1999
[LIN91/1] Lin, L.J., Self-improving reactive agents: case studies of reinforcement learning frameworks, Proceedings of the 1st International Conference on Simulation of Adaptive Behavior: From Animals to Animats, 1991, pp.297-305
[LIN91/2] Lin, L.J., Self-improvement based on reinforcement learning, planning and teaching, Proceedings of the 8th International Workshop on Machine Learning, 1991, pp.323-327
[LIN91/3] Lin, L.J., Programming robots using reinforcement learning and teaching, Proceedings of AAAI-91, 1991, pp.781-786
[MAE90] Maes, P., Situated agents can have goals, in Designing Autonomous Agents, The MIT Press: Cambridge, MA, 1990, pp.49-70
[MAH90] Mahadevan, S., Connell, J., Automatic programming of behavior-based robots using reinforcement learning, IBM Technical Report, 1990
[MAN99] Manning, C., Schutze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999
[MAT94] Mataric, M.J., Interaction and Intelligent Behavior, PhD Thesis, Massachusetts Institute of Technology, 1994
[MIN61] Minsky, M.L., Steps toward Artificial Intelligence, in Computers and Thought, McGraw-Hill, New York, 1963, E.A. Feigenbaum & J. Feldman (Eds.), pp.406-450
[MIT98] Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, Cambridge, 1998
[MOR99] Morley, D., Semantics of Actions, Agents and Environments, Doctoral Thesis, University of Melbourne, 1999
[NAR98] Nardi, B.A., Miller, J., Wright, D., Collaborative programmable intelligent agents, Communications of the ACM, 41(3), March 1998, pp.96-104
[NEW82] Newell, A., The knowledge level, Artificial Intelligence, 18(1), 1982, pp.87-127
[NIL80] Nilsson, N.J., Principles of Artificial Intelligence, Palo Alto, CA, Morgan Kaufmann, 1980
[NIL98] Nilsson, N.J., Artificial Intelligence - A New Synthesis, Morgan Kaufmann Publishers Inc., San Francisco, California, 1998
[PER98] Perez-Uribe, A., Sanchez, E., Blackjack as a test bed for learning strategies in neural networks, Proceedings of the IEEE International Joint Conference on Neural Networks, 1998
[POO98] Poole, D., Mackworth, A., Goebel, R., Computational Intelligence - A Logical Approach, New York, Oxford University Press, 1998
[PYE99] Pyeatt, L.D., Howe, A.E., Integrated POMDP and reinforcement learning for a two-layer simulated robot architecture, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.168-174
[RAO92] Rao, A.S., Georgeff, M., An abstract architecture for rational agents, Proceedings of Knowledge Representation and Reasoning (KR&R-92), 1992, pp.439-449
[RAO95] Rao, A.S., Georgeff, M., BDI agents: from theory to practice, Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp.312-319, San Francisco, CA, 1995
[RES98] Resnik, P., Yarowsky, D., Distinguishing systems and distinguishing senses: new evaluation methods for WSD, Natural Language Engineering, 1, No. 1, 1998
[RIC91] Rich, E., Knight, K., Artificial Intelligence, McGraw-Hill, New York, 1991, 2nd ed.
[RUS95] Russell, S.J., Norvig, P., Artificial Intelligence - A Modern Approach, Prentice-Hall, Inc., New Jersey, 1995
[RUS98] Russell, S.J., Learning agents for uncertain environments, Proceedings of the 11th Annual Conference on Computational Learning Theory, 1998, pp.101-103
[SAM59] Samuel, A.L., Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, Number 3, 1959, pp.210-229
[SEB01] Sebastiani, F., A tutorial on automated text categorization, pp.1-25, ESSLLI 2001
[SEO00] Seo, Y.W., Byoung, T.Z., A reinforcement learning agent for personalized information filtering, Proceedings of the 2000 International Conference on Intelligent User Interfaces, 2000, pp.248-251
[SER99] Şerban, G., Searching graphs in logical programming, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 1999, pp.67-72
[SER00a] Tătar, D., Şerban, G., Training probabilistic context-free grammars as hidden Markov models, Studia Universitatis "Babeş-Bolyai", Informatica, XLV(1), 2000, pp.17-30
[SER00b] Şerban, G., Intelligent agents and reinforcement learning, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 2000, pp.13-20
[SER00c] Şerban, G., A method for training Intelligent Agents using Hidden Markov Models, Studia Universitatis "Babeş-Bolyai", Informatica, XLV(2), 2000, pp.41-50
[SER01a] Şerban, G., Training Hidden Markov Models - a Method for Training Intelligent Agents, Proceedings of the Second International Workshop of Central and Eastern Europe on Multi-Agent Systems, Krakow, Poland, 2001, pp.267-276
[SER01b] Şerban, G., A Reinforcement Learning Intelligent Agent, Studia Universitatis "Babeş-Bolyai", Informatica, XLVI(2), 2001, pp.9-18
[SER01c] Tătar, D., Şerban, G., A new algorithm for word sense disambiguation, Studia Universitatis "Babeş-Bolyai", Informatica, XLVI(2), 2001, pp.99-108
[SER01d] Şerban, G., Training intelligent agents as Hidden Markov Models, Proceedings of Abstracts of the 4th Joint Conference on Mathematics and Computer Science, Felix, Romania, 2001, p.91
[SER01e] Şerban, G., A Simple Web Agent for displaying a tree structure, Universitatea "Babeş-Bolyai" Cluj-Napoca, Facultatea de Matematică şi Informatică, Research Seminars, Seminar on Computer Science, 2001, pp.41-48
[SER02a] Şerban, G., A New Real Time Learning Algorithm, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(1), 2002, pp.3-14
[SER02b] Şerban, G., Formal Methods in Agent-Based Systems, Proceedings of the Symposium "Zilele Academice Clujene", Cluj-Napoca, 2002, pp.59-68
[SER02c] Şerban, G., LASG - A New Logic Architecture for Intelligent Agents, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(2), 2002, pp.13-22
[SER02d] Şerban, G., Real Time Learning in Agent Based Systems, Proceedings of the 4th International Workshop on Symbolic and Numeric Algorithms for Scientific Computing, Timişoara, Romania, 2002, pp.337-347
[SER02e] Şerban, G., Tătar, D., A Word Sense Disambiguation Experiment for Romanian Language, Studia Universitatis "Babeş-Bolyai", Informatica, XLVII(2), 2002, pp.37-42
[SER03a] Şerban, G., A New Reinforcement Learning Algorithm, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(1), 2003, pp.3-14
[SER03b] Tătar, D., Şerban, G., Words Clustering in Question Answering Systems, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(2), 2003, in press
[SER03c] Şerban, G., A New Interface for Reinforcement Learning Software, Studia Universitatis "Babeş-Bolyai", Informatica, XLVIII(2), 2003, in press
[SER03d] Şerban, G., Tătar, D., Word Sense Disambiguation for Untagged Corpus: Application to Romanian Language, Proceedings of CICLing 2003 (Intelligent Text Processing and Computational Linguistics), Mexico City, Mexico, Lecture Notes in Computer Science No. 2588, Springer-Verlag, 2003, pp.270-275
[SER03e] Şerban, G., Tătar, D., An Improved Algorithm on Word Sense Disambiguation, Proceedings of IIS 2003 (Intelligent Information Systems - New Trends in Intelligent Information Processing and Web Mining), Zakopane, Poland, 2003, in press
[SER03f] Orăşan, C., Tătar, D., Şerban, G., Oneţ, A., Avram, D., How to build a QA system in your back-garden: application to Romanian, Proceedings of EACL 2003, Budapest, Hungary, 2003, in press
[SHI98] Shimbo, M., Ishida, T., On the convergence of real-time search, Journal of the Japanese Society for Artificial Intelligence, 1998
[SHO93] Shoham, Y., Agent-oriented programming, Artificial Intelligence, 60(1), pp.51-92, 1993
[SUN96] Sung, K.K., Niyogi, P., A formulation for active learning with application to object detection, AI Memo No. 1438, CBCL Paper No. 116, 1996
[SUN99] Sun, R., Session, C., Bidding in reinforcement learning - a paradigm for MAS, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.344-345
[SUT84] Sutton, R., Temporal Credit Assignment in Reinforcement Learning, Doctoral Dissertation, Dept. of Computer and Information Science, Univ. of Massachusetts, Amherst, 1984
[SUT88] Sutton, R., Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, pp.9-44
[SUT92] Sutton, R., A Special Issue of Machine Learning on Reinforcement Learning, Kluwer Academic Publishers, Boston, London, 1992
[SUT98] Sutton, R.S., Barto, A.G., Reinforcement Learning, The MIT Press, Cambridge, Massachusetts, London, England, 1998
[TAT01] Tătar, D., Inteligenţa artificială: demonstrare automată de teoreme, prelucrarea limbajului natural, Editura Albastră, 2000
[TOR90] Torras, C., Illari, J., 2D path planning: a configuration space heuristic approach, The International Journal of Robotics Research, Number 2, 1990, pp.75-91
[WAL65] Waltz, M.D., Fu, K.S., A heuristic approach to reinforcement learning control systems, IEEE Transactions on Automatic Control, AC-10, 1965, pp.390-398
[WAN99] Wang, F., McKenzie, E., A multi-agent based evolutionary artificial neural network for general navigation in unknown environments, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.154-159
[WAT97] Watson, M., Intelligent Java Applications for the Internet and Intranets, 1997; Romanian translation, Ed. ALL, 1999
[WAT89] Watkins, C.J., Learning from Delayed Rewards, Ph.D. Thesis, King's College, Cambridge, 1989
[WEI99] Weiss, G., Multiagent Systems - A Modern Approach to Distributed Artificial Intelligence, The MIT Press, Cambridge, Massachusetts, London, 1999
[WER87] Werbos, P.J., Building and understanding adaptive systems: a statistical/numerical approach to factory automation and brain research, IEEE Transactions on Systems, Man and Cybernetics, Jan-Feb 1987
[WHI89] Whitehead, S.D., Ballard, D.H., A role for anticipation in reactive systems that learn, Proceedings of the 6th International Workshop on Machine Learning, 1989, pp.354-357
[WHI90] Whitehead, S.D., Ballard, D.H., Active perception and reinforcement learning, Proceedings of the 7th International Conference on Machine Learning, Austin, Texas, 1990
[WHI91] Whitehead, S.D., Ballard, D.H., Learning to perceive and act by trial and error, Machine Learning, Number 7, 1991, pp.45-84
[WIN84] Winston, P.H., Artificial Intelligence, Addison Wesley, Reading, MA, 1984, 2nd ed.
[WOL99] Wolpert, D., Wheeler, K., Turner, K., General principles of learning-based MAS, Proceedings of the 3rd Annual Conference on Autonomous Agents, 1999, pp.77-83
[WOO95] Wooldridge, M., Jennings, N.R., Intelligent Agents: Theory and Practice, The Knowledge Engineering Review, 10(2), pp.115-152, 1995
[WOO97] Wooldridge, M., Agent-Based Software Engineering, Mitsubishi Electric Digital Library Group, London, 1997
[WOR01] http://www.cogsci.princeton.edu/~wn/
[YAR95] Yarowsky, D., Unsupervised word sense disambiguation rivaling supervised methods, Proceedings of ACL'95, pp.189-196
[YAR99] Yarowsky, D., Hierarchical decision lists for word sense disambiguation, Kluwer Academic Publishers, 1999
[YAR01] Yarowsky, D., Word-sense disambiguation using statistical models of Roget's categories trained on large corpora, http://citeseer.nj.nec.com
[YIS99] Mansour, Y., Reinforcement learning and mistake bounded algorithms, Proceedings of the 12th Annual Conference on Computational Learning Theory, 1999, pp.183-192