Decision-Tree Learning for Negotiation Rules



                                              Zhongli Ding



                                        A paper submitted to the
                         Computer Science and Electrical Engineering Department
                     in partial fulfillment of the requirements for the M.S. degree at
                               University of Maryland Baltimore County

                                              January, 2001




                                    CMSC 698 Advisory Committee
                   Dr. Yun Peng (Advisor), Associate Professor in Computer Science
               Dr. Charles K Nicholas (Reader), Associate Professor in Computer Science



Certified by
                                                        Yun Peng, CMSC698 Advisor




Abstract

The emergence of e-commerce increases the importance of research on various multi-agent systems
(MAS). The term MAS is used loosely to refer to any distributed system whose components (or agents)
are designed, implemented, and operate independently of each other. Multi-agent systems are
suitable for domains that involve interactions between different people or organizations with
different (possibly conflicting) goals and proprietary information. A potential application area of
MAS is the "Supply Chain Management System", which integrates a company's activities across the
entire e-supply chain, from the acquisition of raw materials and purchased components through
fabrication, assembly, test, and distribution of finished goods. The roles of the individual entities
in the supply chain can be implemented as distinct functional software agents that cooperate with
each other in order to implement the system's functionality in an e-business environment. The major
interactions in supply chains are carried out through strategic negotiation between enterprises and
consumers. Correspondingly, automated negotiation between two or more agents (say, buyers and
sellers) in a multi-agent SCMS is very important. Much better benefits and profits can be obtained
if these autonomous negotiation agents are capable of learning and reasoning based on experience
and of improving their negotiation behavior incrementally, just as human negotiators do.
Learning can be used either to extract an entire rule set for an agent, or to improve a pre-existing
set of rules. In this project, based on a negotiation-based MAS framework for supply chain
management and a set of defined negotiation performatives, we test the feasibility of adopting a
decision-tree learning (or rule-based learning) method in the negotiation process. Experimental
results on the effect of using the rule-based learning method in a pair-wise negotiation process
between one buyer and one seller are presented. They show that, with a carefully designed data
scheme and sufficiently many training samples, the decision tree learning method can be used to
effectively learn decision rules for some e-commerce activities such as negotiation in supply
chains.


Keywords          E-Commerce, Multi-Agent System, Supply Chain Management System,
Negotiation, Negotiation Performatives, Decision-Tree Learning, Rule-Based Learning




1. Introduction
The development of computer software and hardware has led to the appearance of non-human
software agents. A software agent is considered an entity with goals, capable of actions,
endowed with domain knowledge, and situated in an environment [29]. The term multi-agent
system (MAS) is used loosely to refer to any distributed system whose components (or agents) are
designed, implemented, and operate independently of each other. Multi-agent systems are
suitable for domains that involve interactions between different people or organizations with
different (possibly conflicting) goals and proprietary information [29]. Compared with monolithic
single systems and traditional distributed systems, the design and implementation of a MAS is of
considerable complexity with respect to both its structure and its functionality, due to insufficient
knowledge of the system environment, the required coordination of the activities of multiple agents,
and the dynamic nature of the MAS.

A supply chain is the process of moving goods from the customer order through the raw materials
stage, supply, production, and distribution of products to the customer. More formally, a supply
chain is a network of suppliers, factories, warehouses, distribution centers and retailers, through
which raw materials are acquired, transformed, produced and delivered to the customer [30]. A
supply chain management system (SCMS) manages the cooperation of these system components.
In the computational world, roles of individual entities in a supply chain can be implemented as
distinct agents. Correspondingly, a SCMS transforms to a MAS, in which functional agents
cooperate with each other in order to implement system functionality [31].

In supply chains, enterprises and consumers interact with each other strategically. A great portion
of these interactions is carried out through negotiation. Thus, automated negotiation between two
or more agents (say, buyers and sellers) in a multi-agent SCMS is very important. Much better
benefits and profits can be obtained if these autonomous negotiation agents are capable of learning
and reasoning based on experience and of improving their negotiation behavior incrementally, just
as human negotiators do. Moreover, problems stemming from the complexity of MAS can be
avoided, or at least reduced, by endowing the agents with the ability to adapt and to learn, that is,
with the ability to improve the future performance of the total system, of a part of it, or of a single
agent [3,17,23]. Learning can be used either to extract an entire rule set for an agent, or to improve
a pre-existing set of rules.

The two most important issues concerning learning in negotiation are: how to model the overall
negotiation process, i.e., how to design the modeling framework of our negotiation-based
multi-agent system for supply chain management; and which learning algorithm or method to
choose for the agents' decision-making.

In [31], researchers propose a negotiation-based MAS framework for supply chain management
and describe a number of negotiation performatives, which can be used to construct pair-wise and
third-party negotiation protocols for functional agent cooperation. They also explain how to formally
model the negotiation process using Colored Petri Nets (CPN) and provide an example of
establishing a virtual chain by solving a distributed constraint satisfaction problem.

Based on this framework, one main job is to test the feasibility of adopting the decision-tree
learning (or rule-based learning) method in the negotiation process, and my project experiment is
part of this task. There exist many machine learning methods that might be useful, but as pointed
out later, decision-tree learning is the most suitable one. In this project, I experiment with the effect
of using the rule-based learning method in a pair-wise negotiation process between one buyer and
one seller. This is the first step of our work; future work might extend this into a more complex
negotiation-based multi-agent system with many functional agents joining, staying, bargaining,
or leaving the system.

In Section 2, we give a brief summary of past research work and learning techniques. Then in
Section 3, we give a simple explanation of the designed MAS framework of the system and the set
of negotiation performatives used. In Section 4 we present our experimental results so far and give
some simple analysis. Finally, in Section 5, we draw conclusions and give our future research goals.
In the Appendix, we also provide some sample experiment results.

2. Learning Overview and Decision-Tree Learning
In this section, we briefly survey existing research work on learning and adaptation in multi-agent
systems, especially as applied in e-commerce activities, and give a simple introduction to the
decision-tree learning method along with the reasons we chose it, before giving the design
framework of our experimental Negotiation Rules Learning (NRL) system and the experiment
results in later sections.

2.1 Categories and Objectives of Learning

There are a number of different ways in which learning can be used within a MAS for different
objectives.

•   An agent, standing alone, can learn its owner's intentions and decision-making strategies. In this
    case, the human user often serves as the trainer, whose decisions in response to environmental
    inputs are used as the training samples. As training samples accumulate, the agent can
    incrementally learn the decision logic of the human, which might be difficult to encode
    explicitly as decision rules by hand.
•   An agent, standing alone, can learn to improve its responses to the environment inputs
    (including those from other agents) as long as some objective functions (e.g., various utility
    functions) are well defined. In this case, training samples are its previous interactions with the
    environment, including the corresponding objective function values.
•   An agent can learn about other agents in order to compete or cooperate with them. This type of
    learning is deeper than the previous two in that an agent learns something that other agents used
    to make their decisions and uses such knowledge to better fine-tune its own strategy. Learning
    in this category can be as simple as learning a few parameters that other agents used to conduct
    their operation (e.g., the reservation prices and markup percentages of suppliers in a supply
    chain [24]), or as complicated as learning models of other agents' decision strategies
    [6,9,10,19,23].
•   A set of agents can learn to simultaneously adjust their respective decision processes. This type
    of learning occurs mostly in those MAS whose agents are tightly cooperating with each other to
    achieve a common goal (e.g., winning a robot soccer game), and the learning inputs often
    reflect the system’s performance (e.g., scores in a soccer game) rather than performance of
    individual agents (players) [14,18].

In some applications, MAS learning can be done in the so-called "batch mode", i.e., the system is
trained over a set of pre-assembled training samples while the system is not in use (either before
the system's deployment or when the system is taken offline). In most cases, however, it is
preferred that learning be conducted in "incremental mode", i.e., the system incrementally
adjusts/modifies itself by learning from a continuous stream of inputs from the environment
while it is in actual use [25]. This is because 1) training samples, which record the interaction
history of an agent, can be collected more efficiently and truthfully when the system is in actual
use; and 2) incremental learning allows the system or its agents to adapt to changes in the
environment in a timely fashion.

2.2 Example Techniques in MAS Learning

Interest in research on MAS learning has increased steadily in the past several years; there are
many MAS learning systems with vastly different architectures, different application areas, and
different machine learning techniques. What follows are brief descriptions of some examples of
these systems and the learning techniques they use.

•   Reinforcement learning: Reinforcement Learning (RL) is the process by which an agent
    improves its behavior in an environment via experience. RL is based on the idea that the
    tendency to perform an action by an agent should be strengthened (reinforced by a reward) if
    the action produces favorable results, and weakened (punished) if the action produces
    unfavorable results. One of the most important advantages of RL, in comparison with other
    learning methods, is that it requires very little prior knowledge of the environment, as it does
    not require having a target or desirable output for each input when forming a training sample.
    RL algorithms such as Q-learning [7,11] can incrementally adjust the system toward more
    favorable outcomes as long as it is provided a feedback judgment (good or bad) on the
    system's output for a given input. For this reason, RL has become one of the most widely
    used learning methods in MAS [15,16,11]. The most noted application of RL is perhaps in the
    domain of robot soccer games [14,18], where the game's outcome (win or lose) is fed back to
    train the soccer team. RL has also been applied to other problems, including setting prices
    in competitive marketplaces [20], learning agent coordination mechanisms [11], learning to
    schedule multiple goals [1], and dealing with malicious agents in a market-based MAS [21].

•   Optimization-based learning techniques: Optimization-based learning methods such as genetic
    algorithms [8], neural networks [12,14,20], and linear programming [22] have been used in
    some experimental MAS to train individual agents to optimize their performance as long as
    their performance can be specified as some forms of objective functions (e.g., their utility
    functions). One example of such systems is DragonChain, which uses a genetic algorithm (GA)
    approach to improve its performance in playing the MIT "Beer Game", a game of electronic
    supply chains for beer [8]. Mimicking the law of biological evolution of the survival of the
    fittest, the GA learning in DragonChain was able to help the system obtain good beer order
    policies for both retailers and wholesalers by searching through the huge space of possible order
    policies. Their experiment showed that this system outperformed those governed by classic
    business rules by eliminating the Bullwhip phenomenon, and, more interestingly, it can
    dynamically change its policies to adapt to changing order patterns from the customers.

•   Probabilistic learning techniques: Probabilistic learning techniques are of particular interest to
    MAS learning because of their ability to handle the high degree of uncertainty of the learning
    environment caused by agent interaction. Uncertainty is even more prevalent when an agent
    tries to learn models of other agents. In probabilistic learning, an agent does not attempt to learn
    a deterministic model of another agent, but a probability distribution of a set of possible models
    of that agent. Examples of formalisms that support probabilistic learning include the Bayesian
    Belief Networks (BBN), which represent probabilistic dependencies among variables of interest
    in a graphic representation, and Influence Diagrams (ID), which further extend BBN to include
    decision nodes and utility nodes. A decision making model for supply chain management,
    called Bazaar, was developed based on BBN [24]. In this system, an agent (say a buyer) uses
    Bayesian learning to incrementally learn the distributions of reservation prices and the markup
    rates of its negotiation partners (sellers). Work by Suryadi and Gmytrasiewicz [19] uses ID
    learning for an agent to construct models of other agents. An agent in their system maintains a
    number of possible models for each of the other agents it is interacting with, and a probability
    distribution over these models. When none of these existing models has sufficiently high
    probability, one of them is modified (the parameters and even the structure of the underlying
    network are changed) to better reflect the observed behavior. An unspoken assumption for the
    above probabilistic learning systems is that the learning agent must have some prior knowledge
    of the behaviors of other agents it is trying to learn. At least, it has to assume the knowledge of
    the set of parameters with which the behavior of the other agent can be expressed because these
    parameters are the necessary building blocks for the possible probabilistic models.
    Unfortunately, this assumption may not hold in many MAS applications.

•   Supervised learning: Supervised learning covers a class of learning methods that requires a
    teacher to tell the learning system what is the target/correct/desired output for each training
    input. The target output is then compared with the current system output, and the discrepancy is
    used to drive the update of the system. Supervised learning includes backpropagation training
    in neural networks, K-nearest neighbor, minimum entropy, and some forms of decision tree
    learning. Supervised learning is particularly suitable for learning user models for personal
    agents and human interface agents [4,9,12,13]. This type of agent works on behalf of human
    users and tries to best satisfy the users' needs. Instead of providing detailed rules to guide the
    agent (which may not be feasible for complex tasks), the human user can easily serve as the
    teacher, providing the desirable response to each input as a training sample for the agent.
    Payne et al. used the k-nearest neighbor method to train a user interface agent [13], while
    Pannu and Sycara used the backpropagation method to train a personal agent for text filtering
    and notification [12].

•   Rule-based learning: Learning rules for rule-based reasoning systems has also been reported in
    the literature [5,13]. Decision tree learning is perhaps the most mature technique for this type of
    learning. The advantage of rule-based learning lies in the fact that rules are easy for humans to
    understand. This allows domain experts to inspect and evaluate rules generated by a learning
    module, and to decide whether to accept each of these rules. Moreover, since rules are
    probably the easiest way to represent and encode experts' knowledge, many learning systems
    can start with a set of pre-defined rules and then let the rule-based learning module modify
    the rule set with additional observations. Learning thus greatly facilitates the growth,
    modification, and consistency maintenance of the knowledge base. These are precisely the
    reasons that we have chosen rule-based learning for our EECOMS Negotiation Rules Learning
    (NRL) task.

2.3 Decision-Tree Learning

Simply stated, a decision tree is a representation of a decision procedure for determining the class
label to associate with a given instance (represented by a set of attribute-value pairs). All non-leaf nodes
in the tree are decision nodes. A decision node is associated with a test (question on the value of a
particular attribute), and a branch corresponding to each of the possible outcomes of the test. At
each leaf node, there is a class label (answer) to an instance. Traversing a path from the root to a
leaf is much like playing a game of twenty questions, following the decisions made on each
decision node on the path. Decision trees can be induced from examples (training samples) that are
already labeled.

One of the concerns of DT learning is how to construct trees that are as small as possible (measured
by the number of distinct paths a tree has from its root) and at the same time consistent with the
training samples. In the worst case, the induced tree can degenerate so that each sample has its
own unique path (the tree size would then be exponential in the number of attributes involved).
An information-theoretic approach has been taken by several DT learning algorithms to address
this problem and also, to a lesser extent, the problem of generalization [15].

The basic idea of a DT learning algorithm is:

    For each decision point,
    If all remaining examples are all positive or all negative, we're done.
    Else if there are some positive and some negative examples left
      and attributes left, pick the remaining attribute which is
      the "most important", the one which tends to divide the
      remaining examples into homogeneous sets
    Else if there are no examples left, no such example has been
      observed; return default
    Else if there are no attributes left, examples with the same
      description have different classifications: noise or insufficient
      attributes or nondeterministic domain
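
To make the above concrete, the following is a minimal Python sketch of this idea (our own
illustration, not the code used in this project): candidate attributes are scored by information gain,
and the recursion mirrors the four cases listed above.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(samples, labels, attr):
        """Reduction in entropy obtained by splitting on attribute attr."""
        base = entropy(labels)
        remainder = 0.0
        for v in set(s[attr] for s in samples):
            subset = [l for s, l in zip(samples, labels) if s[attr] == v]
            remainder += (len(subset) / len(labels)) * entropy(subset)
        return base - remainder

    def build_tree(samples, labels, attributes, default=None):
        """Recursive ID3-style induction; returns a nested dict or a class label."""
        if not samples:                     # no examples left: return default
            return default
        if len(set(labels)) == 1:           # all positive or all negative: done
            return labels[0]
        majority = Counter(labels).most_common(1)[0][0]
        if not attributes:                  # noise or insufficient attributes
            return majority
        best = max(attributes, key=lambda a: information_gain(samples, labels, a))
        tree = {best: {}}
        for v in set(s[best] for s in samples):
            idx = [i for i, s in enumerate(samples) if s[best] == v]
            tree[best][v] = build_tree([samples[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best],
                                       default=majority)
        return tree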

Figure 3 below gives a simple example of DT learning. A tree of good size has been induced. It has
6 distinct paths, but in the worst case it could have had 12 distinct paths, one for each unique value
assignment pattern of (color, size, shape). This is because some general rules were induced (if color
= red, then immediately conclude the class = +; shape will be considered only if color = blue). The
figure also shows the set of if-then rules that can be generated from the induced tree. Essentially, each
distinct path represents a rule: the value assignments on the branches along the path constitute the
conditional part of the rule, and the value assignment of the leaf node at the end of the path
constitutes the conclusion part of the rule.
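
Each distinct root-to-leaf path can be converted into an if-then rule mechanically. The following
sketch (again only illustrative; the attribute and value names follow the flavor of the figure's
color/shape example and are hypothetical) enumerates the paths of a tree represented in the
nested-dictionary form used in the sketch above:

    def tree_to_rules(tree, conditions=None):
        """Enumerate (conditions, conclusion) pairs, one per root-to-leaf path."""
        conditions = conditions or []
        if not isinstance(tree, dict):      # reached a leaf: emit one rule
            return [(list(conditions), tree)]
        (attr, branches), = tree.items()
        rules = []
        for value, subtree in branches.items():
            rules.extend(tree_to_rules(subtree, conditions + [(attr, value)]))
        return rules

    # Hypothetical tree in the figure's spirit: color is tested first,
    # shape only when color is blue.
    example_tree = {"color": {"red": "+",
                              "blue": {"shape": {"round": "+", "square": "-"}}}}
    for conds, label in tree_to_rules(example_tree):
        print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) +
              f" THEN class = {label}")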




2.4 The Choice of the DT Learning Method

An assumption we made when selecting a suitable learning method is that the decisions included in
the training data (extracted from messages exchanged during negotiation sessions) are good decisions.
Therefore, the goal of learning is not to further optimize the decision process that was used to
generate the data, but to learn the rules/strategies that lead to these decisions. In other words, the
learned rules should make decisions that are the same as (or similar to) those in the training set
when the same or similar decision parameters are given. This leads us to the choice of supervised
learning, instead of unsupervised or reinforcement learning. The training samples serve as
instructions from a teacher or supervisor, since each sample gives the desired or "correct" value
assignment to the target attribute with respect to the pattern of value assignments to the decision
parameters in the sample.

Among all supervised learning methods, we have chosen to experiment with Decision Tree
Learning (DT learning) [14,15,26] for the following reasons:

•   DT learning is a mature technology. It has been studied for 20+ years, has been applied to
    various real-world problems, and the learning algorithm has been improved by several
    significant modifications.
•   The basic algorithm and its underlying principles are easy to understand.
•   It is easy to apply DT learning to specific problem domains, including our NRL task.
•   Several good, easy to use DT learning packages are commercially available (free or with
    reasonable cost) [26,27, 28].
•   It is easy to convert the induced decision tree to a set of rules, which are much easier for human
    experts to evaluate and manipulate, and to incorporate into an existing rule-based system, than
    other representations.

3. Negotiation Rules Learning System Framework
In a negotiation MAS, learning can be applied to various aspects of agent negotiation: training the
agent to make decisions in a way similar to those an experienced human manager would make;
learning models of an agent's negotiation partner at different levels of detail; and learning
negotiation strategies that outsmart its negotiation partners. We have experimented with the
decision tree learning method in our EECOMS negotiation rules learning task. In this section we
give a brief description of the negotiation rule learning system design and the set of negotiation
performatives used.

3.1 Objectives

Several theoretical issues arise for such a MAS: the high time complexity of the learning process,
the lack of sufficient learning data and prior knowledge, the inherent uncertainty of the learning
results, and the stability and convergence of a learning MAS. Thus, the overall objective is to study
the feasibility of employing machine learning techniques to automatically induce negotiation
decision rules for supply chain players (buyers and sellers) from transaction data. Specifically, we
aim at investigating the following:

•   Constructing a rule base by learning the decision strategy of a human expert: Initially the
    human makes all negotiation decisions. The prospective rules induced from the negotiation
    transactions are shown to the human for approval/rejection. The approved rules are then
    incorporated into the existing rule base. Including humans in the loop allows us to have quality
    training samples, as they are generated by an experienced human negotiator rather than by an
    agent with a set of not-yet-well-developed rules. It also makes the rule base more trustworthy to
    humans, since every induced rule is inspected by a human before it is included in the rule base.

•   Learning the model of the negotiation partners’ behaviors: By properly incorporating the
    learned partner’s decision rules, an agent can make more informed decisions, and in turn
    improve its performance (reducing the negotiation time/steps and increasing the payoff).

3.2 Outline of NRL System Design

Negotiation partners (buyers and sellers), represented by computer programs in a virtual supply
chain, constitute a multi-agent system in the broad sense of the term, as discussed in Section 1.
Figure 1 shows a diagram of such a system. For each side of the negotiation there is a decision
module, and the rules in these modules can become more and more complete because each agent
has a learning ability implemented by its learning module. Initially, a human negotiator (or a set of
pre-defined rules) guides the negotiation process.




3.3 Negotiation Performatives

All of the functional agents in a MAS should have some understanding of the system ontology and
use a certain Agent Communication Language (ACL) to converse, transfer information, share
knowledge, and negotiate with each other. An ACL offers a minimal set of performatives to describe
agent actions and allows users to extend them, provided the newly defined ones conform to the rules
of the ACL syntax and semantics. The Knowledge Query and Manipulation Language (KQML) and
the ACL defined by the Foundation for Intelligent Physical Agents (FIPA ACL) are the most widely
used and studied ACLs. In KQML there are no predefined performatives for agent negotiation
actions. FIPA ACL has some performatives, such as propose and cfp, for general agent negotiation
processes, but they are not sufficient for our purposes; for example, there are no performatives to
handle third-party negotiation. The NRL system design therefore uses a negotiation performative
set designed for MAS dealing with supply chain management [31].

In the following table, we give each negotiation performative's name, its meaning, and the possible
performatives a functional agent can use in reply when that performative comes in:

Name            Meaning                                  Possible Responses
CFP             Call for proposal                        Proposal | Terminate
CFMP            Call for modified proposal               Proposal | Terminate
Reject          Reject a proposal                        Proposal | Terminate
Terminate       Terminate the negotiation                NONE
Accept          Accept a proposal                        NONE
Proposal        Submit a proposal                        Accept | Reject | Terminate | CFMP

Initially, one agent starts the negotiation by sending a CFP message to the other agent. After several
rounds of conversation in which proposals and counter-proposals are exchanged, the negotiation
between the two agents ends when one side accepts (or rejects) the other side's proposal or terminates
the negotiation process without any further explanation [31].
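
For illustration, the reply discipline in the table above can be encoded as a simple lookup table.
The following Python sketch (the names and structure are ours, not part of the NRL implementation)
checks whether a reply performative is legal for a given incoming performative:

    # Allowed replies for each incoming performative, taken from the table above.
    ALLOWED_RESPONSES = {
        "CFP":       {"Proposal", "Terminate"},
        "CFMP":      {"Proposal", "Terminate"},
        "Reject":    {"Proposal", "Terminate"},
        "Terminate": set(),       # negotiation is over, no reply expected
        "Accept":    set(),       # deal is closed, no reply expected
        "Proposal":  {"Accept", "Reject", "Terminate", "CFMP"},
    }

    def is_legal_reply(incoming: str, reply: str) -> bool:
        """Return True if reply is a permitted response to incoming."""
        return reply in ALLOWED_RESPONSES.get(incoming, set())

    # Example: a Proposal may be answered with CFMP, but not with another CFP.
    assert is_legal_reply("Proposal", "CFMP")
    assert not is_legal_reply("Proposal", "CFP")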

4. Experiments and Results
A preliminary experimental learning system for NRL was constructed earlier to evaluate the
feasibility of learning decision rules for a buyer agent. A set of 27 training samples was manually
generated following the data schema. These samples were fed to C5.0, a decision-tree learning
package (a descendant of Ross Quinlan's classic ID3 decision tree learning algorithm [14,15])
obtained from RuleQuest Research, http://www.rulequest.com/.
The learning completed successfully: a decision tree was constructed, from which a set of eight
decision rules was generated by C5.0. These rules suggest to the buyer agent what action it
should take next (i.e., what type of message should be sent out next), based on factors such as
how well terms such as price, quantity and delivery date in its current "Call-For-Proposal"
match those in the returned "Proposal" from the seller it is negotiating with, the reputation of the
seller, and how far the current negotiation has progressed.

The initial results from the experiment were encouraging. They showed that decision tree learning
can be an effective tool for learning negotiation rules from data that reflects past negotiation
experience. The few rules learned are in general reasonable and consistent with the intuitions we
had and used to generate the training data. However, the experiment was restricted by the quality
of the training data (only 27 hand-made samples were used), and the results were far from
convincing (only three rules were learned).

Encouraged by the results from the preliminary study, we went forward to conduct a more
extensive NRL experiment with decision tree learning. The main extensions include the following:
•   A program to automatically generate training samples was developed. This data generator is
    based on a set of decision rules that take into consideration all the important decision factors
    for making proposals (and counter-proposals). Unlike the small data set generated manually and
    somewhat arbitrarily in the preliminary study, this data generator allows us to experiment with
    large quantities of learning samples (hundreds or even thousands of them) that are consistent and
    more realistic.
•   A better data schema was selected after a series of experiments to yield good results.
•   The buyer agent not only learns decision rules for itself but also learns the rules that the seller
    appears to use, and thus constructs a (simple) model of the seller.

Next we summarize the experiment system and present the results.

4.1 The Training Data Generator

More realistic training data may induce better, more meaningful rules; such data can also be used to
test and validate the learning results. Since no realistic data set of sufficient size was available to us,
nor could we obtain help from human experts in supply chain negotiation, we were not able to
resolve this problem to our complete satisfaction. As an alternative, we have developed an automatic
training sample generator to generate as many samples as we need. This sample generator
essentially simulates the negotiation process on both the buyer side and the seller side, based
on two sets of decision rules encoded in the program, to generate a stream of message exchanges
between the buyer and the seller. The actual training samples are then extracted from these
messages. By changing the relevant rules, the sample generator can be used to generate samples
reflecting different negotiation scenarios.

•  Message format: (msg_type, sender, receiver, item, price, quantity, delivery_date)
   For example, a negotiation session may start with the following message from the buyer to the
   seller, (CFP Buyer Seller X 9.25 50 7), meaning that Buyer wishes to entertain a proposal from
   Seller for 50 pieces of good X at a price of $9.25 a piece, to be delivered 7 days from this day.
   Seller may respond with a message (Proposal Seller Buyer X 11.25 50 7), meaning that it can sell
   these goods by the given delivery date at a unit price of $11.25. To simplify the experiment, we
   let the buyer and the seller negotiate over only one type of good, named X.

•  System parameters: A set of random numbers is used to determine the attribute values for the
   CFP message of each negotiation session and the values of the initial Proposal message sent in
   response to the CFP message. These numbers, to an extent, simulate the dynamic nature of the
   internal and external environment in which the negotiation is taking place. These parameters
   include the following:
   Buyer's need for good X:
       − quantity: a random number between 1 and 200 (with probability 0.1 in [1,50], 0.8
           in [51,150], and 0.1 in [151,200]); it can be mapped to three regions:
           [1,50] {small}, [51,150] {ordinary}, [151,200] {large}.
       − delivery_date: a random number between 1 and 20; it can also be mapped to three
           regions: 1-5 days {short}, 6-15 days {regular}, 16-20 days {long}.
       − asking price: a random number between 7 and 11; the fair market unit price is $10, and
           the range of all possible prices is partitioned into six regions: min (7 ≤ price < 8), low
           (8 ≤ price < 9), normalminus (9 ≤ price < 10), normalplus (10 ≤ price ≤ 11), high
           (11 < price ≤ 12), and max (12 < price ≤ 13).
       − importance of the order: a random binary value
       − Seller's reputation: a random binary value
   Seller's capacity to supply good X:
       − daily production capacity of good X: a random number between 8 and 12
       − current inventory: a random number between 20 and 50
       − importance of the Buyer as a customer: a random binary value

With these random numbers, each negotiation session starts with a CFP message with a different
quantity, delivery_date, and asking price. In response, the seller first determines whether the
requested delivery_date can be met for the given quantity, based on the current inventory and the
daily production capacity from this day to the delivery_date. To simplify the data generation, we
assume that the seller submits its initial proposal (usually with a price higher than the asking price
in the CFP message) only if the requested quantity and date can be met. The negotiation then
continues with the buyer gradually increasing the asking price and the seller decreasing the bidding
price until the session ends. The details of the negotiation are governed by decision rules on either
side.
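
As a hedged sketch of this part of the generator (the distributions follow the parameter list above,
while the function names and the exact feasibility test are our own illustration, not the actual
generator code):

    import random

    def random_cfp():
        """Draw the Buyer's CFP terms from the distributions listed above."""
        r = random.random()
        if r < 0.1:
            quantity = random.randint(1, 50)      # small order
        elif r < 0.9:
            quantity = random.randint(51, 150)    # ordinary order
        else:
            quantity = random.randint(151, 200)   # large order
        delivery_date = random.randint(1, 20)     # days from today
        asking_price = round(random.uniform(7, 11), 2)  # fair market price is $10
        return ("CFP", "Buyer", "Seller", "X", asking_price, quantity, delivery_date)

    def seller_can_meet(quantity, delivery_date, inventory, daily_capacity):
        """Feasibility check: current inventory plus production up to the delivery date."""
        return inventory + daily_capacity * delivery_date >= quantity

    # Example: a seller holding 30 units with a capacity of 10 units per day.
    msg = random_cfp()
    print(msg, seller_can_meet(msg[5], msg[6], inventory=30, daily_capacity=10))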

•  Negotiation rules for Seller agent: The following rules are used in the data generator to form
   response messages from the Seller agent.

    SR-1: Terminate the negotiation IF Seller cannot meet the quantity-date requested in the incoming CFP message
    SR-2: Terminate the negotiation IF asking price = min (7 ≤ price < 8)
    SR-3: Terminate the negotiation IF asking price = low (8 ≤ price < 9) & Buyer is NOT an important customer
    SR-4: Otherwise, submit a Proposal with the requested quantity and date, and the price is determined by:
       SR-41: IF the asking price = normalplus or high or max (10 ≤ price ≤ 13) THEN bidding price = asking price
       SR-42: Otherwise,
         IF incoming msg-type = CFP THEN
             IF asking price = low (8 ≤ price < 9) & Buyer is an important customer
                THEN propose a higher price (bidding price > asking price)
             ELSE IF asking price = normalminus (9 ≤ price < 10) & Buyer is NOT an important customer
                      THEN propose a higher price (bidding price > asking price)
             ELSE IF asking price = normalminus (9 ≤ price < 10) & Buyer is an important customer
                      THEN bidding price = asking price
         ELSE propose a lower price (bidding price ≥ asking price)

•  Negotiation rules for Buyer agent: The following rules are used in the data generator to form
   messages from the Buyer agent.

    BR-1: Terminate the negotiation IF bidding price = max (12 < price ≤ 13)
    BR-2: Terminate the negotiation IF bidding price = high (11 < price ≤ 12) & the current depth of negotiation ≥ 7
    BR-3: Reject the incoming proposal IF bidding price = high (11 < price ≤ 12) & either Seller's reputation is bad or
           this order is not important & the current depth of negotiation < 7
    BR-4: CFMP for a lower price IF bidding price = high (11 < price ≤ 12) & Seller's reputation is good & this order is
           important & the current depth of negotiation < 7
    BR-5: Accept the current proposal IF the bidding price ≤ 10
    BR-6: Accept the current proposal IF the bidding price = asking price (delta_price = 0)
    BR-7: Accept the current proposal IF bidding price = normalplus (10 < price ≤ 11) & Seller's reputation is good &
           the order is important
    BR-8: CFMP for a lower price IF bidding price = normalplus (10 < price ≤ 11) & either Seller's reputation is bad or
           this order is not important & the current depth of negotiation < 7
    BR-9: Terminate IF bidding price = normalplus (10 < price ≤ 11) & either Seller's reputation is bad or this order is
           not important & the current depth of negotiation ≥ 7

The function used by the 'buyer' agent to adjust its price during negotiation: 1/(1 + exp(-x)) (x ranging from -3 to +3)
The function used by the 'seller' agent to adjust its price during negotiation: (Depth_Max - depth + 1) * tan(0.25)

        depth            0         1         2         3         4         5         6         7
        diff_buyer       N/A       0.05      0.12      0.27      0.50      0.73      0.88      0.95
        diff_seller      2.043     1.787     1.532     1.277     1.021     0.766     0.511     0.255
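
A small sketch reproducing the two schedules above (that the buyer's x corresponds to depth - 4
for depths 1 through 7, and that Depth_Max = 7, are our inferences from the table, not stated
explicitly in the generator):

    import math

    DEPTH_MAX = 7  # inferred from the diff_seller column above

    def buyer_diff(depth):
        """Buyer's price increment at a given depth: logistic 1/(1 + exp(-x))."""
        x = depth - 4              # assumed mapping of depths 1..7 onto x = -3..+3
        return 1.0 / (1.0 + math.exp(-x))

    def seller_diff(depth):
        """Seller's mark-up over the asking price: (Depth_Max - depth + 1) * tan(0.25)."""
        return (DEPTH_MAX - depth + 1) * math.tan(0.25)

    # Reproduces the table: buyer_diff(1) = 0.05 ... buyer_diff(7) = 0.95,
    # seller_diff(0) = 2.043 ... seller_diff(7) = 0.255 (rounded).
    for d in range(8):
        print(d, "N/A" if d == 0 else round(buyer_diff(d), 2), round(seller_diff(d), 3))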


4.2 Data Schema for Training Samples

The negotiation process in NRL is very complex. Consider just the task of making a counter
proposal by a buyer when it receives a new proposal during a negotiation session. This task
amounts to optimizing a function (e.g., payoff) based on a high dimensional many-to-many
mapping. The input involves parameters reflecting the enterprise’s planning and execution
(customer orders, production capacity, inventory, etc.); the distance between asking and bidding
values of negotiation terms (prices, quantities, delivery dates, etc.) at the given point of a
negotiation session; the trustworthiness of the negotiation partner; how long the current session has
been (the longer it lasts, the less likely a good deal can be made); the importance to the buyer of
having the on-going negotiation succeed; and the availability of alternative sellers, etc. The output
is also composed of a large number of parameters that give a detailed description of a (counter-)
proposal. The training samples for DT learning are composed of these attributes.

A training sample is a vector or a list of values for a set of attributes extracted from a message
exchanged during the negotiation. A sample can be divided into two parts. The first part involves
attributes one hopes that the learned rules can be used to generate. They are thus referred to as the
learning target. The second part includes those attributes that support the conclusions of the
learned rules in assigning values to the target attributes; they are referred to as decision
parameters. The data model or data scheme for training samples specifies what is to be learned (the
target attribute) and what the decision parameters are (the other attributes the target depends on).

A training sample is synthesized from three consecutive messages: the current incoming message,
the one that precedes it, and the one in response to it. Figure 2 is an example of training samples
used in our early learning experiment.




In this experiment, to simplify the investigation, we decided to focus on learning rules for
determining the appropriate message type in response to an incoming message; that is, our learning
target is the performative (or message type) that will be used to respond to the incoming one. The
target attribute is one of CFP, CFMP, Terminate, Accept, or Reject for Buyer, and Terminate or
Proposal for Seller.

Selection of decision parameters is more complicated. Since the type of each outgoing message
from an agent is determined by the content of the incoming message and the content of the previous
message from the same agent, a large number of attributes that may potentially affect the new
message type can be extracted from the two preceding messages and from their differences.
For example, consider the situation in which Buyer receives a Proposal (msg-2) from Seller after
sending a CFP message (msg-1). The new message (msg-3) from Buyer, responding to msg-2,
may depend on:

       attributes from msg-1:
                bprice                      (Buyer’s asking price)
                bquantity                   (Buyer’s requested quantity)
                bdate                       (Buyer’s requested delivery_date)
                last_msg                    (type of the last msg from Buyer to Seller)
       attributes from msg-2:
                sprice                      (Seller’s bidding price)
                squantity                   (Seller’s proposed quantity)
                sdate                       (Seller’s proposed delivery_date)
                incoming_msg                (type of the incoming msg from Seller)
       attributes from the difference between msg-1 and msg-2:
                delta_price
                delta_quantity
                delta_date
                match_dq           (true only if both delta_quantity and delta_date are zero)
       attributes about other properties of Buyer:
                opp-reputation (Buyer’s evaluation of Seller’s reputation)
                weight-item        (whether this order is important to Buyer)
                depth              (number of msgs Buyer has sent during the current session)
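
As an illustration of how such a sample vector might be assembled from msg-1 and msg-2 (the
helper name and field order are ours; the actual NRL extractor may differ), a minimal sketch:

    def make_buyer_sample(msg1, msg2, msg3, opp_reputation, weight_item, depth):
        """Build one training vector from msg-1 (Buyer's CFP), msg-2 (Seller's Proposal)
        and msg-3 (Buyer's reply, whose performative is the learning target).
        Messages follow the format (type, sender, receiver, item, price, quantity, date)."""
        _, _, _, item, bprice, bquantity, bdate = msg1
        incoming_msg, _, _, _, sprice, squantity, sdate = msg2
        sample = {
            "last_msg": msg1[0], "incoming_msg": incoming_msg, "item": item,
            "bprice": bprice, "sprice": sprice,
            "delta_price": sprice - bprice,
            "delta_quantity": squantity - bquantity, "delta_date": sdate - bdate,
            "match_dq": (squantity == bquantity and sdate == bdate),
            "opp_reputation": opp_reputation, "weight_item": weight_item,
            "depth": depth,
        }
        target = msg3[0]           # performative of Buyer's response
        return sample, target

    sample, target = make_buyer_sample(
        ("CFP", "Buyer", "Seller", "X", 9.25, 50, 7),
        ("Proposal", "Seller", "Buyer", "X", 11.25, 50, 7),
        ("CFMP", "Buyer", "Seller", "X", 9.30, 50, 7),
        opp_reputation="good", weight_item="important", depth=1)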



A small set of decision parameters may be insufficient for the learning module to differentiate
training examples, resulting in a decision tree with many ambiguous leaf nodes (nodes with
multiple labels). On the other hand, a large set of decision parameters may refine the decision
tree to a level that is too detailed to be practically useful, because the induced tree would have
great height and a large number of branches (rules). For example, one of our experimental runs used
all the parameters listed above. The induced decision tree for Buyer has a height of 14, which means
that some rules would have to check as many as 14 conditions before drawing a conclusion.
Moreover, a total of more than 100 rules were generated from this tree. It may be possible to obtain
a smaller, workable set of rules from these raw rules by some pruning techniques, but this would
require substantial post-learning processing.

After several trials, we have chosen the following decision parameter sets for the Buyer and Seller,
respectively.

Buyer: sender, depth, receiver, last_msg, incoming_msg, item, sprice, opp_reputation, weight_item, match_qd.
Seller: sender, depth, receiver, last_msg, incoming_msg, item, bprice, opp_importance, match_qd.

4.3 The Experiment Results

We have experimented with three software packages for decision tree learning: (1) C5.0/See5.0
from RuleQuest Research [27], (2) Ripper from AT&T [28], and (3) ITI (Incremental Tree Inducer)
from the University of Massachusetts [26]. C5.0/See5.0 was not selected for the final experiment
because the free-of-charge version we have restricts the dataset to no more than 200 training
samples, which is not sufficient to make the learning process converge. Ripper, although it does not
restrict the size of the training set, was rejected because it always produces a very small number
of rules (possibly due to a severe pruning process it uses to generate the final output). ITI was
selected not only because it works well with our learning task but also because it supports
incremental learning, a valuable feature we plan to explore further in the future.

3000 negotiation sessions were randomly generated by the automatic data generator described in
Section 4.1. Each session includes a sequence of message exchanges between Buyer and Seller,
starting with a CFP message from Buyer. The experiment showed that this amount of training
samples is sufficient for the learning process to converge (the induced tree becomes stable).
Datasets of smaller size may be used to learn most, but not all, decision rules, because they do not
contain all possible scenarios, especially those with small probabilities. These samples were fed
into ITI under the data model described in Section 4.2. The two induced decision trees and their
corresponding data model files, for Buyer and Seller respectively, are included in the Appendix.

Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding
to the 12 paths in the tree. Two rules (the first and the last, counting from left to right) are related to
starting and ending a session; the other 10 are rules for determining the new message type. These
rules match very well with the rules used to generate the training data; there is no apparent
inconsistency between the two sets of rules. For example, the second rule
        IF sprice = max THEN Terminate
is the same as BR-1 in Section 4.1. The next three rules (all under the condition that quantity and
delivery date match)
        IF sprice = high AND weight_item = unimportant THEN Reject
        IF sprice = high AND weight_item = important AND opp_reputation = bad THEN Reject
        IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP
jointly match the rules of BR-3 and BR-4. All other induced rules also match the data generation
rules well.

Learned rules for Seller: 6 rules can be generated from the decision tree, corresponding to the 6
paths in the tree. The last two rules are related to starting and ending a session; the first four are for
determining the new message type. These rules again match very well with the rules used to
generate the training data. The first two rules from the tree
       IF bprice = min THEN Terminate
       IF bprice = low AND opp_weight = unimportant THEN Terminate
are the same as SR-2 and SR-3 in Section 4.1; the next two rules
       IF bprice = low AND opp_weight = important THEN Proposal
       IF bprice > low THEN Proposal
jointly match SR-4.

5. Conclusions and Future Work
Due to the inherently complex, uncertain, and dynamic nature of multi-agent systems, it is very
difficult, if not impossible, to encode agents' decision strategies a priori. Learning while doing
becomes imperative for constructing MAS in applications. This is also true for automatic
negotiation systems for supply chain management, where each entity operates autonomously and
interacts with others (its negotiating partners) to reach deals. We have begun an empirical
investigation of the feasibility of adopting some existing machine learning techniques to learn
negotiation rules (NRL) from transaction data (mainly from the messages exchanged among
negotiating agents). Our experimental results show that, with a carefully designed data scheme and
sufficiently many training samples, the decision tree learning method can be used to effectively
learn decision rules for some e-commerce activities such as negotiation in supply chains.

More interestingly, our experiment showed that the Buyer agent could learn a model of its partner
(the Seller) using only the information available in the messages they exchanged during the
negotiation. Although the model learned describes the behavior of the Seller, not the underlying
mechanism governing decision making at the Seller agent, it may give the Buyer some power to
predict the response from the Seller or to choose the actions that will bring the most desired
responses.

Although our experiment involves learning only how to determine one aspect of the responding
message, namely its message type, it is conceivable that this method can be used to learn other
aspects (e.g., how to set the price) from the same raw data (possibly with different data models). In
other words, a more complete (multi-faceted) model of an agent can be constructed by
simultaneously running multiple decision tree learning processes, one for each aspect of the agent.

In the future, more investigation can be pursued in the following directions: (1) experiment with
decision tree learning on other, preferably real-world, training data; (2) study how to incorporate
the learned model of a partner (or opponent) to improve one's negotiation decision-making;
(3) investigate the applicability of incremental decision tree learning and how it can improve an
agent's performance by making it adaptive to a changing environment; and (4) develop a hybrid
learning architecture that employs different learning techniques for different aspects of e-commerce
activities.

References


[1]  Arai, S. Sycara, K., and Payne, T.R. (2000). Multi-agent Reinforcement Learning for Scheduling
     Multiple-Goals. In Proceedings of the Fourth International Conference on Multi-Agent Systems
     (ICMAS'2000).
[2] Arrow, K. (1962). The implications of learning by doing. Review of Economic Studies, 29, 166-170.
[3] Brazdil, P., Gams, M., Sian, S., Torgo, L., & van de Velde, W. (1991). Learning in distributed systems
     and multi-agent environments. In Y. Kodratoff (Ed.), Machine learning -- EWSL91 (pp. 412--423).
     Lecture Notes in Artificial Intelligence, vol. 482. Berlin: SpringerVerlag.
[4] Caglayan, A., et al., (1996). Lessons from Open Sesame!, a User Interface Agent. In Proceedings of
     PAAM ’96.
[5] Haynes, T., Lau, K., and Sen, S. (1996). Learning Cases to Compliment Rules for Conflict Resolution in
     Multiagent Systems. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and
     Learning in Multiagent Systems, Stanford, CA, March, 1996.
[6] Hu, J. and Wellman, M. (1998). Online Learning About Other Agents in a Dynamic Multiagent System.
     In Proceedings of the Second International Conference on Autonomous Agents (Agents98), Minneapolis,
     MN, USA, May 1998
[7] Humphrys, M. (1995). W-learning: Competition among selfish Q-learners. Technical Report no. 362.
     Computer Laboratory, University of Cambridge.
[8] Kimbrough, S.O., Wu, D.J., and Zhong, F. (2000). Artificial Agents Play the Beer Game, Eliminate the
     Bullwhip Effect, and Whip the MBAs. http://grace.wharton.upenn.edu/~sok/fmec/schedule.html
[9] Maes, P. (1994). Social interface agents: Acquiring competence by learning from users and other agents.
     In O. Etzioni (Ed.), Working Notes of the 1994 AAAI Spring Symposium on Software Agents.
[10] Mor, Y., Goldman, C.V., and Rosenschein, J.S. (1996). Learn Your Opponent's Strategy (in Polynomial
     Time). In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 164-176).
     Lecture Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag, 1996.
[11] Mundhe, M. and Sen, S. (1999). Evaluating Concurrent Reinforcement Learners. In Proceedings of
     IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999, Stockholm, Sweden.
[12] Pannu, A. and Sycara, K. (1996). Learning Personal Agent for Text Filtering and Notification. In
     Proceedings of the International Conference of Knowledge-Based Systems (KBCS 96), Dec., 1996.
[13] Payne, T.R., Edwards, P., & Green, C.L. (1995). Experience with rule induction and k-nearest neighbor
     methods for interface agents that learn. In WSIMLC95).
[14] Quinlan, J.R. (1986). “Induction of Decision Trees”. Machine Learning, 1, 81-106.
[15] Quinlan, J.R. (1993). “Combining Instance-Based and Model-Based Learning”, in Proceedings of the 10th
     International Conference on Machine Learning, 236-243.
[16] Schmidhuber, J. (1996). A General Method For Multi-Agent Reinforcement Learning In Unrestricted
     Environments. Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and
     Learning in Multiagent Systems, Stanford, CA, March, 1996.
[17] Sian, S.S. (1991). Extending Learning to Multiple Agents: Issues and a Model for Multi-Agent Machine
     Learning (MAML). In Y. Kodratoff (Ed.), Machine learning -- EWSL91 (pp. 440--456). Berlin:
     Springer-Verlag.
[18] Stone, P. and Veloso, M. (1996). Collaborative and Adversarial Learning: A Case Study in Robotic
     Soccer. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in
     Multiagent Systems, Stanford, CA, March, 1996.
[19] Suryadi, D. and Gmytrasiewicz, P.J. Learning Models of Other Agents Using Influence Diagrams. In
     Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999,
     Stockholm, Sweden.
[20] Tesauro, G. (1999). Pricing in Agent Economies Using Neural Networks and Multi-Agent Q-learning.
     In Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999,
     Stockholm, Sweden.
[21] Vidal, J. and Durfee, E. (1997) Agents Learning about Agents: A Framework and Analysis. In Working
     Papers of the AAAI-97 Workshop on Multiagent Learning.
[22] Weiß, G. and S. Sen (Eds.) (1996). Adaptation and Learning in Multi-Agent Systems (pp. 1-21).
     Lecture Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag.
[23] Weiß, G. (1996). Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography,
     In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture
     Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag.


[24] Zeng, D. and Sycara, K. (1996). Bayesian Learning in Negotiation. In Working Notes of the AAAI 1996
     Stanford Spring Symposium Series on Adaptation, Coevolution and Learning in Multiagent Systems.
[25] Zeng, D. and Sycara, K. (1997). Benefits of Learning in Negotiation. In Proceedings of AAAI.
[26] http://www.cs.umass.edu/~lrn/iti/ -- Incremental Tree Inducer (ITI) from the University of Massachusetts
[27] http://www.rulequest.com/ -- C5.0/See5.0 for DT learning from Rulequest Research
[28] http://www.research.att.com/~diane/ripper/ripper-2.5.tar.gz -- Ripper from AT&T
[29] P. Stone and M. Veloso, “Multiagent Systems: A Survey from a Machine Learning Perspective,”
Under review for journal publication, February, 1997.
[30] M. Barbuceanu, and M. S. Fox, “The Information Agent: An Infrastructure Agent Supporting
Collaborative Enterprise Architectures,” in Proceedings of Third Workshop on Enabling Technologies:
Infrastructure for Collaborative Enterprises, Morgantown, West Virginia, IEEE Computer Society Press,
1994.
[31] Ye Chen, Yun Peng, Tim Finin, Yannis Labrou, Scott Cost, Bill Chu, Rongming Sun and Bob
Willhelm, “A Negotiation-based Multi-agent System for Supply Chain Management”, Workshop on Supply
Chain Management, Autonomous Agents '99, Seattle, WA, May 1999.


Acknowledgement
I would like to thank my advisor Dr. Yun Peng for his great help with this Master's project and Dr.
Charles K. Nicholas for reviewing this report. I also want to thank Mr. Ye Chen and Dr. Tim
Finin for their pertinent suggestions.




Appendix: Induced decision trees (by ITI)
Data Model: selleri.names

          Terminate, Proposal, Stop, NIL.

          sender: buyer, seller.
          depth: continuous.
          receiver: buyer, seller.
          last_msg: Terminate, Proposal, Stop, NIL.
          response_msg: CFP, CFMP, Terminate, Accept, Reject, NIL.
          item: X, Y.
          bprice: min, low, normalminus, normalplus, high, max, NIL.
          opp_weight: important, unimportant, NIL.
          match_qd: match, unmatch, NIL.



Decision tree for the Seller agent




Data Model: buyeri.names

          CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.

          sender: buyer, seller.
          depth: continuous.
          receiver: buyer, seller.
          last_msg: CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
          response_msg: Terminate, Proposal, NIL.
          item: X, Y.
          sprice: min, low, normalminus, normalplus, high, max, NIL.
          opp_reputation: bad, good, NIL.
          weight_item: important, unimportant, NIL.
          match_qd: match, unmatch, NIL.

Decision tree for the Buyer agent
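
Each root-to-leaf path of an induced tree reads off as one if-then rule, which is how the learned rules quoted in
Section 4.3 were obtained. The short Python sketch below illustrates that conversion on a hand-made tree
fragment patterned after the reported Buyer rules; the fragment, the nested-tuple encoding, and the helper name
tree_to_rules are illustrative assumptions only, not the actual tree or output produced by ITI.

    # Illustrative only: turn a toy decision tree into if-then rules, one per root-to-leaf path.
    # A decision node is (attribute, {value: subtree}); a leaf is just the class label string.
    def tree_to_rules(node, conditions=()):
        if isinstance(node, str):                      # leaf: emit the accumulated conditions
            yield "IF " + " AND ".join(conditions) + " THEN " + node
            return
        attribute, branches = node
        for value, subtree in branches.items():
            yield from tree_to_rules(subtree, conditions + (f"{attribute} = {value}",))

    # Hand-made fragment echoing the Buyer rules of Section 4.3 (not the induced tree itself).
    toy_buyer_tree = ("sprice", {
        "max": "Terminate",
        "high": ("weight_item", {
            "unimportant": "Reject",
            "important": ("opp_reputation", {"bad": "Reject", "good": "CFMP"}),
        }),
    })

    for rule in tree_to_rules(toy_buyer_tree):
        print(rule)

Run as written, the sketch prints four rules, for example "IF sprice = max THEN Terminate" and
"IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP", matching the form
of the rules listed in Section 4.3.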




                                                                       19
20

Contenu connexe

Tendances (6)

Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
 
Feature Based Semantic Polarity Analysis Through Ontology
Feature Based Semantic Polarity Analysis Through OntologyFeature Based Semantic Polarity Analysis Through Ontology
Feature Based Semantic Polarity Analysis Through Ontology
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning Models
 
Birthof Relation Database
Birthof Relation DatabaseBirthof Relation Database
Birthof Relation Database
 
A Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data BanksA Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data Banks
 
Amazon SimpleDB
Amazon SimpleDBAmazon SimpleDB
Amazon SimpleDB
 

En vedette (7)

EDRG12_Re.doc
EDRG12_Re.docEDRG12_Re.doc
EDRG12_Re.doc
 
[ppt]
[ppt][ppt]
[ppt]
 
From Thoughts to Action
From Thoughts to ActionFrom Thoughts to Action
From Thoughts to Action
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
 
Web Design Course Outline
Web Design Course OutlineWeb Design Course Outline
Web Design Course Outline
 

Similaire à CMSC698.doc

AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
csandit
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
cscpconf
 
Applying user modelling to human computer interaction design
Applying user modelling to human computer interaction designApplying user modelling to human computer interaction design
Applying user modelling to human computer interaction design
Nika Stuard
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD Editor
 
The state of network organization
The state of network organizationThe state of network organization
The state of network organization
Madhu Shridhar
 

Similaire à CMSC698.doc (20)

Recommender-technology-ReColl08
Recommender-technology-ReColl08Recommender-technology-ReColl08
Recommender-technology-ReColl08
 
Ap03402460251
Ap03402460251Ap03402460251
Ap03402460251
 
Multi-Agent Architecture for Distributed IT GRC Platform
 Multi-Agent Architecture for Distributed IT GRC Platform Multi-Agent Architecture for Distributed IT GRC Platform
Multi-Agent Architecture for Distributed IT GRC Platform
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
 
Improving the quality of information in strategic scanning system network app...
Improving the quality of information in strategic scanning system network app...Improving the quality of information in strategic scanning system network app...
Improving the quality of information in strategic scanning system network app...
 
Applying user modelling to human computer interaction design
Applying user modelling to human computer interaction designApplying user modelling to human computer interaction design
Applying user modelling to human computer interaction design
 
D046031927
D046031927D046031927
D046031927
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Metamodel for reputation based agents system – case study for electrical dist...
Metamodel for reputation based agents system – case study for electrical dist...Metamodel for reputation based agents system – case study for electrical dist...
Metamodel for reputation based agents system – case study for electrical dist...
 
The state of network organization
The state of network organizationThe state of network organization
The state of network organization
 
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEMSTUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
 
Organizational security architecture for critical infrastructure
Organizational security architecture for critical infrastructureOrganizational security architecture for critical infrastructure
Organizational security architecture for critical infrastructure
 
D017141823
D017141823D017141823
D017141823
 
journalism research
journalism researchjournalism research
journalism research
 
journalism research
journalism researchjournalism research
journalism research
 
Assignment
AssignmentAssignment
Assignment
 
System Modeling & Simulation Introduction
System Modeling & Simulation  IntroductionSystem Modeling & Simulation  Introduction
System Modeling & Simulation Introduction
 
Model Based Systems Thinking
Model Based Systems ThinkingModel Based Systems Thinking
Model Based Systems Thinking
 
Tam &amp; toe
Tam &amp; toeTam &amp; toe
Tam &amp; toe
 

Plus de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Plus de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

CMSC698.doc

  • 1. Decision-Tree Learning for Negotiation Rules Zhongli Ding A paper submitted to the Computer Science Electrical Engineering Department in partial fulfillment of the requirements for the M.S. degree at University of Maryland Baltimore County January, 2001 CMSC 698 Advisory Committee Dr. Peng Yun (Advisor), Associate Professor in Computer Science Dr. Charles K Nicholas (Reader), Associate Professor in Computer Science Certified by Yun Peng, CMSC698 Advisor 1
  • 2. Abstract The emergency of e-commerce increases the importance of research on various multi-agent systems (MAS). MAS is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems (MAS) are suitable for the domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information. A potential application area of MAS is in the “Supply Chain Management System” to integrate a company's activities across the entire e-supply chain - from acquisition of raw materials and purchased components through fabrication, assembly, test, and distribution of finished goods, and roles of these individual entities in the supply chain be implemented as distinct functional software agent which cooperate with each other in order to implement system functionality in a e-business environment. The major interactions in supply chains are done through negotiation strategically between enterprises and consumers. Correspondingly, automated negotiation interactions between two or more agents (say, buyers and sellers) in a multi-agent SCMS are very important. Much better benefits and profits can be obtained if these autonomous negotiation agents are capable of learning and reasoning based on experience and improving their negotiation behavior incrementally just as human negotiators. Learning can be used either to extract an entire rules set for an agent, or to improve a pre-existing set of rules. In this project, based on a negotiation-based MAS framework for supply chain management and a set of negotiation performatives defined, we are trying to test the possibility of adopt decision-tree learning (or rule-based learning) method in the negotiation process. Experiment results on the effect of using rule-based learning method in a pair-wised negotiation process between one buyer and one seller are presented, which show that with carefully designed data scheme and sufficiently many training samples, decision tree learning method can be used to effectively learn decision rules for some e-commerce activities such as negotiation in supply chains. Keywords E-Commerce, Multi-Agent System, Supply Chain Management System, Negotiation, Negotiation, Negotiation Performatives, Decision-Tree Learning, Rule-based learning 2
  • 3. 1. Introduction The development of computer software and hardware leads to the appearance of non-human software agencies. A software agent is considered as an entity with goals, capable of actions endowed with domain knowledge and situated in an environment [29]. The term of multi-agent systems (MAS) is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems (MAS) are suitable for the domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information [29]. Comparing with monolith single systems and traditional distributed systems, due to insufficient knowledge of the system environment, the required coordination of the activities between multiple agents and the dynamic nature of the MAS, the design and implementation of a MAS is of considerable complexity with respect to both its structure and its functionalities. A supply chain is the process of moving goods from the customer order through the raw materials stage, supply, production, and distribution of products to the customer. More formally, a supply chain is a network of suppliers, factories, warehouses, distribution centers and retailers, through which raw materials are acquired, transformed, produced and delivered to the customer [30]. A supply chain management system (SCMS) manages the cooperation of these system components. In the computational world, roles of individual entities in a supply chain can be implemented as distinct agents. Correspondingly, a SCMS transforms to a MAS, in which functional agents cooperate with each other in order to implement system functionality [31]. In supply chains, enterprises and consumers interact with each other strategically. A great portion of these interactions are done through negotiation. Thus, automated negotiation interactions between two or more agents (say, buyers and sellers) in a multi-agent SCMS are very important. We can get much better benefits and profits if these autonomous negotiation agents are capable of learning and reasoning based on experience and improving their negotiation behavior incrementally just as human negotiators. Moreover, problems stemmed from the complexity of MAS can be avoided or at least reduced by endowing the agents with the ability to adapt and to learn, that is, with the ability to improve the future performance of the total system, of a part of it, or of a single agent [3,17,23]. Learning can be used either to extract an entire rules set for an agent, or to improve a pre-existing set of rules. Two most important issues might be concerned with learning in negotiation are: How to model the overall negotiation process, i.e. the design of the modeling framework of our negotiation-based multi-agent system for supply chain management? What is the learning algorithm or method we might choose for the decision-making of the agents? In [31], researchers propose a negotiation-based MAS framework for supply chain management and describe a number of negotiation performatives, which can be used to construct pair-wise and third party negotiation protocols for functional agent cooperation. It also explain how to formally model the negotiation process by using Colored Petri Nets (CPN) and provide an example of establishing a virtual chain by solving a distributed constraint satisfaction problem. 
Based on this framework, one main job is trying to test the possibility of adopt decision-tree learning (or rule-based learning) method in the negotiation process and my project experiment is part of this task. There exist a lot machine learning methods that might be useful, but as pointed out later, decision-tree learning is the most suitable one. In this project, I experiment the effect of 3
  • 4. using rule-based learning method in a pair-wised negotiation process between one buyer and one seller. This is the first step of our job and future work might extend this into a negotiation-based multi-agent system that is more complex with a lot functional agents joining in, staying, bargaining, or leaving the system. In section 2, we give a brief summary of past research works and learning techniques. Then in Section 3, we give a simple explanation of the designed MAS framework of the system and a set of negotiation performatives used. In Section 4 we present our experiment results so far and give some simple analysis. Finally, in Section 5, we make a conclusion and give our future research goals. In Appendix, we also provide some sample experiment results. 2. Learning Overview and Decision-Tree Learning In this section, we briefly survey existing research work on learning and adaptation in multi-agent systems, especially those applied in e-commerce activities, and give a simple introduction of the decision-tree learning method along with the reason we choose it, before giving the design framework of our experimental Negotiation Rules Learning (NRL) system and the experiment results in later sections. 2.1 Categories and Objectives of Learning There are a number of different ways in which learning can be used within a MAS for different objectives. • An agent, standing alone, can learn its owner’s intention and decision-making strategies. In this case, the human user is often served as the trainer whose decisions in response to environmental inputs are used as the training samples. The agent along with the accumulation of the training samples can incrementally learn the decision logic of the human, which might be difficult to explicitly encode as decision rules by hand. • An agent, standing alone, can learn to improve its responses to the environment inputs (including those from other agents) as long as some objective functions (e.g., various utility functions) are well defined. In this case, training samples are its previous interactions with the environment, including the corresponding the objective function values. • An agent can learn about other agents in order to compete or cooperate with them. This type of learning is deeper than the previous two in that an agent learns something that other agents used to make their decisions and uses such knowledge to better fine-tune its own strategy. Learning in this category can be as simple as learning a few parameters that other agents used to conduct their operation (e.g., the reservation prices and markup percentages of suppliers in a supply chain [24]), or can be quite complicated as learning models of other agents’ decision strategies [6,9,10,19,23]. • A set of agents can learn to simultaneously adjust their respective decision processes. This type of learning occurs mostly in those MAS whose agents are tightly cooperating with each other to achieve a common goal (e.g., winning a robot soccer game), and the learning inputs often reflect the system’s performance (e.g., scores in a soccer game) rather than performance of individual agents (players) [14,18]. In some applications, MAS learning can be done in the so-called “batch mode”, i.e., the system is trained over a set of pre-assembled training samples while the system is not in user (either before the system’s deployment or when the system is taken offline). In most cases, however, it is 4
  • 5. preferred that learning is conducted in the “incremental model”, i.e., the system is incrementally adjusting/modifying itself by learning from a continuous stream of inputs from the environment while it is in actual use [25]. This is because 1) training samples, which record the interaction history of an agent, can be collected more efficiently and truthfully when the system is in actual use; and 2) incremental learning allows the system or its agents to adapt to the change of environment in a timely fashion. 2.2 Example Techniques in MAS Learning Interest in research on MAS learning has seen a steady increase in the past several years, there are many MAS learning systems with vastly different architectures, different application areas, and different machine learning techniques. What follows are brief descriptions of some examples of these systems and the learning techniques they use. • Reinforcement learning: Reinforcement Learning (RL) is the process by which an agent improves its behavior in an environment via experience. RL is based on the idea that the tendency to perform an action by an agent should be strengthened (reinforced by a reward) if the action produces favorable results, and weakened (punished) if the action produces unfavorable results. One of the most important advantages of RL, in comparison with other learning methods, is that it requires very little prior knowledge of the environment, as it does not require having a target or desirable output for each input when forming a training sample. RL algorithms such as Q-learning [7,11] can incrementally adjust the system toward more favorable outcomes as long as it is provided a feedback judgment (good or back) on the system’s output for a given input. For this reason, RL has been seen as one of the most widely used learning methods in MAS [15,16,11]. Most noted application of RL is perhaps in the domain of robot soccer games [14,18] where the game’s outcome (win or lose) is fed back to train the soccer team. RL has also been applied to other problems, including setting right prices in competitive marketplaces [20], learning agent coordination mechanism [11], learning to schedule multiple goals [1], and in dealing with malicious agent in a market based MAS [21]. • Optimization-based learning techniques: Optimization-based learning methods such as genetic algorithms [8], neural networks [12,14,20], and linear programming [22] have been used in some experimental MAS to train individual agents to optimize their performance as long as their performance can be specified as some forms of objective functions (e.g., their utility functions). One example of such systems is the DragonChain that uses genetic algorithm (GA) approach to improve its performance in playing the MIT “Beer Game”, a game of electronic supply chain for beers [8]. Mimicking the law of biological evolution of the survival of the fittest, the GA learning in DragonChain was able to help the system to obtain good beer order policies for both retailers and wholesalers by search through the huge space of possible order policies. Their experiment showed that this system outperformed those governed by classic business rules by eliminating the Bullwhip phenomenon, and more interestingly, it can dynamically changing its policies to adapt to changing order patterns from the customers. 
• Probabilistic learning techniques: Probabilistic learning techniques are of particular interest to MAS learning because of their ability to handle the high degree of uncertainty of the learning environment caused by agent interaction. Uncertainty is even more prevalent when an agent tries to learn models of other agents. In probabilistic learning, an agent does not attempt to learn a deterministic model of another agent, but a probability distribution of a set of possible models of that agent. Examples of formalisms that support probabilistic learning include the Bayesian Belief Networks (BBN), which represent probabilistic dependencies among variables of interest 5
  • 6. in a graphic representation, and Influence Diagrams (ID), which further extend BBN to include decision nodes and utility nodes. A decision making model for supply chain management, called Bazaar, was developed based on BBN [24]. In this system, an agent (say a buyer) uses Bayesian learning to incrementally learn the distributions of reservation prices and the markup rates of its negotiation partners (sellers). Work by Suryadi and Gmytrasiewicz [19] uses ID learning for an agent to construct models other agents. An agent in their system maintains a number of possible models for each of the other agents that it is interacting and the probability distribution of these models. When none of these existing models has sufficiently high probability, one of them is modified (the parameters and even the structure of the underlying network are changed) to better reflect the observed behavior. An unspoken assumption for the above probabilistic learning systems is that the learning agent must have some prior knowledge of the behaviors of other agents it is trying to learn. At least, it has to assume the knowledge of the set of parameters with which the behavior of the other agent can be expressed because these parameters are the necessary building blocks for the possible probabilistic models. Unfortunately, this assumption may not hold in many MAS applications. • Supervised learning: Supervised learning covers a class of learning methods that requires a teacher to tell the learning system what is the target/correct/desired output for each training input. The target output is then compared with the current system output, and the discrepancy is used to drive the update of the system. Supervised learning includes backpropogation training in neural networks, K-nearest neighbor, minimum entropy, and some form of decision tree learning. Supervised learning is particularly suitable for learning user models for personal agents and human interface agents [4,9,12,13]. This type of agent works on behalf of human users and tries to best satisfy the users’ need. Instead of provide detailed rules to guide the agent (which may not be feasible for complex tasks), the human user can easily work as the teacher to provide desirable response to each input as a training sample to the agent. Payne at al has used k-nearest neighbor method to train a user interface agent [13], while Pannu and Sycara have used backpropagation method to train a personal agent for text filtering and notification [12]. • Rule-based learning: Learning rules for rule-based reasoning systems has also been reported in the literature [5,13]. Decision tree learning is perhaps the most mature technical for this type of learning. The advantage of rule-based learning lies on the fact that rules are easy for humans to understand. This allows domain experts to inspect and evaluate rules generated by a learning module, and make decision on whether to accept each of these rules. Moreover, since rules are probably the easiest way to represent and encode experts’ knowledge, many learning systems can start with a set of pre-defined rules and then let the rule-based learning module to modify the rule set with additional observations. Learning thus will greatly facilitate the growth, modification, and maintaining consistence of the knowledge base. These are precisely the reasons that we have chosen rule-based learning for our EECOMS Negotiation Rules learning (NRL) task. 
2.3 Decision-Tree Learning Simply state, a decision tree is a representation of a decision procedure for determining a class label to associate with a given instance (represented by a set of attribute-value pairs). All non-leaf nodes in the tree are decision nodes. A decision node is associated with a test (question on the value of a particular attribute), and a branch corresponding to each of the possible outcomes of the test. At each leaf node, there is a class label (answer) to an instance. Traversing a path from the root to a leaf is much like playing a game of twenty questions, following the decisions made on each 6
  • 7. decision node on the path. Decision trees can be induced from examples (training samples) that are already labeled. One of the concerns of DT learning is how to construct trees that are as small as possible (measured by the number of distinct paths a tree has from its roots) and at the same time consistent with the training samples. In a worst case, the induced tree can be degenerated in which each sample has its own unique path (the tree size would then be in the order of exponential to the number of attributes involved). Information theoretic approach has been taken by several DT learning algorithms to address this problem, also to a lesser extent, to the problem of generalization [15]. The basic thought of a DT learning algorithm is: For each decision point, If all remaining examples are all positive or all negative, we're done. Else if there are some positive and some negative examples left and attributes left, pick the remaining attribute which is the "most important", the one which tends to divide the remaining examples into homogeneous sets Else if there are no examples left, no such example has been observed; return default Else if there are no attributes left, examples with the same description have different classifications: noise or insufficient attributes or nondeterministic domain Figure 3 below gives a simple example of DT learning. A tree of good size has been induced. It has 6 distinct paths, but it could have in the worst-case 12 distinct paths, each for a unique value assignment pattern of (color, size, shape). This is because some general rules were induced (if color = read, then immediately conclude the class = +; shape will be considered only if color = blue). The figure also shows a set of if-then rules can be generated from the induced tree. Essentially, each distinct path represents a rule: value assignments on the branches on the path constitute the conditional part of the rule, and the value assignment of the leaf node at the end of the path constitutes the conclusion part of the rule. 7
  • 8. 2.4 The Choice of the DT Learning Method An assumption we made when selecting suitable learning method is that decisions included in the training data (extracted from messages exchanged during negotiation sessions) are good decisions. Therefore, the goal of learning is not to attempt to further optimizing the decision process that was used to generate the data, but to learn rules/ strategies that lead to these decisions. In other words, the learned rules would make the decisions which are the same as (or similar to) those in the training set if the same or similar decision parameters are given. This lead us to the choice of supervised learning, instead of unsupervised or reinforcement learning. The training samples serve as instructions from a teacher or supervisor as each sample gives the desired or “correct” value assignment to the target attribute with respect to the pattern of value assignment to the decision parameters in the sample. Among all supervised learning methods, we have chosen to experiment with Decision Tree Learning (DT learning) [14,15,26] for the following reasons: • DT learning is a mature technology. It has been studied for 20+ years, has been applied to various real-world problems, and the learning algorithm has been improved by several significant modifications. • The basic algorithm and its underlying principles are easy to understand. • It is easy to apply DT learning to specific problem domains, including our NRL task. • Several good, easy to use DT learning packages are commercially available (free or with reasonable cost) [26,27, 28]. • It is easy to convert the induced decision tree to a set of rules, which are much easier for human experts to evaluate and manipulate, and to be incorporated into an existing rule based systems than other representations. 3. Negotiation Rules Learning System Framework In a negotiation MAS, learning can be applied to various aspect of agent negotiation: training the agent to make decisions in a way similar to what an experienced human manager would make; 8
  • 9. learning the models of an agent’s negotiation partner at different level of details; and learning negotiation strategies that outsmart its negotiation partners. We have experimented decision tree learning method in our EECOMS negotiation rules learning task. In this section we will give a brief description of the negotiation rule learning system design and a set of negotiation performatives used. 3.1 Objectives Several theoretical issues are concerned with such a MAS system mentioned above: the high time complexity of learning process, the lack of sufficient learning data and prior knowledge, the inherently uncertainty of the learning results, and the stability and convergence of learning MAS. Thus, the overall objective is to study the feasibility of employing machine learning techniques to automatically induce negotiation decision rules for supply chain players (buyers and sellers) from transaction data. Specifically, we aim at investigating the following: • Constructing a rule base by learning the decision strategy of a human expert: Initially the human makes all negotiation decisions. The prospective rules induced from the negotiation transactions are shown to the human for approval/rejection. The approved rules are then incorporated into existing rule base. Including humans in the loop allows us to have quality training samples as they are generated by an experienced human negotiator rather than an agent with set of not yet well-developed rules. It also makes the rule base more trustworthy to humans since every induced rule is inspected by a human before it is included into the rule base. • Learning the model of the negotiation partners’ behaviors: By properly incorporating the learned partner’s decision rules, an agent can make more informed decisions, and in turn improve its performance (reducing the negotiation time/steps and increasing the payoff). 3.2 Outline of NRL System Design Negotiation partners (buyers and sellers), represented by computer programs in a virtual supply chain, constitute a multi-agent system in the broad sense of this term as discussed in Section 1. Figure 1 shows the diagram of such a system, for each side of the negotiation, we have a decision module, and the rules in these modules can be more and more complete since each agent has learning ability that is implemented by the learning module. Initially, we have a human negotiator (or a set of pre-defined rules) to guide the negotiation process. 9
  • 10. 3.3 Negotiation Performatives All of the functional agents in a MAS should have some understanding of system ontology and use a certain Agent Communication Language (ACL) to make conversation, transfer information, share knowledge and negotiate with each other, which offers a minimal set of performatives to describe agent actions and allows users to extend them if the new defined ones conform to the rules of ACL syntax and semantics. Knowledge Query and Manipulation Language (KQML) and the ACL defined by Foundation for Intelligent Physical Agents/Agent Communication Language (FIPA ACL) are the most widely used and studied ACLs. In KQML there are no predefined performatives for agent negotiation actions. In FIPA ACL there are some performatives, such as proposal, CFP and so on, for general agent negotiation processes, but they are not sufficient for our purposes. For example, there are no performatives to handle third party negotiation. The NRL system design presents a negotiation performative set designed for MAS dealing with supply chain management [31]. In the following table, we give the negotiation performatives’ name, their corresponding meaning and the possible performatives a functional agent can use to reply when certain performative comes in: Name Meaning Performative Responsed CFP call for proposal Proposal | Terminate CFMP call for modified proposal Proposal | Terminate Reject reject a proposal Proposal | Terminate Terminate Terminate the negotiation NONE Accept accept a proposal NONE Proposal The action of submit a proposal Accept | Reject | Terminate | CFMP Initially, one agent starts negotiation by sending a CFP message to the other agent. After several rounds of conversation in which proposes and counter-proposes are exchanged, the negotiation between two agents will end when one side accepts (rejects) the other side’s proposal or terminates the negotiation process without any further explanation [31]. 4. Experiments and Results A preliminary experimental learning system form NRL was constructed earlier to evaluate the feasibility of learning decision rules for a buyer agent. A set of 27 training samples was manually generated following the data schema. These samples were fed to C5.0, a decision-tree learning package (a descendant of Ross Quinlan’s classic ID3 decision tree learning algorithm [14,15]) obtained from RuleQuest Research, http://www.rulequest.com/. The learning was successfully completed, a decision tree was constructed, from which a set of eight decision rules were generated by C5.0. These rules suggest to the buyer agent what actions it should take next (i.e., what types of messages should be sent out next), based on factors such as how well the terms such as price, quantity and delivery date in its current “Call-For-Proposal” match those in the returned “Proposal” from the seller it is negotiating with, the reputation of the seller, and how deep the current negotiation is progressing. The initial results from the experiment were encouraging. It is shown that the decision tree learning can be an effective tool to learn the negotiation rules from the data that reflects the past negotiation 10
  • 11. experience. The few rules learned are in general reasonable and consistent with the intuitions we had and used to generate the training data. However, the experiment was restricted by the quality of the training data (only 27 hand made samples were used), and the results were far from convincing (only three rules were learned). Encouraged by the results from the preliminary study, we went forward to conduct a more extensive experiment of NRL by decision tree learning. The main extensions include the following: • A program to automatically generate training samples is developed. This data generator is based on a set of decision rules that take into considerations of all important decision factors for making proposals (and counter proposals). Unlike the small data set generated manually and somewhat arbitrarily in the preliminary study, this data generator allows us to experiment with large quantity of learning samples (hundreds or even thousands of them) that are consistent and more realistic. • A better data schema is selected after a series of experiments to yield good results. • The buyer agent not only learns decision rules for itself but also learns rules that the seller appears to use, thus constructs a (simple) model of the seller. Next we summarize the experiment system and present the results. 4.1 The Training Data Generator More realistic training data may induce better, more meaningful rules. They can also be used to test and validate the learning results. Since no realistic data set of sufficient size is available to us, nor could we obtain help from human experts in supply chain negotiation, we were not able to adequately resolve this problem to our complete satisfaction. As an alternative, we have developed a training sample automatic generator to generate as many samples as we need. This sample generator essentially simulates the negotiation process on both the buyer side and seller side, based on two sets of decision rules encoded in the program, to generate a stream of message exchanges between the buyer and the seller. The actual training samples are then extracted from these messages. By changing the relevant rules, the sample generator can be used to generate samples reflecting different negotiation scenarios.  Message format: (msg_type, sender, receiver, item, price, quantity, delivery_date) For example, a negotiation session may start with the following message from the buyer to the seller (CFP Buyer Seller X 9.25 50 7) that Buyer wishes to entertain a proposal from Seller for 50 pieces of good X at price $9.25 a piece, to be delivered 7 days from this day. Seller may response with a message (Proposal Seller Buyer X 11.25 50 7) that it can sell these good by the given delivery date at the unit price of $11.25. To simplify the experiment, we chose to let the buyer and the seller to negotiate only one type of good, named X.  System parameters: A set of random numbers are used to determine the attribute values for the CFP message for each negotiation session and the values for the initial Proposal message in response to the CFP message. These numbers, to an extent, simulate the dynamic nature of the internal and external environment in which the negotiation is taking place. These parameters include the following: Buyer’s need for good X: 11
  • 12. − quantity: a random number between 1 and 200 (with 0.1 probability in [1,50], 0.8 probability in [51,150], and 0.1 probability in [151, 200]), and it can be mapped to three regions: [1,50] {small}, [51, 150] {ordinary}, [151, 200] {large}. − delivery_date: a random number between 1 and 20, it can also be mapped to three regions: 1~5 days{short}, 6~15 days{regular}, 16~20 days{long}. − asking price: a random number between 7 and 11, and the fair market unit price is $10, the range of all possible prices is partitioned into six regions: min (7t price< 8), low (8( price<9), normalminus(9p price<10), normalplus (10p pricep 11), high (11<price( 12), and max (12<price1 13). − importance of the order: a random number of binary value − Seller’s reputation: a random number of binary value Seller’s capacity to supply good X: − daily production capacity of good X: a random number between 8 and 12 − current inventory: a random number between 20 and 50 − importance of the Buyer as a customer: a random number of binary value With these random numbers, each negotiation starts with a CFP message of different quantity, delivery_date, and asking price. In response, the seller first determines if the requested delivery_date can be met for the given quantity based on the current inventory and the daily production capacities from this day to the delivery_date. To simplify the data generation, we assume that the seller submit its initial proposal (usually with a price higher than the asking price in the CFP message) only if the requested quantity and date can be met. The negotiation then continues with the buyer gradually increasing the asking price and the seller decreasing the bidding price until the session ends. The details of the negotiation are governed by decision rules at either side.  Negotiation rules for Seller agent: The following rules are used in the data generator to form response messages from Seller agent. SR-1: Terminate the negotiation IF Seller cannot meet the quantity-date requested in the incoming CFP message SR-2: Terminate the negotiation IF asking price = min (7 price< 8) SR-3: Terminate the negotiation IF asking price = low (8 price<9) & Buyer is NOT an important customer SR-4: Otherwise, submit a Proposal with the requested quantity and date, and the price is determined by: SR-41: IF the asking price = normalplus or high or max (10S pricep 13) THEN bidding price = asking price SR-42: Otherwise, IF incoming msg-type = CFP THEN IF asking price = low (8I price<9) & Buyer is an important customer THEN propose a higher price (bidding price>asking price) ELSE IF asking price = normalminus (9E price<10) Buyer is NOT an important customer THEN propose a higher price (bidding price>asking price) ELSE IF asking price = normalminus (9E price<10) Buyer is an important customer THEN bidding price = asking price ELSE propose a lower price (bidding price asking price)  Negotiation rules for Buyer agent: The following rules are used in the data generator to form messages from Buyer agent. BR-1: Terminate the negotiation IF bidding price =max (12<pricet 13) BR-2: Terminate the negotiation IF bidding price = high (11<pricet 12) & the current depth of negotiation 1 7 BR-3: Reject the incoming proposal IF bidding price = high (11<price t 12) & either Seller’s reputation is bad or this order is not important & the current depth of negotiation < 7 12
  • 13. BR-4: CFMP for a lower price IF bidding price = high (11<price 12) & Seller’s reputation is good & this order is important & the current depth of negotiation < 7 BR-5: Accept the current proposal IF the bidding pricet 10 BR-6: Accept the current proposal IF the bidding price = asking price (delta_price =0) BR-7: Accept the current proposal IF bidding price = normalplus (10<price 11) & Seller’s reputation is good & The order is important. BR-8: CFMP for a lower price IF bidding price = normalplus (10<price 11) & either Seller’s reputation is bad or this order is not important & the current depth of negotiation < 7 BR-9: Terminate IF bidding price = normalplus (10<price 11) & either Seller’s reputation is bad or this order is not important the current depth of negotiation n 7 The function used in negotiation to reduce and increase price by 'buyer' agent: 1/(1 + e xp(-x)) (x maybe from -3 to +3) The function used in negotiation to reduce and increase price by 'seller' agent: (Depth_Max - depth+1) * tan(0.25) depth 0 1 2 3 4 5 6 7 diff_buyer N/A 0.05 0.12 0.27 0.50 0.73 0.88 0.95 diff_seller 2.043 1.787 1.532 1.277 1.021 0.766 0.511 0.255 4.2 Data Schema for Training Samples The negotiation process in NRL is very complex. Consider just the task of making a counter proposal by a buyer when it receives a new proposal during a negotiation session. This task amounts to optimizing a function (e.g., payoff) based on a high dimensional many-to-many mapping. The input involves parameters reflecting the enterprise’s planning and execution (customer orders, production capacity, inventory, etc.); the distance between asking and bidding values of negotiation terms (prices, quantities, delivery dates, etc.) at the given point of a negotiation session; the trustworthiness of the negotiation partner; how long the current session has been (the longer it lasts, the less likely a good deal can be made); the importance to the buyer for the on-going negotiation to succeed; and the availability of other alternative sellers, etc. The output is also composed of a large number of parameters that gives a detailed description of a (counter) proposal. The training samples for DT learning are composed by these attributes. A training sample is a vector or a list of values for a set of attributes extracted from a message exchanged during the negotiation. A sample can be divided into two parts. The first part involves attributes one hopes that the learned rules can be used to generate. They are thus referred to as learning target. The second part includes those attributes that support the conclusions of the learned rules on assigning values to the target attributions; they are referred to as decision parameters. The data model or data scheme for training samples specifies what is to be learned (the target attribute) and what are the decision parameters (other attributes the target depend on). A training sample is synthesized from three consecutive messages: the current incoming message, the one that precedes it, and the one in response to it. Figure 2 is an example of training samples used in our early learning experiment. 13
  • 14. In this experiment, to simplify the investigation, we have decided to focus on learning rules for determining appropriate message type for response to an incoming message, namely, our learning target is the performative (or message type) that will be used to response the incoming one. The target attribute will be either CFP, CFMP, Terminate, Accept, Reject for Buyer, Terminate and Proposal for Seller. Selection of decision parameters is more complicated. Since the type of each outgoing message from an agent is determined by the content of the incoming message and the content of the previous message from the same agent, a large number of attributes that may potentially affect the new message type can be extracted from the two proceeding two messages and from their differences. For example, consider the situation that Buyer receives a Proposal (msg-2) from Seller after sending a CFP message (msg-1). The new message (msg-3) from Buyer, in responding to msg-2, may depend on: attributes from msg-1: bprice (Buyer’s asking price) bquantity (Buyer’s requested quantity) bdate (Buyer’s requested delivery_date) last_msg (type of the last msg from Buyer to Seller) attributes from msg-2: sprice (Seller’s bidding price) squantity (Seller’s proposed quantity) sdate (Seller’s proposed delivery_date) incoming_msg (type of the incoming msg from Seller) attributes from the difference between msg-1 and msg-2: delta_price delta_quantity delta_date match_dq (true only if both delta_quantity and delta_date are zero) attributes about other properties of Buyer: opp-reputation (Buyer’s evaluation of Seller’s reputation) weight-item (whether this order is important to Buyer) depth (number of msgs Buyer has sent during the current session) 14
  • 15. A small set of decision parameters may be insufficient for the learning module to differentiate training examples, and thus resulting in a decision tree with many ambiguous leave nodes (nodes with multiple labels). On the other hand, a large set of decision parameters may refine the decision tree to a level that is too detailed to be practically useful because the induced tree would have a great height and a large number of branches (rules). For example, one of our experimental run used all the parameters listed above. The induced decision tree for Buyer has height of 14, which means that some rules would have to check as many as 14 conditions before it draws a conclusion. Moreover, a total of more than 100 rules are generated from this tree. It may be possible to obtain a workable smaller set of rules can be obtained from these raw rules by some pruning techniques, but this would require a substantial post-learning process. After several trials, we have chosen the follows decision parameter sets for the Buyer and Seller, respectively. Buyer: sender, depth, receiver, last_msg, incoming_msg, item, sprice, opp_reputation, weight_item, match_qd. Seller: sender, depth, receiver, last_msg, incoming_msg, item, bprice, opp_importance, match_qd. 4.3 The Experiment Results We have experimented three software packages for decision tree learning, they are (1) C5.0/See5.0 from RuleQuest Research [27], (2) Ripper from ATT [28], and (3) ITI (Incremental Tree Induced) from University of Massachusetts [26]. C5.0/See5.0 was not selected for the final experiment because the free-of-charge version we have restricts the dataset to have no more than 200 training samples, which are not sufficient to make the learning process converge. Ripper, although not restricting the size of the training set, was rejected because it always produces a very small number of rules (possibly due to a severe pruning process it uses to generate the final output). ITI was selected not only because it works well with our learning task but also because it supports incremental learning, a valuable feature we plan to further explore in the future. 3000 randomly generated negotiation sessions were generated by the automatic data generator described in Section 3.1. Each session includes a sequence of message exchanges between Buyer and Seller, starting with a CFP message from Buyer. The experiment showed that this amount of training samples is sufficient for the learning process to converge (the induced tree becomes stable). Datasets of smaller size may be used to learn most, but not all decision rules, because they do not contain all possible scenarios, especially those with small probabilities. These samples were fed into ITI under the data model described in Section 3.2. Two induced decision trees and their corresponding data model files, for Buyer and Seller, respectively, were included in Appendix. Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding to the 12 paths in the tree. Two rules (the first and last, counting from left to right) are related to starting and ending a session, the 10 others are rules for determining the new message types. These rules match very well with the rules used to generate the training data. There is no apparent inconsistency between these two sets of rules. For example, the second rule IF sprice = max THEN Terminate is the same as BR-1 in Section 3.1. 
Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding to the 12 paths in the tree. Two rules (the first and the last, counting from left to right) are related to starting and ending a session; the other 10 are rules for determining the new message type. These rules match the rules used to generate the training data very well, and there is no apparent inconsistency between the two sets. For example, the second rule

IF sprice = max THEN Terminate

is the same as BR-1 in Section 3.1. The next three rules (all under the condition that quantity and delivery date match)

IF sprice = high AND weight_item = unimportant THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = bad THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP

jointly match the rules BR-3 and BR-4.
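Because the induced tree is small, these learned Buyer rules can also be written down directly as decision logic. The following sketch encodes only the rules quoted above; it is not output of ITI, and the None fallback for situations those rules do not cover is an assumption (the full tree contains additional paths for them).

```python
from typing import Optional

def buyer_response(sprice: str, weight_item: str, opp_reputation: str,
                   match_qd: bool) -> Optional[str]:
    """Buyer's next performative according to the induced rules quoted above.
    Cases not covered by those rules return None; the full induced tree has
    further paths for them, which are not reproduced here."""
    if sprice == "max":
        return "Terminate"                              # same as BR-1
    if match_qd and sprice == "high":
        if weight_item == "unimportant":
            return "Reject"
        # weight_item == "important": decision hinges on Seller's reputation
        return "CFMP" if opp_reputation == "good" else "Reject"
    return None

assert buyer_response("high", "important", "good", match_qd=True) == "CFMP"
```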
All other induced rules also match the data generation rules well.

Learned rules for Seller: 6 rules can be generated from the induced decision tree, corresponding to the 6 paths in the tree. The last two rules are related to starting and ending a session; the first four determine the new message type. These rules again match the rules used to generate the training data very well. The first two rules from the tree

IF bprice = min THEN Terminate
IF bprice = low AND opp_weight = unimportant THEN Terminate

are the same as SR-2 and SR-3 in Section 3.1, and the next two rules

IF bprice = low AND opp_weight = important THEN Proposal
IF bprice > low THEN Proposal

jointly match SR-4.

5. Conclusions and Future Work

Due to the inherently complex, uncertain, and dynamic nature of multi-agent systems, it is very difficult, if not impossible, to encode agents' decision strategies a priori. Learning while doing therefore becomes imperative when constructing MAS for real applications. This is also true for automated negotiation systems for supply chain management, where each entity operates autonomously and interacts with others (its negotiating partners) to reach deals. We have begun an empirical investigation of the feasibility of adopting existing machine learning techniques for negotiation rule learning (NRL) from transaction data (mainly from the messages exchanged among negotiation agents). Our experimental results show that, with a carefully designed data scheme and sufficiently many training samples, decision tree learning can be used to effectively learn decision rules for e-commerce activities such as negotiation in supply chains.

More interestingly, our experiment showed that the Buyer agent could learn a model of its partner (the Seller) using only the information available in the messages they exchanged during the negotiation. Although the learned model describes the behavior of the Seller rather than the underlying mechanism governing the Seller's decision making, it may give the Buyer some power to predict the Seller's responses or to choose the actions that will bring about the most desired responses.

Although our experiment only involves learning how to determine one aspect of the responding message, namely its message type, it is conceivable that this method can be used to learn other aspects (e.g., how to set the price) from the same raw data, possibly with different data models. In other words, a more complete, multi-faceted model of an agent can be constructed by running multiple decision tree learning processes simultaneously, one for each aspect of the agent.

In the future, further investigation can be pursued in the following directions: (1) experiment with decision tree learning on other, preferably real-world, training data; (2) study how to incorporate the learned model of an agent's partner (or opponent) to improve the agent's own negotiation decision making; (3) investigate the applicability of incremental decision tree learning and how it can improve the agent's performance by making it adaptive to a changing environment; and (4) develop a hybrid learning architecture that employs different learning techniques for different aspects of e-commerce activities.
References

[1] Arai, S., Sycara, K., and Payne, T.R. (2000). Multi-agent Reinforcement Learning for Scheduling Multiple Goals. In Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS'2000).
[2] Arrow, K. (1962). The implications of learning by doing. Review of Economic Studies, 29, 166-170.
[3] Brazdil, P., Gams, M., Sian, S., Torgo, L., and van de Velde, W. (1991). Learning in distributed systems and multi-agent environments. In Y. Kodratoff (Ed.), Machine Learning -- EWSL-91 (pp. 412-423). Lecture Notes in Artificial Intelligence, Vol. 482. Berlin: Springer-Verlag.
[4] Caglayan, A., et al. (1996). Lessons from Open Sesame!, a User Interface Agent. In Proceedings of PAAM '96.
[5] Haynes, T., Lau, K., and Sen, S. (1996). Learning Cases to Complement Rules for Conflict Resolution in Multiagent Systems. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[6] Hu, J. and Wellman, M. (1998). Online Learning About Other Agents in a Dynamic Multiagent System. In Proceedings of the Second International Conference on Autonomous Agents (Agents'98), Minneapolis, MN, USA, May 1998.
[7] Humphrys, M. (1995). W-learning: Competition among selfish Q-learners. Technical Report No. 362, Computer Laboratory, University of Cambridge.
[8] Kimbrough, S.O., Wu, D.J., and Zhong, F. (2000). Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs. http://grace.wharton.upenn.edu/~sok/fmec/schedule.html
[9] Maes, P. (1994). Social interface agents: Acquiring competence by learning from users and other agents. In O. Etzioni (Ed.), Working Notes of the 1994 AAAI Spring Symposium on Software Agents.
[10] Mor, Y., Goldman, C.V., and Rosenschein, J.S. (1996). Learn Your Opponent's Strategy (in Polynomial Time). In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 164-176). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[11] Mundhe, M. and Sen, S. (1999). Evaluating Concurrent Reinforcement Learners. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[12] Pannu, A. and Sycara, K. (1996). Learning Personal Agent for Text Filtering and Notification. In Proceedings of the International Conference on Knowledge-Based Systems (KBCS 96), December 1996.
[13] Payne, T.R., Edwards, P., and Green, C.L. (1995). Experience with rule induction and k-nearest neighbor methods for interface agents that learn. In WSIMLC95.
[14] Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
[15] Quinlan, J.R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the 10th International Conference on Machine Learning, 236-243.
[16] Schmidhuber, J. (1996). A General Method for Multi-Agent Reinforcement Learning in Unrestricted Environments. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[17] Sian, S.S. (1991). Extending Learning to Multiple Agents: Issues and a Model for Multi-Agent Machine Learning (MAML). In Y. Kodratoff (Ed.), Machine Learning -- EWSL-91 (pp. 440-456). Berlin: Springer-Verlag.
[18] Stone, P. and Veloso, M. (1996). Collaborative and Adversarial Learning: A Case Study in Robotic Soccer. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[19] Suryadi, D. and Gmytrasiewicz, P.J. (1999). Learning Models of Other Agents Using Influence Diagrams. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[20] Tesauro, G. (1999). Pricing in Agent Economies Using Neural Networks and Multi-Agent Q-learning. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[21] Vidal, J. and Durfee, E. (1997). Agents Learning about Agents: A Framework and Analysis. In Working Papers of the AAAI-97 Workshop on Multiagent Learning.
[22] Weiß, G. and Sen, S. (Eds.) (1996). Adaptation and Learning in Multi-Agent Systems. Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[23] Weiß, G. (1996). Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography. In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[24] Zeng, D. and Sycara, K. (1996). Bayesian Learning in Negotiation. In Working Notes of the AAAI 1996 Stanford Spring Symposium Series on Adaptation, Coevolution and Learning in Multiagent Systems.
[25] Zeng, D. and Sycara, K. (1997). Benefits of Learning in Negotiation. In Proceedings of AAAI-97.
[26] Incremental Tree Inducer (ITI), University of Massachusetts. http://www.cs.umass.edu/~lrn/iti/
[27] C5.0/See5.0 for decision tree learning, RuleQuest Research. http://www.rulequest.com/
[28] Ripper, AT&T. http://www.research.att.com/~diane/ripper/ripper-2.5.tar.gz
[29] Stone, P. and Veloso, M. (1997). Multiagent Systems: A Survey from a Machine Learning Perspective. Under review for journal publication, February 1997.
[30] Barbuceanu, M. and Fox, M.S. (1994). The Information Agent: An Infrastructure Agent Supporting Collaborative Enterprise Architectures. In Proceedings of the Third Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Morgantown, West Virginia. IEEE Computer Society Press.
[31] Chen, Y., Peng, Y., Finin, T., Labrou, Y., Cost, S., Chu, B., Sun, R., and Willhelm, B. (1999). A Negotiation-Based Multi-Agent System for Supply Chain Management. In Workshop on Supply Chain Management, Autonomous Agents '99, Seattle, WA, May 1999.

Acknowledgement

I would like to thank my advisor, Dr. Yun Peng, for his great help with this Master's project, and Dr. Charles K. Nicholas for reviewing this report. I also want to thank Mr. Ye Chen and Dr. Tim Finin for their pertinent suggestions.

Appendix: Induced decision trees (by ITI)

Data Model: selleri.names
Terminate, Proposal, Stop, NIL.
sender: buyer, seller.
depth: continuous.
receiver: buyer, seller.
last_msg: Terminate, Proposal, Stop, NIL.
response_msg: CFP, CFMP, Terminate, Accept, Reject, NIL.
item: X, Y.
bprice: min, low, normalminus, normalplus, high, max, NIL.
opp_weight: important, unimportant, NIL.
match_qd: match, unmatch, NIL.

Decision tree for the Seller agent
Data Model: buyeri.names
CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
sender: buyer, seller.
depth: continuous.
receiver: buyer, seller.
last_msg: CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
response_msg: Terminate, Proposal, NIL.
item: X, Y.
sprice: min, low, normalminus, normalplus, high, max, NIL.
opp_reputation: bad, good, NIL.
weight_item: important, unimportant, NIL.
match_qd: match, unmatch, NIL.

Decision tree for the Buyer agent