Decision-Tree Learning for Negotiation Rules



                                              Zhongli Ding



                                        A paper submitted to the
                         Computer Science and Electrical Engineering Department
                     in partial fulfillment of the requirements for the M.S. degree at
                               University of Maryland Baltimore County

                                              January, 2001




                                    CMSC 698 Advisory Committee
                   Dr. Yun Peng (Advisor), Associate Professor in Computer Science
               Dr. Charles K Nicholas (Reader), Associate Professor in Computer Science



Certified by
                                                        Yun Peng, CMSC698 Advisor




Abstract

The emergence of e-commerce increases the importance of research on various multi-agent systems
(MAS). The term MAS is used loosely to refer to any distributed system whose components (or agents)
are designed, implemented, and operate independently of each other. Multi-agent systems are
suitable for domains that involve interactions between different people or organizations with
different (possibly conflicting) goals and proprietary information. A potential application area of
MAS is the "Supply Chain Management System", which integrates a company's activities across the
entire e-supply chain, from the acquisition of raw materials and purchased components through
fabrication, assembly, test, and distribution of finished goods. The roles of the individual entities
in the supply chain can be implemented as distinct functional software agents that cooperate with
each other in order to implement the system's functionality in an e-business environment. The major
interactions in supply chains are carried out through strategic negotiation between enterprises and
consumers. Correspondingly, automated negotiation between two or more agents (say, buyers and
sellers) in a multi-agent SCMS is very important. Much better benefits and profits can be obtained
if these autonomous negotiation agents are capable of learning and reasoning based on experience
and of improving their negotiation behavior incrementally, just as human negotiators do.
Learning can be used either to extract an entire rule set for an agent, or to improve a pre-existing
set of rules. In this project, based on a negotiation-based MAS framework for supply chain
management and a set of defined negotiation performatives, we test the feasibility of adopting a
decision-tree learning (or rule-based learning) method in the negotiation process. Experimental
results on the effect of using the rule-based learning method in a pair-wise negotiation process
between one buyer and one seller are presented. They show that, with a carefully designed data
scheme and sufficiently many training samples, the decision tree learning method can be used to
effectively learn decision rules for some e-commerce activities such as negotiation in supply
chains.


Keywords          E-Commerce, Multi-Agent System, Supply Chain Management System,
Negotiation, Negotiation Performatives, Decision-Tree Learning, Rule-Based Learning




1. Introduction
The development of computer software and hardware has led to the appearance of non-human
software agents. A software agent is considered an entity with goals, capable of actions,
endowed with domain knowledge, and situated in an environment [29]. The term multi-agent
system (MAS) is used loosely to refer to any distributed system whose components (or agents) are
designed, implemented, and operate independently of each other. Multi-agent systems are
suitable for domains that involve interactions between different people or organizations with
different (possibly conflicting) goals and proprietary information [29]. Compared with monolithic
single systems and traditional distributed systems, the design and implementation of a MAS is of
considerable complexity with respect to both its structure and its functionality, due to insufficient
knowledge of the system environment, the required coordination of the activities of multiple agents,
and the dynamic nature of the MAS.

A supply chain is the process of moving goods from the customer order through the raw materials
stage, supply, production, and distribution of products to the customer. More formally, a supply
chain is a network of suppliers, factories, warehouses, distribution centers and retailers, through
which raw materials are acquired, transformed, produced and delivered to the customer [30]. A
supply chain management system (SCMS) manages the cooperation of these system components.
In the computational world, roles of individual entities in a supply chain can be implemented as
distinct agents. Correspondingly, a SCMS transforms to a MAS, in which functional agents
cooperate with each other in order to implement system functionality [31].

In supply chains, enterprises and consumers interact with each other strategically. A great portion
of these interactions is carried out through negotiation. Thus, automated negotiation between two
or more agents (say, buyers and sellers) in a multi-agent SCMS is very important. Much better
benefits and profits can be obtained if these autonomous negotiation agents are capable of learning
and reasoning based on experience and of improving their negotiation behavior incrementally, just
as human negotiators do. Moreover, problems stemming from the complexity of MAS can be
avoided, or at least reduced, by endowing the agents with the ability to adapt and to learn, that is,
with the ability to improve the future performance of the total system, of a part of it, or of a single
agent [3,17,23]. Learning can be used either to extract an entire rule set for an agent, or to improve
a pre-existing set of rules.

The two most important issues concerning learning in negotiation are: how to model the overall
negotiation process, i.e., how to design the modeling framework of our negotiation-based
multi-agent system for supply chain management; and which learning algorithm or method to
choose for the agents' decision-making.

In [31], researchers propose a negotiation-based MAS framework for supply chain management
and describe a number of negotiation performatives, which can be used to construct pair-wise and
third-party negotiation protocols for functional agent cooperation. They also explain how to formally
model the negotiation process using Colored Petri Nets (CPN) and provide an example of
establishing a virtual chain by solving a distributed constraint satisfaction problem.

Based on this framework, one main job is to test the feasibility of adopting the decision-tree
learning (or rule-based learning) method in the negotiation process, and my project experiment is
part of this task. There exist many machine learning methods that might be useful, but as pointed
out later, decision-tree learning is the most suitable one. In this project, I experiment with the effect
of using the rule-based learning method in a pair-wise negotiation process between one buyer and
one seller. This is the first step of our work; future work might extend this into a more complex
negotiation-based multi-agent system with many functional agents joining, staying, bargaining,
or leaving the system.

In Section 2, we give a brief summary of past research work and learning techniques. Then in
Section 3, we give a simple explanation of the designed MAS framework of the system and the set
of negotiation performatives used. In Section 4 we present our experimental results so far and give
some simple analysis. Finally, in Section 5, we draw conclusions and give our future research goals.
In the Appendix, we also provide some sample experiment results.

2. Learning Overview and Decision-Tree Learning
In this section, we briefly survey existing research work on learning and adaptation in multi-agent
systems, especially as applied in e-commerce activities, and give a simple introduction to the
decision-tree learning method along with the reasons we chose it, before giving the design
framework of our experimental Negotiation Rules Learning (NRL) system and the experiment
results in later sections.

2.1 Categories and Objectives of Learning

There are a number of different ways in which learning can be used within a MAS for different
objectives.

•   An agent, standing alone, can learn its owner's intentions and decision-making strategies. In this
    case, the human user often serves as the trainer, whose decisions in response to environmental
    inputs are used as the training samples. As training samples accumulate, the agent can
    incrementally learn the decision logic of the human, which might be difficult to encode
    explicitly as decision rules by hand.
•   An agent, standing alone, can learn to improve its responses to the environment inputs
    (including those from other agents) as long as some objective functions (e.g., various utility
    functions) are well defined. In this case, training samples are its previous interactions with the
    environment, including the corresponding objective function values.
•   An agent can learn about other agents in order to compete or cooperate with them. This type of
    learning is deeper than the previous two in that an agent learns something that other agents used
    to make their decisions and uses such knowledge to better fine-tune its own strategy. Learning
    in this category can be as simple as learning a few parameters that other agents used to conduct
    their operation (e.g., the reservation prices and markup percentages of suppliers in a supply
    chain [24]), or as complicated as learning models of other agents' decision strategies
    [6,9,10,19,23].
•   A set of agents can learn to simultaneously adjust their respective decision processes. This type
    of learning occurs mostly in those MAS whose agents are tightly cooperating with each other to
    achieve a common goal (e.g., winning a robot soccer game), and the learning inputs often
    reflect the system’s performance (e.g., scores in a soccer game) rather than performance of
    individual agents (players) [14,18].

In some applications, MAS learning can be done in the so-called "batch mode", i.e., the system is
trained over a set of pre-assembled training samples while the system is not in use (either before
the system's deployment or when the system is taken offline). In most cases, however, it is
preferred that learning be conducted in "incremental mode", i.e., the system incrementally
adjusts/modifies itself by learning from a continuous stream of inputs from the environment
while it is in actual use [25]. This is because 1) training samples, which record the interaction
history of an agent, can be collected more efficiently and truthfully when the system is in actual
use; and 2) incremental learning allows the system or its agents to adapt to changes in the
environment in a timely fashion.

2.2 Example Techniques in MAS Learning

Interest in research on MAS learning has increased steadily in the past several years; there are
many MAS learning systems with vastly different architectures, different application areas, and
different machine learning techniques. What follows are brief descriptions of some examples of
these systems and the learning techniques they use.

•   Reinforcement learning: Reinforcement Learning (RL) is the process by which an agent
    improves its behavior in an environment via experience. RL is based on the idea that the
    tendency to perform an action by an agent should be strengthened (reinforced by a reward) if
    the action produces favorable results, and weakened (punished) if the action produces
    unfavorable results. One of the most important advantages of RL, in comparison with other
    learning methods, is that it requires very little prior knowledge of the environment, as it does
    not require having a target or desirable output for each input when forming a training sample.
    RL algorithms such as Q-learning [7,11] can incrementally adjust the system toward more
    favorable outcomes as long as it is provided a feedback judgment (good or bad) on the
    system's output for a given input. For this reason, RL has become one of the most widely
    used learning methods in MAS [15,16,11]. The most noted application of RL is perhaps in the
    domain of robot soccer games [14,18], where the game's outcome (win or lose) is fed back to
    train the soccer team. RL has also been applied to other problems, including setting prices
    in competitive marketplaces [20], learning agent coordination mechanisms [11], learning to
    schedule multiple goals [1], and dealing with malicious agents in a market-based MAS [21].

•   Optimization-based learning techniques: Optimization-based learning methods such as genetic
    algorithms [8], neural networks [12,14,20], and linear programming [22] have been used in
    some experimental MAS to train individual agents to optimize their performance as long as
    their performance can be specified as some forms of objective functions (e.g., their utility
    functions). One example of such systems is DragonChain, which uses a genetic algorithm (GA)
    approach to improve its performance in playing the MIT "Beer Game", a game of electronic
    supply chains for beer [8]. Mimicking the law of biological evolution of the survival of the
    fittest, the GA learning in DragonChain was able to help the system obtain good beer order
    policies for both retailers and wholesalers by searching through the huge space of possible order
    policies. Their experiment showed that this system outperformed those governed by classic
    business rules by eliminating the Bullwhip phenomenon, and, more interestingly, it can
    dynamically change its policies to adapt to changing order patterns from the customers.

•   Probabilistic learning techniques: Probabilistic learning techniques are of particular interest to
    MAS learning because of their ability to handle the high degree of uncertainty of the learning
    environment caused by agent interaction. Uncertainty is even more prevalent when an agent
    tries to learn models of other agents. In probabilistic learning, an agent does not attempt to learn
    a deterministic model of another agent, but a probability distribution of a set of possible models
    of that agent. Examples of formalisms that support probabilistic learning include the Bayesian
    Belief Networks (BBN), which represent probabilistic dependencies among variables of interest
    in a graphic representation, and Influence Diagrams (ID), which further extend BBN to include
    decision nodes and utility nodes. A decision making model for supply chain management,
    called Bazaar, was developed based on BBN [24]. In this system, an agent (say a buyer) uses
    Bayesian learning to incrementally learn the distributions of reservation prices and the markup
    rates of its negotiation partners (sellers). Work by Suryadi and Gmytrasiewicz [19] uses ID
    learning for an agent to construct models of other agents. An agent in their system maintains a
    number of possible models for each of the other agents it is interacting with, and a probability
    distribution over these models. When none of these existing models has sufficiently high
    probability, one of them is modified (the parameters and even the structure of the underlying
    network are changed) to better reflect the observed behavior. An unspoken assumption for the
    above probabilistic learning systems is that the learning agent must have some prior knowledge
    of the behaviors of other agents it is trying to learn. At least, it has to assume the knowledge of
    the set of parameters with which the behavior of the other agent can be expressed because these
    parameters are the necessary building blocks for the possible probabilistic models.
    Unfortunately, this assumption may not hold in many MAS applications.

•   Supervised learning: Supervised learning covers a class of learning methods that requires a
    teacher to tell the learning system what is the target/correct/desired output for each training
    input. The target output is then compared with the current system output, and the discrepancy is
    used to drive the update of the system. Supervised learning includes backpropagation training
    in neural networks, K-nearest neighbor, minimum entropy, and some forms of decision tree
    learning. Supervised learning is particularly suitable for learning user models for personal
    agents and human interface agents [4,9,12,13]. This type of agent works on behalf of human
    users and tries to best satisfy the users' needs. Instead of providing detailed rules to guide the
    agent (which may not be feasible for complex tasks), the human user can easily serve as the
    teacher, providing the desirable response to each input as a training sample for the agent.
    Payne et al. used the k-nearest neighbor method to train a user interface agent [13], while
    Pannu and Sycara used the backpropagation method to train a personal agent for text filtering
    and notification [12].

•   Rule-based learning: Learning rules for rule-based reasoning systems has also been reported in
    the literature [5,13]. Decision tree learning is perhaps the most mature technique for this type of
    learning. The advantage of rule-based learning lies in the fact that rules are easy for humans to
    understand. This allows domain experts to inspect and evaluate rules generated by a learning
    module, and to decide whether to accept each of these rules. Moreover, since rules are
    probably the easiest way to represent and encode experts' knowledge, many learning systems
    can start with a set of pre-defined rules and then let the rule-based learning module modify
    the rule set with additional observations. Learning thus greatly facilitates the growth,
    modification, and consistency maintenance of the knowledge base. These are precisely the
    reasons that we have chosen rule-based learning for our EECOMS Negotiation Rules Learning
    (NRL) task.

2.3 Decision-Tree Learning

Simply stated, a decision tree is a representation of a decision procedure for determining the class
label to associate with a given instance (represented by a set of attribute-value pairs). All non-leaf nodes
in the tree are decision nodes. A decision node is associated with a test (question on the value of a
particular attribute), and a branch corresponding to each of the possible outcomes of the test. At
each leaf node, there is a class label (answer) to an instance. Traversing a path from the root to a
leaf is much like playing a game of twenty questions, following the decisions made on each
decision node on the path. Decision trees can be induced from examples (training samples) that are
already labeled.

One of the concerns of DT learning is how to construct trees that are as small as possible (measured
by the number of distinct paths a tree has from its root) and at the same time consistent with the
training samples. In the worst case, the induced tree can degenerate so that each sample has its
own unique path (the tree size would then be exponential in the number of attributes involved).
An information-theoretic approach has been taken by several DT learning algorithms to address
this problem and also, to a lesser extent, the problem of generalization [15].

The basic idea of a DT learning algorithm is:

    For each decision point,
    If all remaining examples are all positive or all negative, we're done.
    Else if there are some positive and some negative examples left
      and attributes left, pick the remaining attribute which is
      the "most important", the one which tends to divide the
      remaining examples into homogeneous sets
    Else if there are no examples left, no such example has been
      observed; return default
    Else if there are no attributes left, examples with the same
      description have different classifications: noise or insufficient
      attributes or nondeterministic domain
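
To make the above concrete, the following is a minimal Python sketch of this idea (our own
illustration, not the code used in this project): candidate attributes are scored by information gain,
and the recursion mirrors the four cases listed above.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(samples, labels, attr):
        """Reduction in entropy obtained by splitting on attribute attr."""
        base = entropy(labels)
        remainder = 0.0
        for v in set(s[attr] for s in samples):
            subset = [l for s, l in zip(samples, labels) if s[attr] == v]
            remainder += (len(subset) / len(labels)) * entropy(subset)
        return base - remainder

    def build_tree(samples, labels, attributes, default=None):
        """Recursive ID3-style induction; returns a nested dict or a class label."""
        if not samples:                     # no examples left: return default
            return default
        if len(set(labels)) == 1:           # all positive or all negative: done
            return labels[0]
        majority = Counter(labels).most_common(1)[0][0]
        if not attributes:                  # noise or insufficient attributes
            return majority
        best = max(attributes, key=lambda a: information_gain(samples, labels, a))
        tree = {best: {}}
        for v in set(s[best] for s in samples):
            idx = [i for i, s in enumerate(samples) if s[best] == v]
            tree[best][v] = build_tree([samples[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best],
                                       default=majority)
        return tree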

Figure 3 below gives a simple example of DT learning. A tree of good size has been induced. It has
6 distinct paths, but in the worst case it could have had 12 distinct paths, one for each unique value
assignment pattern of (color, size, shape). This is because some general rules were induced (if color
= red, then immediately conclude the class = +; shape will be considered only if color = blue). The
figure also shows the set of if-then rules that can be generated from the induced tree. Essentially, each
distinct path represents a rule: the value assignments on the branches along the path constitute the
conditional part of the rule, and the value assignment of the leaf node at the end of the path
constitutes the conclusion part of the rule.
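
Each distinct root-to-leaf path can be converted into an if-then rule mechanically. The following
sketch (again only illustrative; the attribute and value names follow the flavor of the figure's
color/shape example and are hypothetical) enumerates the paths of a tree represented in the
nested-dictionary form used in the sketch above:

    def tree_to_rules(tree, conditions=None):
        """Enumerate (conditions, conclusion) pairs, one per root-to-leaf path."""
        conditions = conditions or []
        if not isinstance(tree, dict):      # reached a leaf: emit one rule
            return [(list(conditions), tree)]
        (attr, branches), = tree.items()
        rules = []
        for value, subtree in branches.items():
            rules.extend(tree_to_rules(subtree, conditions + [(attr, value)]))
        return rules

    # Hypothetical tree in the figure's spirit: color is tested first,
    # shape only when color is blue.
    example_tree = {"color": {"red": "+",
                              "blue": {"shape": {"round": "+", "square": "-"}}}}
    for conds, label in tree_to_rules(example_tree):
        print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) +
              f" THEN class = {label}")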




2.4 The Choice of the DT Learning Method

An assumption we made when selecting a suitable learning method is that the decisions included in
the training data (extracted from messages exchanged during negotiation sessions) are good decisions.
Therefore, the goal of learning is not to further optimize the decision process that was used to
generate the data, but to learn the rules/strategies that lead to these decisions. In other words, the
learned rules should make decisions that are the same as (or similar to) those in the training set
when the same or similar decision parameters are given. This leads us to the choice of supervised
learning, instead of unsupervised or reinforcement learning. The training samples serve as
instructions from a teacher or supervisor, since each sample gives the desired or "correct" value
assignment to the target attribute with respect to the pattern of value assignments to the decision
parameters in the sample.

Among all supervised learning methods, we have chosen to experiment with Decision Tree
Learning (DT learning) [14,15,26] for the following reasons:

•   DT learning is a mature technology. It has been studied for 20+ years, has been applied to
    various real-world problems, and the learning algorithm has been improved by several
    significant modifications.
•   The basic algorithm and its underlying principles are easy to understand.
•   It is easy to apply DT learning to specific problem domains, including our NRL task.
•   Several good, easy to use DT learning packages are commercially available (free or with
    reasonable cost) [26,27, 28].
•   It is easy to convert the induced decision tree to a set of rules, which are much easier for human
    experts to evaluate and manipulate, and to incorporate into an existing rule-based system, than
    other representations.

3. Negotiation Rules Learning System Framework
In a negotiation MAS, learning can be applied to various aspects of agent negotiation: training the
agent to make decisions in a way similar to those an experienced human manager would make;
learning models of an agent's negotiation partner at different levels of detail; and learning
negotiation strategies that outsmart its negotiation partners. We have experimented with the
decision tree learning method in our EECOMS negotiation rules learning task. In this section we
give a brief description of the negotiation rule learning system design and the set of negotiation
performatives used.

3.1 Objectives

Several theoretical issues arise for such a MAS: the high time complexity of the learning process,
the lack of sufficient learning data and prior knowledge, the inherent uncertainty of the learning
results, and the stability and convergence of a learning MAS. Thus, the overall objective is to study
the feasibility of employing machine learning techniques to automatically induce negotiation
decision rules for supply chain players (buyers and sellers) from transaction data. Specifically, we
aim at investigating the following:

•   Constructing a rule base by learning the decision strategy of a human expert: Initially the
    human makes all negotiation decisions. The prospective rules induced from the negotiation
    transactions are shown to the human for approval/rejection. The approved rules are then
    incorporated into the existing rule base. Including humans in the loop allows us to have quality
    training samples, as they are generated by an experienced human negotiator rather than by an
    agent with a set of not-yet-well-developed rules. It also makes the rule base more trustworthy to
    humans, since every induced rule is inspected by a human before it is included in the rule base.

•   Learning the model of the negotiation partners’ behaviors: By properly incorporating the
    learned partner’s decision rules, an agent can make more informed decisions, and in turn
    improve its performance (reducing the negotiation time/steps and increasing the payoff).

3.2 Outline of NRL System Design

Negotiation partners (buyers and sellers), represented by computer programs in a virtual supply
chain, constitute a multi-agent system in the broad sense of the term, as discussed in Section 1.
Figure 1 shows a diagram of such a system. For each side of the negotiation there is a decision
module, and the rules in these modules can become more and more complete because each agent
has a learning ability implemented by its learning module. Initially, a human negotiator (or a set of
pre-defined rules) guides the negotiation process.




3.3 Negotiation Performatives

All of the functional agents in a MAS should have some understanding of the system ontology and
use a certain Agent Communication Language (ACL) to converse, transfer information, share
knowledge, and negotiate with each other. An ACL offers a minimal set of performatives to describe
agent actions and allows users to extend them, provided the newly defined ones conform to the rules
of the ACL syntax and semantics. The Knowledge Query and Manipulation Language (KQML) and
the ACL defined by the Foundation for Intelligent Physical Agents (FIPA ACL) are the most widely
used and studied ACLs. In KQML there are no predefined performatives for agent negotiation
actions. FIPA ACL has some performatives, such as propose and cfp, for general agent negotiation
processes, but they are not sufficient for our purposes; for example, there are no performatives to
handle third-party negotiation. The NRL system design therefore uses a negotiation performative
set designed for MAS dealing with supply chain management [31].

In the following table, we give each negotiation performative's name, its meaning, and the possible
performatives a functional agent can use in reply when that performative comes in:

Name            Meaning                                  Possible Responses
CFP             Call for proposal                        Proposal | Terminate
CFMP            Call for modified proposal               Proposal | Terminate
Reject          Reject a proposal                        Proposal | Terminate
Terminate       Terminate the negotiation                NONE
Accept          Accept a proposal                        NONE
Proposal        Submit a proposal                        Accept | Reject | Terminate | CFMP

Initially, one agent starts the negotiation by sending a CFP message to the other agent. After several
rounds of conversation in which proposals and counter-proposals are exchanged, the negotiation
between the two agents ends when one side accepts (or rejects) the other side's proposal or terminates
the negotiation process without any further explanation [31].
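
For illustration, the reply discipline in the table above can be encoded as a simple lookup table.
The following Python sketch (the names and structure are ours, not part of the NRL implementation)
checks whether a reply performative is legal for a given incoming performative:

    # Allowed replies for each incoming performative, taken from the table above.
    ALLOWED_RESPONSES = {
        "CFP":       {"Proposal", "Terminate"},
        "CFMP":      {"Proposal", "Terminate"},
        "Reject":    {"Proposal", "Terminate"},
        "Terminate": set(),       # negotiation is over, no reply expected
        "Accept":    set(),       # deal is closed, no reply expected
        "Proposal":  {"Accept", "Reject", "Terminate", "CFMP"},
    }

    def is_legal_reply(incoming: str, reply: str) -> bool:
        """Return True if reply is a permitted response to incoming."""
        return reply in ALLOWED_RESPONSES.get(incoming, set())

    # Example: a Proposal may be answered with CFMP, but not with another CFP.
    assert is_legal_reply("Proposal", "CFMP")
    assert not is_legal_reply("Proposal", "CFP")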

4. Experiments and Results
A preliminary experimental learning system for NRL was constructed earlier to evaluate the
feasibility of learning decision rules for a buyer agent. A set of 27 training samples was manually
generated following the data schema. These samples were fed to C5.0, a decision-tree learning
package (a descendant of Ross Quinlan's classic ID3 decision tree learning algorithm [14,15])
obtained from RuleQuest Research, http://www.rulequest.com/.
The learning completed successfully: a decision tree was constructed, from which a set of eight
decision rules was generated by C5.0. These rules suggest to the buyer agent what action it
should take next (i.e., what type of message should be sent out next), based on factors such as
how well terms such as price, quantity and delivery date in its current "Call-For-Proposal"
match those in the returned "Proposal" from the seller it is negotiating with, the reputation of the
seller, and how far the current negotiation has progressed.

The initial results from the experiment were encouraging. They showed that decision tree learning
can be an effective tool for learning negotiation rules from data that reflects past negotiation
experience. The few rules learned are in general reasonable and consistent with the intuitions we
had and used to generate the training data. However, the experiment was restricted by the quality
of the training data (only 27 hand-made samples were used), and the results were far from
convincing (only three rules were learned).

Encouraged by the results from the preliminary study, we went forward to conduct a more
extensive NRL experiment with decision tree learning. The main extensions include the following:
•   A program to automatically generate training samples was developed. This data generator is
    based on a set of decision rules that take into consideration all the important decision factors
    for making proposals (and counter-proposals). Unlike the small data set generated manually and
    somewhat arbitrarily in the preliminary study, this data generator allows us to experiment with
    large quantities of learning samples (hundreds or even thousands of them) that are consistent and
    more realistic.
•   A better data schema was selected after a series of experiments to yield good results.
•   The buyer agent not only learns decision rules for itself but also learns the rules that the seller
    appears to use, and thus constructs a (simple) model of the seller.

Next we summarize the experiment system and present the results.

4.1 The Training Data Generator

More realistic training data may induce better, more meaningful rules; such data can also be used to
test and validate the learning results. Since no realistic data set of sufficient size was available to us,
nor could we obtain help from human experts in supply chain negotiation, we were not able to
resolve this problem to our complete satisfaction. As an alternative, we have developed an automatic
training sample generator to generate as many samples as we need. This sample generator
essentially simulates the negotiation process on both the buyer side and the seller side, based
on two sets of decision rules encoded in the program, to generate a stream of message exchanges
between the buyer and the seller. The actual training samples are then extracted from these
messages. By changing the relevant rules, the sample generator can be used to generate samples
reflecting different negotiation scenarios.

•  Message format: (msg_type, sender, receiver, item, price, quantity, delivery_date)
   For example, a negotiation session may start with the following message from the buyer to the
   seller, (CFP Buyer Seller X 9.25 50 7), meaning that Buyer wishes to entertain a proposal from
   Seller for 50 pieces of good X at a price of $9.25 a piece, to be delivered 7 days from this day.
   Seller may respond with a message (Proposal Seller Buyer X 11.25 50 7), meaning that it can sell
   these goods by the given delivery date at a unit price of $11.25. To simplify the experiment, we
   let the buyer and the seller negotiate over only one type of good, named X.

•  System parameters: A set of random numbers is used to determine the attribute values for the
   CFP message of each negotiation session and the values of the initial Proposal message sent in
   response to the CFP message. These numbers, to an extent, simulate the dynamic nature of the
   internal and external environment in which the negotiation is taking place. These parameters
   include the following:
   Buyer's need for good X:
       − quantity: a random number between 1 and 200 (with probability 0.1 in [1,50], 0.8
           in [51,150], and 0.1 in [151,200]); it can be mapped to three regions:
           [1,50] {small}, [51,150] {ordinary}, [151,200] {large}.
       − delivery_date: a random number between 1 and 20; it can also be mapped to three
           regions: 1-5 days {short}, 6-15 days {regular}, 16-20 days {long}.
       − asking price: a random number between 7 and 11; the fair market unit price is $10, and
           the range of all possible prices is partitioned into six regions: min (7 ≤ price < 8), low
           (8 ≤ price < 9), normalminus (9 ≤ price < 10), normalplus (10 ≤ price ≤ 11), high
           (11 < price ≤ 12), and max (12 < price ≤ 13).
       − importance of the order: a random binary value
       − Seller's reputation: a random binary value
   Seller's capacity to supply good X:
       − daily production capacity of good X: a random number between 8 and 12
       − current inventory: a random number between 20 and 50
       − importance of the Buyer as a customer: a random binary value

With these random numbers, each negotiation session starts with a CFP message with a different
quantity, delivery_date, and asking price. In response, the seller first determines whether the
requested delivery_date can be met for the given quantity, based on the current inventory and the
daily production capacity from this day to the delivery_date. To simplify the data generation, we
assume that the seller submits its initial proposal (usually with a price higher than the asking price
in the CFP message) only if the requested quantity and date can be met. The negotiation then
continues with the buyer gradually increasing the asking price and the seller decreasing the bidding
price until the session ends. The details of the negotiation are governed by decision rules on either
side.
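
As a hedged sketch of this part of the generator (the distributions follow the parameter list above,
while the function names and the exact feasibility test are our own illustration, not the actual
generator code):

    import random

    def random_cfp():
        """Draw the Buyer's CFP terms from the distributions listed above."""
        r = random.random()
        if r < 0.1:
            quantity = random.randint(1, 50)      # small order
        elif r < 0.9:
            quantity = random.randint(51, 150)    # ordinary order
        else:
            quantity = random.randint(151, 200)   # large order
        delivery_date = random.randint(1, 20)     # days from today
        asking_price = round(random.uniform(7, 11), 2)  # fair market price is $10
        return ("CFP", "Buyer", "Seller", "X", asking_price, quantity, delivery_date)

    def seller_can_meet(quantity, delivery_date, inventory, daily_capacity):
        """Feasibility check: current inventory plus production up to the delivery date."""
        return inventory + daily_capacity * delivery_date >= quantity

    # Example: a seller holding 30 units with a capacity of 10 units per day.
    msg = random_cfp()
    print(msg, seller_can_meet(msg[5], msg[6], inventory=30, daily_capacity=10))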

•  Negotiation rules for Seller agent: The following rules are used in the data generator to form
   response messages from the Seller agent.

    SR-1: Terminate the negotiation IF Seller cannot meet the quantity-date requested in the incoming CFP message
    SR-2: Terminate the negotiation IF asking price = min (7 ≤ price < 8)
    SR-3: Terminate the negotiation IF asking price = low (8 ≤ price < 9) & Buyer is NOT an important customer
    SR-4: Otherwise, submit a Proposal with the requested quantity and date, and the price is determined by:
       SR-41: IF the asking price = normalplus or high or max (10 ≤ price ≤ 13) THEN bidding price = asking price
       SR-42: Otherwise,
         IF incoming msg-type = CFP THEN
             IF asking price = low (8 ≤ price < 9) & Buyer is an important customer
                THEN propose a higher price (bidding price > asking price)
             ELSE IF asking price = normalminus (9 ≤ price < 10) & Buyer is NOT an important customer
                      THEN propose a higher price (bidding price > asking price)
             ELSE IF asking price = normalminus (9 ≤ price < 10) & Buyer is an important customer
                      THEN bidding price = asking price
         ELSE propose a lower price (bidding price ≥ asking price)

•  Negotiation rules for Buyer agent: The following rules are used in the data generator to form
   messages from the Buyer agent.

    BR-1: Terminate the negotiation IF bidding price = max (12 < price ≤ 13)
    BR-2: Terminate the negotiation IF bidding price = high (11 < price ≤ 12) & the current depth of negotiation ≥ 7
    BR-3: Reject the incoming proposal IF bidding price = high (11 < price ≤ 12) & either Seller's reputation is bad or
           this order is not important & the current depth of negotiation < 7
    BR-4: CFMP for a lower price IF bidding price = high (11 < price ≤ 12) & Seller's reputation is good & this order is
           important & the current depth of negotiation < 7
    BR-5: Accept the current proposal IF the bidding price ≤ 10
    BR-6: Accept the current proposal IF the bidding price = asking price (delta_price = 0)
    BR-7: Accept the current proposal IF bidding price = normalplus (10 < price ≤ 11) & Seller's reputation is good &
           the order is important
    BR-8: CFMP for a lower price IF bidding price = normalplus (10 < price ≤ 11) & either Seller's reputation is bad or
           this order is not important & the current depth of negotiation < 7
    BR-9: Terminate IF bidding price = normalplus (10 < price ≤ 11) & either Seller's reputation is bad or this order is
           not important & the current depth of negotiation ≥ 7

The function used by the 'buyer' agent to adjust its price during negotiation: 1/(1 + exp(-x)) (x ranging from -3 to +3)
The function used by the 'seller' agent to adjust its price during negotiation: (Depth_Max - depth + 1) * tan(0.25)

        depth            0         1         2         3         4         5         6         7
        diff_buyer       N/A       0.05      0.12      0.27      0.50      0.73      0.88      0.95
        diff_seller      2.043     1.787     1.532     1.277     1.021     0.766     0.511     0.255
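
A small sketch reproducing the two schedules above (that the buyer's x corresponds to depth - 4
for depths 1 through 7, and that Depth_Max = 7, are our inferences from the table, not stated
explicitly in the generator):

    import math

    DEPTH_MAX = 7  # inferred from the diff_seller column above

    def buyer_diff(depth):
        """Buyer's price increment at a given depth: logistic 1/(1 + exp(-x))."""
        x = depth - 4              # assumed mapping of depths 1..7 onto x = -3..+3
        return 1.0 / (1.0 + math.exp(-x))

    def seller_diff(depth):
        """Seller's mark-up over the asking price: (Depth_Max - depth + 1) * tan(0.25)."""
        return (DEPTH_MAX - depth + 1) * math.tan(0.25)

    # Reproduces the table: buyer_diff(1) = 0.05 ... buyer_diff(7) = 0.95,
    # seller_diff(0) = 2.043 ... seller_diff(7) = 0.255 (rounded).
    for d in range(8):
        print(d, "N/A" if d == 0 else round(buyer_diff(d), 2), round(seller_diff(d), 3))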


4.2 Data Schema for Training Samples

The negotiation process in NRL is very complex. Consider just the task of making a counter
proposal by a buyer when it receives a new proposal during a negotiation session. This task
amounts to optimizing a function (e.g., payoff) based on a high dimensional many-to-many
mapping. The input involves parameters reflecting the enterprise’s planning and execution
(customer orders, production capacity, inventory, etc.); the distance between asking and bidding
values of negotiation terms (prices, quantities, delivery dates, etc.) at the given point of a
negotiation session; the trustworthiness of the negotiation partner; how long the current session has
been (the longer it lasts, the less likely a good deal can be made); the importance to the buyer of
having the on-going negotiation succeed; and the availability of alternative sellers, etc. The output
is also composed of a large number of parameters that give a detailed description of a (counter-)
proposal. The training samples for DT learning are composed of these attributes.

A training sample is a vector or a list of values for a set of attributes extracted from a message
exchanged during the negotiation. A sample can be divided into two parts. The first part involves
attributes one hopes that the learned rules can be used to generate. They are thus referred to as the
learning target. The second part includes those attributes that support the conclusions of the
learned rules in assigning values to the target attributes; they are referred to as decision
parameters. The data model or data scheme for training samples specifies what is to be learned (the
target attribute) and what the decision parameters are (the other attributes the target depends on).

A training sample is synthesized from three consecutive messages: the current incoming message,
the one that precedes it, and the one in response to it. Figure 2 is an example of training samples
used in our early learning experiment.




In this experiment, to simplify the investigation, we decided to focus on learning rules for
determining the appropriate message type in response to an incoming message; that is, our learning
target is the performative (or message type) that will be used to respond to the incoming one. The
target attribute is one of CFP, CFMP, Terminate, Accept, or Reject for Buyer, and Terminate or
Proposal for Seller.

Selection of decision parameters is more complicated. Since the type of each outgoing message
from an agent is determined by the content of the incoming message and the content of the previous
message from the same agent, a large number of attributes that may potentially affect the new
message type can be extracted from the two preceding messages and from their differences.
For example, consider the situation in which Buyer receives a Proposal (msg-2) from Seller after
sending a CFP message (msg-1). The new message (msg-3) from Buyer, responding to msg-2,
may depend on:

       attributes from msg-1:
                bprice                      (Buyer’s asking price)
                bquantity                   (Buyer’s requested quantity)
                bdate                       (Buyer’s requested delivery_date)
                last_msg                    (type of the last msg from Buyer to Seller)
       attributes from msg-2:
                sprice                      (Seller’s bidding price)
                squantity                   (Seller’s proposed quantity)
                sdate                       (Seller’s proposed delivery_date)
                incoming_msg                (type of the incoming msg from Seller)
       attributes from the difference between msg-1 and msg-2:
                delta_price
                delta_quantity
                delta_date
                match_dq           (true only if both delta_quantity and delta_date are zero)
       attributes about other properties of Buyer:
                opp-reputation (Buyer’s evaluation of Seller’s reputation)
                weight-item        (whether this order is important to Buyer)
                depth              (number of msgs Buyer has sent during the current session)
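
As an illustration of how such a sample vector might be assembled from msg-1 and msg-2 (the
helper name and field order are ours; the actual NRL extractor may differ), a minimal sketch:

    def make_buyer_sample(msg1, msg2, msg3, opp_reputation, weight_item, depth):
        """Build one training vector from msg-1 (Buyer's CFP), msg-2 (Seller's Proposal)
        and msg-3 (Buyer's reply, whose performative is the learning target).
        Messages follow the format (type, sender, receiver, item, price, quantity, date)."""
        _, _, _, item, bprice, bquantity, bdate = msg1
        incoming_msg, _, _, _, sprice, squantity, sdate = msg2
        sample = {
            "last_msg": msg1[0], "incoming_msg": incoming_msg, "item": item,
            "bprice": bprice, "sprice": sprice,
            "delta_price": sprice - bprice,
            "delta_quantity": squantity - bquantity, "delta_date": sdate - bdate,
            "match_dq": (squantity == bquantity and sdate == bdate),
            "opp_reputation": opp_reputation, "weight_item": weight_item,
            "depth": depth,
        }
        target = msg3[0]           # performative of Buyer's response
        return sample, target

    sample, target = make_buyer_sample(
        ("CFP", "Buyer", "Seller", "X", 9.25, 50, 7),
        ("Proposal", "Seller", "Buyer", "X", 11.25, 50, 7),
        ("CFMP", "Buyer", "Seller", "X", 9.30, 50, 7),
        opp_reputation="good", weight_item="important", depth=1)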



A small set of decision parameters may be insufficient for the learning module to differentiate
training examples, resulting in a decision tree with many ambiguous leaf nodes (nodes with
multiple labels). On the other hand, a large set of decision parameters may refine the decision
tree to a level that is too detailed to be practically useful, because the induced tree would have
great height and a large number of branches (rules). For example, one of our experimental runs used
all the parameters listed above. The induced decision tree for Buyer has a height of 14, which means
that some rules would have to check as many as 14 conditions before drawing a conclusion.
Moreover, a total of more than 100 rules were generated from this tree. It may be possible to obtain
a smaller, workable set of rules from these raw rules by some pruning techniques, but this would
require substantial post-learning processing.

After several trials, we have chosen the following decision parameter sets for the Buyer and Seller,
respectively.

Buyer: sender, depth, receiver, last_msg, incoming_msg, item, sprice, opp_reputation, weight_item, match_qd.
Seller: sender, depth, receiver, last_msg, incoming_msg, item, bprice, opp_importance, match_qd.

4.3 The Experiment Results

We have experimented with three software packages for decision tree learning: (1) C5.0/See5.0
from RuleQuest Research [27], (2) Ripper from AT&T [28], and (3) ITI (Incremental Tree Inducer)
from the University of Massachusetts [26]. C5.0/See5.0 was not selected for the final experiment
because the free-of-charge version we have restricts the dataset to no more than 200 training
samples, which is not sufficient to make the learning process converge. Ripper, although it does not
restrict the size of the training set, was rejected because it always produces a very small number
of rules (possibly due to a severe pruning process it uses to generate the final output). ITI was
selected not only because it works well with our learning task but also because it supports
incremental learning, a valuable feature we plan to explore further in the future.

3000 negotiation sessions were randomly generated by the automatic data generator described in
Section 4.1. Each session includes a sequence of message exchanges between Buyer and Seller,
starting with a CFP message from Buyer. The experiment showed that this amount of training
samples is sufficient for the learning process to converge (the induced tree becomes stable).
Datasets of smaller size may be used to learn most, but not all, decision rules, because they do not
contain all possible scenarios, especially those with small probabilities. These samples were fed
into ITI under the data model described in Section 4.2. The two induced decision trees and their
corresponding data model files, for Buyer and Seller respectively, are included in the Appendix.

Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding
to the 12 paths in the tree. Two rules (the first and the last, counting from left to right) are related to
starting and ending a session; the other 10 are rules for determining the new message type. These
rules match very well with the rules used to generate the training data; there is no apparent
inconsistency between the two sets of rules. For example, the second rule
        IF sprice = max THEN Terminate
is the same as BR-1 in Section 4.1. The next three rules (all under the condition that quantity and
delivery date match)
        IF sprice = high AND weight_item = unimportant THEN Reject
        IF sprice = high AND weight_item = important AND opp_reputation = bad THEN Reject
        IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP
jointly match the rules of BR-3 and BR-4. All other induced rules also match the data generation
rules well.

Learned rules for Seller: 6 rules can be generated from the decision tree, corresponding to the 6
paths in the tree. The last two rules are related to starting and ending a session; the first four are for
determining the new message type. These rules again match very well with the rules used to
generate the training data. The first two rules from the tree
       IF bprice = min THEN Terminate
       IF bprice = low AND opp_weight = unimportant THEN Terminate
are the same as SR-2 and SR-3 in Section 4.1; the next two rules
       IF bprice = low AND opp_weight = important THEN Proposal
       IF bprice > low THEN Proposal
jointly match SR-4.

5. Conclusions and Future Work
Due to the inherently complex, uncertain, and dynamic nature of multi-agent systems, it is very
difficult, if not impossible, to encode agents' decision strategies a priori. Learning while doing
becomes imperative for constructing MAS in applications. This is also true for automatic
negotiation systems for supply chain management, where each entity operates autonomously and
interacts with others (its negotiating partners) to reach deals. We have begun an empirical
investigation of the feasibility of adopting some existing machine learning techniques to learn
negotiation rules (NRL) from transaction data (mainly from the messages exchanged among
negotiating agents). Our experimental results show that, with a carefully designed data scheme and
sufficiently many training samples, the decision tree learning method can be used to effectively
learn decision rules for some e-commerce activities such as negotiation in supply chains.

More interestingly, our experiment showed that the Buyer agent could learn a model of its partner
(the Seller) using only the information available in the messages they exchanged during the
negotiation. Although the model learned describes the behavior of the Seller, not the underlying
mechanism governing decision making at the Seller agent, it may give the Buyer some power to
predict the response from the Seller or to choose the actions that will bring the most desired
responses.

Although our experiment involves learning only how to determine one aspect of the responding
message, namely its message type, it is conceivable that this method can be used to learn other
aspects (e.g., how to set the price) from the same raw data (possibly with different data models). In
other words, a more complete (multi-faceted) model of an agent can be constructed by
simultaneously running multiple decision tree learning processes, one for each aspect of the agent.

In the future, more investigation can be pursued in the following directions: (1) experiment with
decision tree learning on other, preferably real-world, training data; (2) study how to incorporate
the learned model of a partner (or opponent) to improve one's negotiation decision-making;
(3) investigate the applicability of incremental decision tree learning and how it can improve an
agent's performance by making it adaptive to a changing environment; and (4) develop a hybrid
learning architecture that employs different learning techniques for different aspects of e-commerce
activities.

References


[1]  Arai, S. Sycara, K., and Payne, T.R. (2000). Multi-agent Reinforcement Learning for Scheduling
     Multiple-Goals. In Proceedings of the Fourth International Conference on Multi-Agent Systems
     (ICMAS'2000).
[2] Arrow, K. (1962). The implications of learning by doing. Review of Economic Studies, 29, 166-170.
[3] Brazdil, P., Gams, M., Sian, S., Torgo, L., & van de Velde, W. (1991). Learning in distributed systems
     and multi-agent environments. In Y. Kodratoff (Ed.), Machine learning -- EWSL91 (pp. 412--423).
     Lecture Notes in Artificial Intelligence, vol. 482. Berlin: SpringerVerlag.
[4] Caglayan, A., et al., (1996). Lessons from Open Sesame!, a User Interface Agent. In Proceedings of
     PAAM ’96.
[5] Haynes, T., Lau, K., and Sen, S. (1996). Learning Cases to Compliment Rules for Conflict Resolution in
     Multiagent Systems. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and
     Learning in Multiagent Systems, Stanford, CA, March, 1996.
[6] Hu, J. and Wellman, M. (1998). Online Learning About Other Agents in a Dynamic Multiagent System.
     In Proceedings of the Second International Conference on Autonomous Agents (Agents98), Minneapolis,
     MN, USA, May 1998
[7] Humphrys, M. (1995). W-learning: Competition among selfish Q-learners. Technical Report no. 362.
     Computer Laboratory, University of Cambridge.
[8] Kimbrough, S.O., Wu, D.J., and Zhong, F. (2000). Artificial Agents Play the Beer Game, Eliminate the
     Bullwhip Effect, and Whip the MBAs. http://grace.wharton.upenn.edu/~sok/fmec/schedule.html
[9] Maes, P. (1994). Social interface agents: Acquiring competence by learning from users and other agents.
     In O. Etzioni (Ed.), Working Notes of the 1994 AAAI Spring Symposium on Software Agents.
[10] Mor, Y., Goldman, C.V., and Rosenschein, J.S. (1996). Learn Your Opponent's Strategy (in Polynomial
     Time). In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 164-176).
     Lecture Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag, 1996.
[11] Mundhe, M. and Sen, S. (1999). Evaluating Concurrent Reinforcement Learners. In Proceedings of
     IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999, Stockholm, Sweden.
[12] Pannu, A. and Sycara, K. (1996). Learning Personal Agent for Text Filtering and Notification. In
     Proceedings of the International Conference of Knowledge-Based Systems (KBCS 96), Dec., 1996.
[13] Payne, T.R., Edwards, P., & Green, C.L. (1995). Experience with rule induction and k-nearest neighbor
     methods for interface agents that learn. In WSIMLC95).
[14] Quinlan, J.R. (1986). “Induction of Decision Trees”. Machine Learning, 1, 81-106.
[15] Quinlan, J.R. (1993). “Combining Instance-Based and Model-Based Learning”, in Proceedings of the 10th
     International Conference on Machine Learning, 236-243.
[16] Schmidhuber, J. (1996). A General Method For Multi-Agent Reinforcement Learning In Unrestricted
     Environments. Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and
     Learning in Multiagent Systems, Stanford, CA, March, 1996.
[17] Sian, S.S. (1991). Extending Learning to Multiple Agents: Issues and a Model for Multi-Agent Machine
     Learning (MAML). In Y. Kodratoff (Ed.), Machine learning -- EWSL91 (pp. 440--456). Berlin:
     Springer-Verlag.
[18] Stone, P. and Veloso, M. (1996). Collaborative and Adversarial Learning: A Case Study in Robotic
     Soccer. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in
     Multiagent Systems, Stanford, CA, March, 1996.
[19] Suryadi, D. and Gmytrasiewicz, P.J. Learning Models of Other Agents Using Influence Diagrams. In
     Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999,
     Stockholm, Sweden.
[20] Tesauro, G. (1999). Pricing in Agent Economies Using Neural Networks and Multi-Agent Q-learning.
     In Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999,
     Stockholm, Sweden.
[21] Vidal, J. and Durfee, E. (1997) Agents Learning about Agents: A Framework and Analysis. In Working
     Papers of the AAAI-97 Workshop on Multiagent Learning.
[22] Weiß, G. and S. Sen (Eds.) (1996). Adaptation and Learning in Multi-Agent Systems (pp. 1-21).
     Lecture Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag.
[23] Weiß, G. (1996). Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography,
     In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture
     Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag.


[24] Zeng, D. and Sycara, K. (1996). Bayesian Learning in Negotiation. In Working Notes of the AAAI 1996
     Stanford Spring Symposium Series on Adaptation, Coevolution and Learning in Multiagent Systems.
[25] Zeng, D. and Sycara, K. (1997). Benefits of Learning in Negotiation. In Proceedings of AAAI.
[26] http://www.cs.umass.edu/~lrn/iti/ -- Incremental Tree Inducer (ITI) from the University of Massachusetts
[27] http://www.rulequest.com/ -- C5.0/See5.0 for DT learning from Rulequest Research
[28] http://www.research.att.com/~diane/ripper/ripper-2.5.tar.gz -- Ripper from AT&T
[29] P. Stone and M. Veloso, “Multiagent Systems: A Survey from a Machine Learning Perspective,”
Under review for journal publication, February, 1997.
[30] M. Barbuceanu, and M. S. Fox, “The Information Agent: An Infrastructure Agent Supporting
Collaborative Enterprise Architectures,” in Proceedings of Third Workshop on Enabling Technologies:
Infrastructure for Collaborative Enterprises, Morgantown, West Virginia, IEEE Computer Society Press,
1994.
[31] Ye Chen, Yun Peng, Tim Finin, Yannis Labrou, Scott Cost, Bill Chu, Rongming Sun and Bob
Willhelm, “A Negotiation-based Multi-agent System for Supply Chain Management”, Workshop on Supply
Chain Management, Autonomous Agents '99, Seattle, WA, May 1999.


Acknowledgement
I would like to thank my advisor Dr. Yun Peng for his great help with this Master's project and Dr.
Charles K. Nicholas for reviewing this report. I also want to thank Mr. Ye Chen and Dr. Tim
Finin for their pertinent suggestions.




Appendix: Induced decision trees (by ITI)
Data Model: selleri.names

          Terminate, Proposal, Stop, NIL.

          sender: buyer, seller.
          depth: continuous.
          receiver: buyer, seller.
          last_msg: Terminate, Proposal, Stop, NIL.
          response_msg: CFP, CFMP, Terminate, Accept, Reject, NIL.
          item: X, Y.
          bprice: min, low, normalminus, normalplus, high, max, NIL.
          opp_weight: important, unimportant, NIL.
          match_qd: match, unmatch, NIL.



Decision tree for the Seller agent




Data Model: buyeri.names

          CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.

          sender: buyer, seller.
          depth: continuous.
          receiver: buyer, seller.
          last_msg: CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
          response_msg: Terminate, Proposal, NIL.
          item: X, Y.
          sprice: min, low, normalminus, normalplus, high, max, NIL.
          opp_reputation: bad, good, NIL.
          weight_item: important, unimportant, NIL.
          match_qd: match, unmatch, NIL.

Decision tree for the Buyer agent
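
Each root-to-leaf path of an induced tree reads off as one if-then rule, which is how the learned rules quoted in
Section 4.3 were obtained. The short Python sketch below illustrates that conversion on a hand-made tree
fragment patterned after the reported Buyer rules; the fragment, the nested-tuple encoding, and the helper name
tree_to_rules are illustrative assumptions only, not the actual tree or output produced by ITI.

    # Illustrative only: turn a toy decision tree into if-then rules, one per root-to-leaf path.
    # A decision node is (attribute, {value: subtree}); a leaf is just the class label string.
    def tree_to_rules(node, conditions=()):
        if isinstance(node, str):                      # leaf: emit the accumulated conditions
            yield "IF " + " AND ".join(conditions) + " THEN " + node
            return
        attribute, branches = node
        for value, subtree in branches.items():
            yield from tree_to_rules(subtree, conditions + (f"{attribute} = {value}",))

    # Hand-made fragment echoing the Buyer rules of Section 4.3 (not the induced tree itself).
    toy_buyer_tree = ("sprice", {
        "max": "Terminate",
        "high": ("weight_item", {
            "unimportant": "Reject",
            "important": ("opp_reputation", {"bad": "Reject", "good": "CFMP"}),
        }),
    })

    for rule in tree_to_rules(toy_buyer_tree):
        print(rule)

Run as written, the sketch prints four rules, for example "IF sprice = max THEN Terminate" and
"IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP", matching the form
of the rules listed in Section 4.3.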




                                                                       19
20

Contenu connexe

Tendances (6)

Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
 
Feature Based Semantic Polarity Analysis Through Ontology
Feature Based Semantic Polarity Analysis Through OntologyFeature Based Semantic Polarity Analysis Through Ontology
Feature Based Semantic Polarity Analysis Through Ontology
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning Models
 
Birthof Relation Database
Birthof Relation DatabaseBirthof Relation Database
Birthof Relation Database
 
A Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data BanksA Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data Banks
 
Amazon SimpleDB
Amazon SimpleDBAmazon SimpleDB
Amazon SimpleDB
 

En vedette (7)

EDRG12_Re.doc
EDRG12_Re.docEDRG12_Re.doc
EDRG12_Re.doc
 
[ppt]
[ppt][ppt]
[ppt]
 
From Thoughts to Action
From Thoughts to ActionFrom Thoughts to Action
From Thoughts to Action
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
JENIS – JENIS SISTEM OPERASI PADA KOMPUTER DAN HANDPHONE NAMA ...
 
Web Design Course Outline
Web Design Course OutlineWeb Design Course Outline
Web Design Course Outline
 

Similaire à CMSC698.doc

AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
csandit
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
cscpconf
 
Applying user modelling to human computer interaction design
Applying user modelling to human computer interaction designApplying user modelling to human computer interaction design
Applying user modelling to human computer interaction design
Nika Stuard
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD Editor
 
The state of network organization
The state of network organizationThe state of network organization
The state of network organization
Madhu Shridhar
 

Similaire à CMSC698.doc (20)

Recommender-technology-ReColl08
Recommender-technology-ReColl08Recommender-technology-ReColl08
Recommender-technology-ReColl08
 
Ap03402460251
Ap03402460251Ap03402460251
Ap03402460251
 
Multi-Agent Architecture for Distributed IT GRC Platform
 Multi-Agent Architecture for Distributed IT GRC Platform Multi-Agent Architecture for Distributed IT GRC Platform
Multi-Agent Architecture for Distributed IT GRC Platform
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
 
Improving the quality of information in strategic scanning system network app...
Improving the quality of information in strategic scanning system network app...Improving the quality of information in strategic scanning system network app...
Improving the quality of information in strategic scanning system network app...
 
Applying user modelling to human computer interaction design
Applying user modelling to human computer interaction designApplying user modelling to human computer interaction design
Applying user modelling to human computer interaction design
 
D046031927
D046031927D046031927
D046031927
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Metamodel for reputation based agents system – case study for electrical dist...
Metamodel for reputation based agents system – case study for electrical dist...Metamodel for reputation based agents system – case study for electrical dist...
Metamodel for reputation based agents system – case study for electrical dist...
 
The state of network organization
The state of network organizationThe state of network organization
The state of network organization
 
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEMSTUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
STUDY OF AGENT ASSISTED METHODOLOGIES FOR DEVELOPMENT OF A SYSTEM
 
Organizational security architecture for critical infrastructure
Organizational security architecture for critical infrastructureOrganizational security architecture for critical infrastructure
Organizational security architecture for critical infrastructure
 
D017141823
D017141823D017141823
D017141823
 
journalism research
journalism researchjournalism research
journalism research
 
journalism research
journalism researchjournalism research
journalism research
 
Assignment
AssignmentAssignment
Assignment
 
System Modeling & Simulation Introduction
System Modeling & Simulation  IntroductionSystem Modeling & Simulation  Introduction
System Modeling & Simulation Introduction
 
Model Based Systems Thinking
Model Based Systems ThinkingModel Based Systems Thinking
Model Based Systems Thinking
 
Tam &amp; toe
Tam &amp; toeTam &amp; toe
Tam &amp; toe
 

Plus de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Plus de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

CMSC698.doc

  • 1. Decision-Tree Learning for Negotiation Rules Zhongli Ding A paper submitted to the Computer Science Electrical Engineering Department in partial fulfillment of the requirements for the M.S. degree at University of Maryland Baltimore County January, 2001 CMSC 698 Advisory Committee Dr. Peng Yun (Advisor), Associate Professor in Computer Science Dr. Charles K Nicholas (Reader), Associate Professor in Computer Science Certified by Yun Peng, CMSC698 Advisor 1
  • 2. Abstract The emergency of e-commerce increases the importance of research on various multi-agent systems (MAS). MAS is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems (MAS) are suitable for the domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information. A potential application area of MAS is in the “Supply Chain Management System” to integrate a company's activities across the entire e-supply chain - from acquisition of raw materials and purchased components through fabrication, assembly, test, and distribution of finished goods, and roles of these individual entities in the supply chain be implemented as distinct functional software agent which cooperate with each other in order to implement system functionality in a e-business environment. The major interactions in supply chains are done through negotiation strategically between enterprises and consumers. Correspondingly, automated negotiation interactions between two or more agents (say, buyers and sellers) in a multi-agent SCMS are very important. Much better benefits and profits can be obtained if these autonomous negotiation agents are capable of learning and reasoning based on experience and improving their negotiation behavior incrementally just as human negotiators. Learning can be used either to extract an entire rules set for an agent, or to improve a pre-existing set of rules. In this project, based on a negotiation-based MAS framework for supply chain management and a set of negotiation performatives defined, we are trying to test the possibility of adopt decision-tree learning (or rule-based learning) method in the negotiation process. Experiment results on the effect of using rule-based learning method in a pair-wised negotiation process between one buyer and one seller are presented, which show that with carefully designed data scheme and sufficiently many training samples, decision tree learning method can be used to effectively learn decision rules for some e-commerce activities such as negotiation in supply chains. Keywords E-Commerce, Multi-Agent System, Supply Chain Management System, Negotiation, Negotiation, Negotiation Performatives, Decision-Tree Learning, Rule-based learning 2
  • 3. 1. Introduction The development of computer software and hardware leads to the appearance of non-human software agencies. A software agent is considered as an entity with goals, capable of actions endowed with domain knowledge and situated in an environment [29]. The term of multi-agent systems (MAS) is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems (MAS) are suitable for the domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information [29]. Comparing with monolith single systems and traditional distributed systems, due to insufficient knowledge of the system environment, the required coordination of the activities between multiple agents and the dynamic nature of the MAS, the design and implementation of a MAS is of considerable complexity with respect to both its structure and its functionalities. A supply chain is the process of moving goods from the customer order through the raw materials stage, supply, production, and distribution of products to the customer. More formally, a supply chain is a network of suppliers, factories, warehouses, distribution centers and retailers, through which raw materials are acquired, transformed, produced and delivered to the customer [30]. A supply chain management system (SCMS) manages the cooperation of these system components. In the computational world, roles of individual entities in a supply chain can be implemented as distinct agents. Correspondingly, a SCMS transforms to a MAS, in which functional agents cooperate with each other in order to implement system functionality [31]. In supply chains, enterprises and consumers interact with each other strategically. A great portion of these interactions are done through negotiation. Thus, automated negotiation interactions between two or more agents (say, buyers and sellers) in a multi-agent SCMS are very important. We can get much better benefits and profits if these autonomous negotiation agents are capable of learning and reasoning based on experience and improving their negotiation behavior incrementally just as human negotiators. Moreover, problems stemmed from the complexity of MAS can be avoided or at least reduced by endowing the agents with the ability to adapt and to learn, that is, with the ability to improve the future performance of the total system, of a part of it, or of a single agent [3,17,23]. Learning can be used either to extract an entire rules set for an agent, or to improve a pre-existing set of rules. Two most important issues might be concerned with learning in negotiation are: How to model the overall negotiation process, i.e. the design of the modeling framework of our negotiation-based multi-agent system for supply chain management? What is the learning algorithm or method we might choose for the decision-making of the agents? In [31], researchers propose a negotiation-based MAS framework for supply chain management and describe a number of negotiation performatives, which can be used to construct pair-wise and third party negotiation protocols for functional agent cooperation. It also explain how to formally model the negotiation process by using Colored Petri Nets (CPN) and provide an example of establishing a virtual chain by solving a distributed constraint satisfaction problem. 
Based on this framework, one main job is trying to test the possibility of adopt decision-tree learning (or rule-based learning) method in the negotiation process and my project experiment is part of this task. There exist a lot machine learning methods that might be useful, but as pointed out later, decision-tree learning is the most suitable one. In this project, I experiment the effect of 3
  • 4. using rule-based learning method in a pair-wised negotiation process between one buyer and one seller. This is the first step of our job and future work might extend this into a negotiation-based multi-agent system that is more complex with a lot functional agents joining in, staying, bargaining, or leaving the system. In section 2, we give a brief summary of past research works and learning techniques. Then in Section 3, we give a simple explanation of the designed MAS framework of the system and a set of negotiation performatives used. In Section 4 we present our experiment results so far and give some simple analysis. Finally, in Section 5, we make a conclusion and give our future research goals. In Appendix, we also provide some sample experiment results. 2. Learning Overview and Decision-Tree Learning In this section, we briefly survey existing research work on learning and adaptation in multi-agent systems, especially those applied in e-commerce activities, and give a simple introduction of the decision-tree learning method along with the reason we choose it, before giving the design framework of our experimental Negotiation Rules Learning (NRL) system and the experiment results in later sections. 2.1 Categories and Objectives of Learning There are a number of different ways in which learning can be used within a MAS for different objectives. • An agent, standing alone, can learn its owner’s intention and decision-making strategies. In this case, the human user is often served as the trainer whose decisions in response to environmental inputs are used as the training samples. The agent along with the accumulation of the training samples can incrementally learn the decision logic of the human, which might be difficult to explicitly encode as decision rules by hand. • An agent, standing alone, can learn to improve its responses to the environment inputs (including those from other agents) as long as some objective functions (e.g., various utility functions) are well defined. In this case, training samples are its previous interactions with the environment, including the corresponding the objective function values. • An agent can learn about other agents in order to compete or cooperate with them. This type of learning is deeper than the previous two in that an agent learns something that other agents used to make their decisions and uses such knowledge to better fine-tune its own strategy. Learning in this category can be as simple as learning a few parameters that other agents used to conduct their operation (e.g., the reservation prices and markup percentages of suppliers in a supply chain [24]), or can be quite complicated as learning models of other agents’ decision strategies [6,9,10,19,23]. • A set of agents can learn to simultaneously adjust their respective decision processes. This type of learning occurs mostly in those MAS whose agents are tightly cooperating with each other to achieve a common goal (e.g., winning a robot soccer game), and the learning inputs often reflect the system’s performance (e.g., scores in a soccer game) rather than performance of individual agents (players) [14,18]. In some applications, MAS learning can be done in the so-called “batch mode”, i.e., the system is trained over a set of pre-assembled training samples while the system is not in user (either before the system’s deployment or when the system is taken offline). In most cases, however, it is 4
  • 5. preferred that learning is conducted in the “incremental model”, i.e., the system is incrementally adjusting/modifying itself by learning from a continuous stream of inputs from the environment while it is in actual use [25]. This is because 1) training samples, which record the interaction history of an agent, can be collected more efficiently and truthfully when the system is in actual use; and 2) incremental learning allows the system or its agents to adapt to the change of environment in a timely fashion. 2.2 Example Techniques in MAS Learning Interest in research on MAS learning has seen a steady increase in the past several years, there are many MAS learning systems with vastly different architectures, different application areas, and different machine learning techniques. What follows are brief descriptions of some examples of these systems and the learning techniques they use. • Reinforcement learning: Reinforcement Learning (RL) is the process by which an agent improves its behavior in an environment via experience. RL is based on the idea that the tendency to perform an action by an agent should be strengthened (reinforced by a reward) if the action produces favorable results, and weakened (punished) if the action produces unfavorable results. One of the most important advantages of RL, in comparison with other learning methods, is that it requires very little prior knowledge of the environment, as it does not require having a target or desirable output for each input when forming a training sample. RL algorithms such as Q-learning [7,11] can incrementally adjust the system toward more favorable outcomes as long as it is provided a feedback judgment (good or back) on the system’s output for a given input. For this reason, RL has been seen as one of the most widely used learning methods in MAS [15,16,11]. Most noted application of RL is perhaps in the domain of robot soccer games [14,18] where the game’s outcome (win or lose) is fed back to train the soccer team. RL has also been applied to other problems, including setting right prices in competitive marketplaces [20], learning agent coordination mechanism [11], learning to schedule multiple goals [1], and in dealing with malicious agent in a market based MAS [21]. • Optimization-based learning techniques: Optimization-based learning methods such as genetic algorithms [8], neural networks [12,14,20], and linear programming [22] have been used in some experimental MAS to train individual agents to optimize their performance as long as their performance can be specified as some forms of objective functions (e.g., their utility functions). One example of such systems is the DragonChain that uses genetic algorithm (GA) approach to improve its performance in playing the MIT “Beer Game”, a game of electronic supply chain for beers [8]. Mimicking the law of biological evolution of the survival of the fittest, the GA learning in DragonChain was able to help the system to obtain good beer order policies for both retailers and wholesalers by search through the huge space of possible order policies. Their experiment showed that this system outperformed those governed by classic business rules by eliminating the Bullwhip phenomenon, and more interestingly, it can dynamically changing its policies to adapt to changing order patterns from the customers. 
• Probabilistic learning techniques: Probabilistic learning techniques are of particular interest to MAS learning because of their ability to handle the high degree of uncertainty of the learning environment caused by agent interaction. Uncertainty is even more prevalent when an agent tries to learn models of other agents. In probabilistic learning, an agent does not attempt to learn a deterministic model of another agent, but a probability distribution of a set of possible models of that agent. Examples of formalisms that support probabilistic learning include the Bayesian Belief Networks (BBN), which represent probabilistic dependencies among variables of interest 5
  • 6. in a graphic representation, and Influence Diagrams (ID), which further extend BBN to include decision nodes and utility nodes. A decision making model for supply chain management, called Bazaar, was developed based on BBN [24]. In this system, an agent (say a buyer) uses Bayesian learning to incrementally learn the distributions of reservation prices and the markup rates of its negotiation partners (sellers). Work by Suryadi and Gmytrasiewicz [19] uses ID learning for an agent to construct models other agents. An agent in their system maintains a number of possible models for each of the other agents that it is interacting and the probability distribution of these models. When none of these existing models has sufficiently high probability, one of them is modified (the parameters and even the structure of the underlying network are changed) to better reflect the observed behavior. An unspoken assumption for the above probabilistic learning systems is that the learning agent must have some prior knowledge of the behaviors of other agents it is trying to learn. At least, it has to assume the knowledge of the set of parameters with which the behavior of the other agent can be expressed because these parameters are the necessary building blocks for the possible probabilistic models. Unfortunately, this assumption may not hold in many MAS applications. • Supervised learning: Supervised learning covers a class of learning methods that requires a teacher to tell the learning system what is the target/correct/desired output for each training input. The target output is then compared with the current system output, and the discrepancy is used to drive the update of the system. Supervised learning includes backpropogation training in neural networks, K-nearest neighbor, minimum entropy, and some form of decision tree learning. Supervised learning is particularly suitable for learning user models for personal agents and human interface agents [4,9,12,13]. This type of agent works on behalf of human users and tries to best satisfy the users’ need. Instead of provide detailed rules to guide the agent (which may not be feasible for complex tasks), the human user can easily work as the teacher to provide desirable response to each input as a training sample to the agent. Payne at al has used k-nearest neighbor method to train a user interface agent [13], while Pannu and Sycara have used backpropagation method to train a personal agent for text filtering and notification [12]. • Rule-based learning: Learning rules for rule-based reasoning systems has also been reported in the literature [5,13]. Decision tree learning is perhaps the most mature technical for this type of learning. The advantage of rule-based learning lies on the fact that rules are easy for humans to understand. This allows domain experts to inspect and evaluate rules generated by a learning module, and make decision on whether to accept each of these rules. Moreover, since rules are probably the easiest way to represent and encode experts’ knowledge, many learning systems can start with a set of pre-defined rules and then let the rule-based learning module to modify the rule set with additional observations. Learning thus will greatly facilitate the growth, modification, and maintaining consistence of the knowledge base. These are precisely the reasons that we have chosen rule-based learning for our EECOMS Negotiation Rules learning (NRL) task. 
2.3 Decision-Tree Learning Simply state, a decision tree is a representation of a decision procedure for determining a class label to associate with a given instance (represented by a set of attribute-value pairs). All non-leaf nodes in the tree are decision nodes. A decision node is associated with a test (question on the value of a particular attribute), and a branch corresponding to each of the possible outcomes of the test. At each leaf node, there is a class label (answer) to an instance. Traversing a path from the root to a leaf is much like playing a game of twenty questions, following the decisions made on each 6
  • 7. decision node on the path. Decision trees can be induced from examples (training samples) that are already labeled. One of the concerns of DT learning is how to construct trees that are as small as possible (measured by the number of distinct paths a tree has from its roots) and at the same time consistent with the training samples. In a worst case, the induced tree can be degenerated in which each sample has its own unique path (the tree size would then be in the order of exponential to the number of attributes involved). Information theoretic approach has been taken by several DT learning algorithms to address this problem, also to a lesser extent, to the problem of generalization [15]. The basic thought of a DT learning algorithm is: For each decision point, If all remaining examples are all positive or all negative, we're done. Else if there are some positive and some negative examples left and attributes left, pick the remaining attribute which is the "most important", the one which tends to divide the remaining examples into homogeneous sets Else if there are no examples left, no such example has been observed; return default Else if there are no attributes left, examples with the same description have different classifications: noise or insufficient attributes or nondeterministic domain Figure 3 below gives a simple example of DT learning. A tree of good size has been induced. It has 6 distinct paths, but it could have in the worst-case 12 distinct paths, each for a unique value assignment pattern of (color, size, shape). This is because some general rules were induced (if color = read, then immediately conclude the class = +; shape will be considered only if color = blue). The figure also shows a set of if-then rules can be generated from the induced tree. Essentially, each distinct path represents a rule: value assignments on the branches on the path constitute the conditional part of the rule, and the value assignment of the leaf node at the end of the path constitutes the conclusion part of the rule. 7
  • 8. 2.4 The Choice of the DT Learning Method An assumption we made when selecting suitable learning method is that decisions included in the training data (extracted from messages exchanged during negotiation sessions) are good decisions. Therefore, the goal of learning is not to attempt to further optimizing the decision process that was used to generate the data, but to learn rules/ strategies that lead to these decisions. In other words, the learned rules would make the decisions which are the same as (or similar to) those in the training set if the same or similar decision parameters are given. This lead us to the choice of supervised learning, instead of unsupervised or reinforcement learning. The training samples serve as instructions from a teacher or supervisor as each sample gives the desired or “correct” value assignment to the target attribute with respect to the pattern of value assignment to the decision parameters in the sample. Among all supervised learning methods, we have chosen to experiment with Decision Tree Learning (DT learning) [14,15,26] for the following reasons: • DT learning is a mature technology. It has been studied for 20+ years, has been applied to various real-world problems, and the learning algorithm has been improved by several significant modifications. • The basic algorithm and its underlying principles are easy to understand. • It is easy to apply DT learning to specific problem domains, including our NRL task. • Several good, easy to use DT learning packages are commercially available (free or with reasonable cost) [26,27, 28]. • It is easy to convert the induced decision tree to a set of rules, which are much easier for human experts to evaluate and manipulate, and to be incorporated into an existing rule based systems than other representations. 3. Negotiation Rules Learning System Framework In a negotiation MAS, learning can be applied to various aspect of agent negotiation: training the agent to make decisions in a way similar to what an experienced human manager would make; 8
  • 9. learning the models of an agent’s negotiation partner at different level of details; and learning negotiation strategies that outsmart its negotiation partners. We have experimented decision tree learning method in our EECOMS negotiation rules learning task. In this section we will give a brief description of the negotiation rule learning system design and a set of negotiation performatives used. 3.1 Objectives Several theoretical issues are concerned with such a MAS system mentioned above: the high time complexity of learning process, the lack of sufficient learning data and prior knowledge, the inherently uncertainty of the learning results, and the stability and convergence of learning MAS. Thus, the overall objective is to study the feasibility of employing machine learning techniques to automatically induce negotiation decision rules for supply chain players (buyers and sellers) from transaction data. Specifically, we aim at investigating the following: • Constructing a rule base by learning the decision strategy of a human expert: Initially the human makes all negotiation decisions. The prospective rules induced from the negotiation transactions are shown to the human for approval/rejection. The approved rules are then incorporated into existing rule base. Including humans in the loop allows us to have quality training samples as they are generated by an experienced human negotiator rather than an agent with set of not yet well-developed rules. It also makes the rule base more trustworthy to humans since every induced rule is inspected by a human before it is included into the rule base. • Learning the model of the negotiation partners’ behaviors: By properly incorporating the learned partner’s decision rules, an agent can make more informed decisions, and in turn improve its performance (reducing the negotiation time/steps and increasing the payoff). 3.2 Outline of NRL System Design Negotiation partners (buyers and sellers), represented by computer programs in a virtual supply chain, constitute a multi-agent system in the broad sense of this term as discussed in Section 1. Figure 1 shows the diagram of such a system, for each side of the negotiation, we have a decision module, and the rules in these modules can be more and more complete since each agent has learning ability that is implemented by the learning module. Initially, we have a human negotiator (or a set of pre-defined rules) to guide the negotiation process. 9
  • 10. 3.3 Negotiation Performatives All of the functional agents in a MAS should have some understanding of system ontology and use a certain Agent Communication Language (ACL) to make conversation, transfer information, share knowledge and negotiate with each other, which offers a minimal set of performatives to describe agent actions and allows users to extend them if the new defined ones conform to the rules of ACL syntax and semantics. Knowledge Query and Manipulation Language (KQML) and the ACL defined by Foundation for Intelligent Physical Agents/Agent Communication Language (FIPA ACL) are the most widely used and studied ACLs. In KQML there are no predefined performatives for agent negotiation actions. In FIPA ACL there are some performatives, such as proposal, CFP and so on, for general agent negotiation processes, but they are not sufficient for our purposes. For example, there are no performatives to handle third party negotiation. The NRL system design presents a negotiation performative set designed for MAS dealing with supply chain management [31]. In the following table, we give the negotiation performatives’ name, their corresponding meaning and the possible performatives a functional agent can use to reply when certain performative comes in: Name Meaning Performative Responsed CFP call for proposal Proposal | Terminate CFMP call for modified proposal Proposal | Terminate Reject reject a proposal Proposal | Terminate Terminate Terminate the negotiation NONE Accept accept a proposal NONE Proposal The action of submit a proposal Accept | Reject | Terminate | CFMP Initially, one agent starts negotiation by sending a CFP message to the other agent. After several rounds of conversation in which proposes and counter-proposes are exchanged, the negotiation between two agents will end when one side accepts (rejects) the other side’s proposal or terminates the negotiation process without any further explanation [31]. 4. Experiments and Results A preliminary experimental learning system form NRL was constructed earlier to evaluate the feasibility of learning decision rules for a buyer agent. A set of 27 training samples was manually generated following the data schema. These samples were fed to C5.0, a decision-tree learning package (a descendant of Ross Quinlan’s classic ID3 decision tree learning algorithm [14,15]) obtained from RuleQuest Research, http://www.rulequest.com/. The learning was successfully completed, a decision tree was constructed, from which a set of eight decision rules were generated by C5.0. These rules suggest to the buyer agent what actions it should take next (i.e., what types of messages should be sent out next), based on factors such as how well the terms such as price, quantity and delivery date in its current “Call-For-Proposal” match those in the returned “Proposal” from the seller it is negotiating with, the reputation of the seller, and how deep the current negotiation is progressing. The initial results from the experiment were encouraging. It is shown that the decision tree learning can be an effective tool to learn the negotiation rules from the data that reflects the past negotiation 10
  • 11. experience. The few rules learned are in general reasonable and consistent with the intuitions we had and used to generate the training data. However, the experiment was restricted by the quality of the training data (only 27 hand made samples were used), and the results were far from convincing (only three rules were learned). Encouraged by the results from the preliminary study, we went forward to conduct a more extensive experiment of NRL by decision tree learning. The main extensions include the following: • A program to automatically generate training samples is developed. This data generator is based on a set of decision rules that take into considerations of all important decision factors for making proposals (and counter proposals). Unlike the small data set generated manually and somewhat arbitrarily in the preliminary study, this data generator allows us to experiment with large quantity of learning samples (hundreds or even thousands of them) that are consistent and more realistic. • A better data schema is selected after a series of experiments to yield good results. • The buyer agent not only learns decision rules for itself but also learns rules that the seller appears to use, thus constructs a (simple) model of the seller. Next we summarize the experiment system and present the results. 4.1 The Training Data Generator More realistic training data may induce better, more meaningful rules. They can also be used to test and validate the learning results. Since no realistic data set of sufficient size is available to us, nor could we obtain help from human experts in supply chain negotiation, we were not able to adequately resolve this problem to our complete satisfaction. As an alternative, we have developed a training sample automatic generator to generate as many samples as we need. This sample generator essentially simulates the negotiation process on both the buyer side and seller side, based on two sets of decision rules encoded in the program, to generate a stream of message exchanges between the buyer and the seller. The actual training samples are then extracted from these messages. By changing the relevant rules, the sample generator can be used to generate samples reflecting different negotiation scenarios.  Message format: (msg_type, sender, receiver, item, price, quantity, delivery_date) For example, a negotiation session may start with the following message from the buyer to the seller (CFP Buyer Seller X 9.25 50 7) that Buyer wishes to entertain a proposal from Seller for 50 pieces of good X at price $9.25 a piece, to be delivered 7 days from this day. Seller may response with a message (Proposal Seller Buyer X 11.25 50 7) that it can sell these good by the given delivery date at the unit price of $11.25. To simplify the experiment, we chose to let the buyer and the seller to negotiate only one type of good, named X.  System parameters: A set of random numbers are used to determine the attribute values for the CFP message for each negotiation session and the values for the initial Proposal message in response to the CFP message. These numbers, to an extent, simulate the dynamic nature of the internal and external environment in which the negotiation is taking place. These parameters include the following: Buyer’s need for good X: 11
  • 12. − quantity: a random number between 1 and 200 (with 0.1 probability in [1,50], 0.8 probability in [51,150], and 0.1 probability in [151, 200]), and it can be mapped to three regions: [1,50] {small}, [51, 150] {ordinary}, [151, 200] {large}. − delivery_date: a random number between 1 and 20, it can also be mapped to three regions: 1~5 days{short}, 6~15 days{regular}, 16~20 days{long}. − asking price: a random number between 7 and 11, and the fair market unit price is $10, the range of all possible prices is partitioned into six regions: min (7t price< 8), low (8( price<9), normalminus(9p price<10), normalplus (10p pricep 11), high (11<price( 12), and max (12<price1 13). − importance of the order: a random number of binary value − Seller’s reputation: a random number of binary value Seller’s capacity to supply good X: − daily production capacity of good X: a random number between 8 and 12 − current inventory: a random number between 20 and 50 − importance of the Buyer as a customer: a random number of binary value With these random numbers, each negotiation starts with a CFP message of different quantity, delivery_date, and asking price. In response, the seller first determines if the requested delivery_date can be met for the given quantity based on the current inventory and the daily production capacities from this day to the delivery_date. To simplify the data generation, we assume that the seller submit its initial proposal (usually with a price higher than the asking price in the CFP message) only if the requested quantity and date can be met. The negotiation then continues with the buyer gradually increasing the asking price and the seller decreasing the bidding price until the session ends. The details of the negotiation are governed by decision rules at either side.  Negotiation rules for Seller agent: The following rules are used in the data generator to form response messages from Seller agent. SR-1: Terminate the negotiation IF Seller cannot meet the quantity-date requested in the incoming CFP message SR-2: Terminate the negotiation IF asking price = min (7 price< 8) SR-3: Terminate the negotiation IF asking price = low (8 price<9) & Buyer is NOT an important customer SR-4: Otherwise, submit a Proposal with the requested quantity and date, and the price is determined by: SR-41: IF the asking price = normalplus or high or max (10S pricep 13) THEN bidding price = asking price SR-42: Otherwise, IF incoming msg-type = CFP THEN IF asking price = low (8I price<9) & Buyer is an important customer THEN propose a higher price (bidding price>asking price) ELSE IF asking price = normalminus (9E price<10) Buyer is NOT an important customer THEN propose a higher price (bidding price>asking price) ELSE IF asking price = normalminus (9E price<10) Buyer is an important customer THEN bidding price = asking price ELSE propose a lower price (bidding price asking price)  Negotiation rules for Buyer agent: The following rules are used in the data generator to form messages from Buyer agent. BR-1: Terminate the negotiation IF bidding price =max (12<pricet 13) BR-2: Terminate the negotiation IF bidding price = high (11<pricet 12) & the current depth of negotiation 1 7 BR-3: Reject the incoming proposal IF bidding price = high (11<price t 12) & either Seller’s reputation is bad or this order is not important & the current depth of negotiation < 7 12
  • 13. BR-4: CFMP for a lower price IF bidding price = high (11<price 12) & Seller’s reputation is good & this order is important & the current depth of negotiation < 7 BR-5: Accept the current proposal IF the bidding pricet 10 BR-6: Accept the current proposal IF the bidding price = asking price (delta_price =0) BR-7: Accept the current proposal IF bidding price = normalplus (10<price 11) & Seller’s reputation is good & The order is important. BR-8: CFMP for a lower price IF bidding price = normalplus (10<price 11) & either Seller’s reputation is bad or this order is not important & the current depth of negotiation < 7 BR-9: Terminate IF bidding price = normalplus (10<price 11) & either Seller’s reputation is bad or this order is not important the current depth of negotiation n 7 The function used in negotiation to reduce and increase price by 'buyer' agent: 1/(1 + e xp(-x)) (x maybe from -3 to +3) The function used in negotiation to reduce and increase price by 'seller' agent: (Depth_Max - depth+1) * tan(0.25) depth 0 1 2 3 4 5 6 7 diff_buyer N/A 0.05 0.12 0.27 0.50 0.73 0.88 0.95 diff_seller 2.043 1.787 1.532 1.277 1.021 0.766 0.511 0.255 4.2 Data Schema for Training Samples The negotiation process in NRL is very complex. Consider just the task of making a counter proposal by a buyer when it receives a new proposal during a negotiation session. This task amounts to optimizing a function (e.g., payoff) based on a high dimensional many-to-many mapping. The input involves parameters reflecting the enterprise’s planning and execution (customer orders, production capacity, inventory, etc.); the distance between asking and bidding values of negotiation terms (prices, quantities, delivery dates, etc.) at the given point of a negotiation session; the trustworthiness of the negotiation partner; how long the current session has been (the longer it lasts, the less likely a good deal can be made); the importance to the buyer for the on-going negotiation to succeed; and the availability of other alternative sellers, etc. The output is also composed of a large number of parameters that gives a detailed description of a (counter) proposal. The training samples for DT learning are composed by these attributes. A training sample is a vector or a list of values for a set of attributes extracted from a message exchanged during the negotiation. A sample can be divided into two parts. The first part involves attributes one hopes that the learned rules can be used to generate. They are thus referred to as learning target. The second part includes those attributes that support the conclusions of the learned rules on assigning values to the target attributions; they are referred to as decision parameters. The data model or data scheme for training samples specifies what is to be learned (the target attribute) and what are the decision parameters (other attributes the target depend on). A training sample is synthesized from three consecutive messages: the current incoming message, the one that precedes it, and the one in response to it. Figure 2 is an example of training samples used in our early learning experiment. 13
  • 14. In this experiment, to simplify the investigation, we have decided to focus on learning rules for determining appropriate message type for response to an incoming message, namely, our learning target is the performative (or message type) that will be used to response the incoming one. The target attribute will be either CFP, CFMP, Terminate, Accept, Reject for Buyer, Terminate and Proposal for Seller. Selection of decision parameters is more complicated. Since the type of each outgoing message from an agent is determined by the content of the incoming message and the content of the previous message from the same agent, a large number of attributes that may potentially affect the new message type can be extracted from the two proceeding two messages and from their differences. For example, consider the situation that Buyer receives a Proposal (msg-2) from Seller after sending a CFP message (msg-1). The new message (msg-3) from Buyer, in responding to msg-2, may depend on: attributes from msg-1: bprice (Buyer’s asking price) bquantity (Buyer’s requested quantity) bdate (Buyer’s requested delivery_date) last_msg (type of the last msg from Buyer to Seller) attributes from msg-2: sprice (Seller’s bidding price) squantity (Seller’s proposed quantity) sdate (Seller’s proposed delivery_date) incoming_msg (type of the incoming msg from Seller) attributes from the difference between msg-1 and msg-2: delta_price delta_quantity delta_date match_dq (true only if both delta_quantity and delta_date are zero) attributes about other properties of Buyer: opp-reputation (Buyer’s evaluation of Seller’s reputation) weight-item (whether this order is important to Buyer) depth (number of msgs Buyer has sent during the current session) 14
  • 15. A small set of decision parameters may be insufficient for the learning module to differentiate training examples, and thus resulting in a decision tree with many ambiguous leave nodes (nodes with multiple labels). On the other hand, a large set of decision parameters may refine the decision tree to a level that is too detailed to be practically useful because the induced tree would have a great height and a large number of branches (rules). For example, one of our experimental run used all the parameters listed above. The induced decision tree for Buyer has height of 14, which means that some rules would have to check as many as 14 conditions before it draws a conclusion. Moreover, a total of more than 100 rules are generated from this tree. It may be possible to obtain a workable smaller set of rules can be obtained from these raw rules by some pruning techniques, but this would require a substantial post-learning process. After several trials, we have chosen the follows decision parameter sets for the Buyer and Seller, respectively. Buyer: sender, depth, receiver, last_msg, incoming_msg, item, sprice, opp_reputation, weight_item, match_qd. Seller: sender, depth, receiver, last_msg, incoming_msg, item, bprice, opp_importance, match_qd. 4.3 The Experiment Results We have experimented three software packages for decision tree learning, they are (1) C5.0/See5.0 from RuleQuest Research [27], (2) Ripper from ATT [28], and (3) ITI (Incremental Tree Induced) from University of Massachusetts [26]. C5.0/See5.0 was not selected for the final experiment because the free-of-charge version we have restricts the dataset to have no more than 200 training samples, which are not sufficient to make the learning process converge. Ripper, although not restricting the size of the training set, was rejected because it always produces a very small number of rules (possibly due to a severe pruning process it uses to generate the final output). ITI was selected not only because it works well with our learning task but also because it supports incremental learning, a valuable feature we plan to further explore in the future. 3000 randomly generated negotiation sessions were generated by the automatic data generator described in Section 3.1. Each session includes a sequence of message exchanges between Buyer and Seller, starting with a CFP message from Buyer. The experiment showed that this amount of training samples is sufficient for the learning process to converge (the induced tree becomes stable). Datasets of smaller size may be used to learn most, but not all decision rules, because they do not contain all possible scenarios, especially those with small probabilities. These samples were fed into ITI under the data model described in Section 3.2. Two induced decision trees and their corresponding data model files, for Buyer and Seller, respectively, were included in Appendix. Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding to the 12 paths in the tree. Two rules (the first and last, counting from left to right) are related to starting and ending a session, the 10 others are rules for determining the new message types. These rules match very well with the rules used to generate the training data. There is no apparent inconsistency between these two sets of rules. For example, the second rule IF sprice = max THEN Terminate is the same as BR-1 in Section 3.1. 
Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding to the 12 paths in the tree. Two rules (the first and the last, counting from left to right) are related to starting and ending a session; the other 10 are rules for determining the new message type. These rules match the rules used to generate the training data very well, and there is no apparent inconsistency between the two sets. For example, the second rule

IF sprice = max THEN Terminate

is the same as BR-1 in Section 3.1. The next three rules (all under the condition that quantity and delivery date match)

IF sprice = high AND weight_item = unimportant THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = bad THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP

jointly match the rules BR-3 and BR-4.
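Because the induced tree is small, these learned Buyer rules can also be written down directly as decision logic. The following sketch encodes only the rules quoted above; it is not output of ITI, and the None fallback for situations those rules do not cover is an assumption (the full tree contains additional paths for them).

```python
from typing import Optional

def buyer_response(sprice: str, weight_item: str, opp_reputation: str,
                   match_qd: bool) -> Optional[str]:
    """Buyer's next performative according to the induced rules quoted above.
    Cases not covered by those rules return None; the full induced tree has
    further paths for them, which are not reproduced here."""
    if sprice == "max":
        return "Terminate"                              # same as BR-1
    if match_qd and sprice == "high":
        if weight_item == "unimportant":
            return "Reject"
        # weight_item == "important": decision hinges on Seller's reputation
        return "CFMP" if opp_reputation == "good" else "Reject"
    return None

assert buyer_response("high", "important", "good", match_qd=True) == "CFMP"
```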
All other induced rules also match the data generation rules well.

Learned rules for Seller: 6 rules can be generated from the induced decision tree, corresponding to the 6 paths in the tree. The last two rules are related to starting and ending a session; the first four determine the new message type. These rules again match the rules used to generate the training data very well. The first two rules from the tree

IF bprice = min THEN Terminate
IF bprice = low AND opp_weight = unimportant THEN Terminate

are the same as SR-2 and SR-3 in Section 3.1, and the next two rules

IF bprice = low AND opp_weight = important THEN Proposal
IF bprice > low THEN Proposal

jointly match SR-4.

5. Conclusions and Future Work

Due to the inherently complex, uncertain, and dynamic nature of multi-agent systems, it is very difficult, if not impossible, to encode agents' decision strategies a priori. Learning while doing therefore becomes imperative when constructing MAS for real applications. This is also true for automated negotiation systems for supply chain management, where each entity operates autonomously and interacts with others (its negotiating partners) to reach deals. We have begun an empirical investigation of the feasibility of adopting existing machine learning techniques for negotiation rule learning (NRL) from transaction data (mainly from the messages exchanged among negotiation agents). Our experimental results show that, with a carefully designed data scheme and sufficiently many training samples, decision tree learning can be used to effectively learn decision rules for e-commerce activities such as negotiation in supply chains.

More interestingly, our experiment showed that the Buyer agent could learn a model of its partner (the Seller) using only the information available in the messages they exchanged during the negotiation. Although the learned model describes the behavior of the Seller rather than the underlying mechanism governing the Seller's decision making, it may give the Buyer some power to predict the Seller's responses or to choose the actions that will bring about the most desired responses.

Although our experiment only involves learning how to determine one aspect of the responding message, namely its message type, it is conceivable that this method can be used to learn other aspects (e.g., how to set the price) from the same raw data, possibly with different data models. In other words, a more complete, multi-faceted model of an agent can be constructed by running multiple decision tree learning processes simultaneously, one for each aspect of the agent.

In the future, further investigation can be pursued in the following directions: (1) experiment with decision tree learning on other, preferably real-world, training data; (2) study how to incorporate the learned model of an agent's partner (or opponent) to improve the agent's own negotiation decision making; (3) investigate the applicability of incremental decision tree learning and how it can improve the agent's performance by making it adaptive to a changing environment; and (4) develop a hybrid learning architecture that employs different learning techniques for different aspects of e-commerce activities.
References

[1] Arai, S., Sycara, K., and Payne, T.R. (2000). Multi-agent Reinforcement Learning for Scheduling Multiple Goals. In Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS'2000).
[2] Arrow, K. (1962). The implications of learning by doing. Review of Economic Studies, 29, 166-170.
[3] Brazdil, P., Gams, M., Sian, S., Torgo, L., and van de Velde, W. (1991). Learning in distributed systems and multi-agent environments. In Y. Kodratoff (Ed.), Machine Learning -- EWSL-91 (pp. 412-423). Lecture Notes in Artificial Intelligence, Vol. 482. Berlin: Springer-Verlag.
[4] Caglayan, A., et al. (1996). Lessons from Open Sesame!, a User Interface Agent. In Proceedings of PAAM '96.
[5] Haynes, T., Lau, K., and Sen, S. (1996). Learning Cases to Complement Rules for Conflict Resolution in Multiagent Systems. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[6] Hu, J. and Wellman, M. (1998). Online Learning About Other Agents in a Dynamic Multiagent System. In Proceedings of the Second International Conference on Autonomous Agents (Agents'98), Minneapolis, MN, USA, May 1998.
[7] Humphrys, M. (1995). W-learning: Competition among selfish Q-learners. Technical Report No. 362, Computer Laboratory, University of Cambridge.
[8] Kimbrough, S.O., Wu, D.J., and Zhong, F. (2000). Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs. http://grace.wharton.upenn.edu/~sok/fmec/schedule.html
[9] Maes, P. (1994). Social interface agents: Acquiring competence by learning from users and other agents. In O. Etzioni (Ed.), Working Notes of the 1994 AAAI Spring Symposium on Software Agents.
[10] Mor, Y., Goldman, C.V., and Rosenschein, J.S. (1996). Learn Your Opponent's Strategy (in Polynomial Time). In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 164-176). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[11] Mundhe, M. and Sen, S. (1999). Evaluating Concurrent Reinforcement Learners. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[12] Pannu, A. and Sycara, K. (1996). Learning Personal Agent for Text Filtering and Notification. In Proceedings of the International Conference on Knowledge-Based Systems (KBCS 96), December 1996.
[13] Payne, T.R., Edwards, P., and Green, C.L. (1995). Experience with rule induction and k-nearest neighbor methods for interface agents that learn. In WSIMLC95.
[14] Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
[15] Quinlan, J.R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the 10th International Conference on Machine Learning, 236-243.
[16] Schmidhuber, J. (1996). A General Method for Multi-Agent Reinforcement Learning in Unrestricted Environments. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[17] Sian, S.S. (1991). Extending Learning to Multiple Agents: Issues and a Model for Multi-Agent Machine Learning (MAML). In Y. Kodratoff (Ed.), Machine Learning -- EWSL-91 (pp. 440-456). Berlin: Springer-Verlag.
[18] Stone, P. and Veloso, M. (1996). Collaborative and Adversarial Learning: A Case Study in Robotic Soccer. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March 1996.
[19] Suryadi, D. and Gmytrasiewicz, P.J. (1999). Learning Models of Other Agents Using Influence Diagrams. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[20] Tesauro, G. (1999). Pricing in Agent Economies Using Neural Networks and Multi-Agent Q-learning. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, Stockholm, Sweden, 1999.
[21] Vidal, J. and Durfee, E. (1997). Agents Learning about Agents: A Framework and Analysis. In Working Papers of the AAAI-97 Workshop on Multiagent Learning.
[22] Weiß, G. and Sen, S. (Eds.) (1996). Adaptation and Learning in Multi-Agent Systems. Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[23] Weiß, G. (1996). Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography. In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.
[24] Zeng, D. and Sycara, K. (1996). Bayesian Learning in Negotiation. In Working Notes of the AAAI 1996 Stanford Spring Symposium Series on Adaptation, Coevolution and Learning in Multiagent Systems.
[25] Zeng, D. and Sycara, K. (1997). Benefits of Learning in Negotiation. In Proceedings of AAAI-97.
[26] Incremental Tree Inducer (ITI), University of Massachusetts. http://www.cs.umass.edu/~lrn/iti/
[27] C5.0/See5.0 for decision tree learning, RuleQuest Research. http://www.rulequest.com/
[28] Ripper, AT&T. http://www.research.att.com/~diane/ripper/ripper-2.5.tar.gz
[29] Stone, P. and Veloso, M. (1997). Multiagent Systems: A Survey from a Machine Learning Perspective. Under review for journal publication, February 1997.
[30] Barbuceanu, M. and Fox, M.S. (1994). The Information Agent: An Infrastructure Agent Supporting Collaborative Enterprise Architectures. In Proceedings of the Third Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Morgantown, West Virginia. IEEE Computer Society Press.
[31] Chen, Y., Peng, Y., Finin, T., Labrou, Y., Cost, S., Chu, B., Sun, R., and Willhelm, B. (1999). A Negotiation-Based Multi-Agent System for Supply Chain Management. In Workshop on Supply Chain Management, Autonomous Agents '99, Seattle, WA, May 1999.

Acknowledgement

I would like to thank my advisor, Dr. Yun Peng, for his great help with this Master's project, and Dr. Charles K. Nicholas for reviewing this report. I also want to thank Mr. Ye Chen and Dr. Tim Finin for their pertinent suggestions.

Appendix: Induced decision trees (by ITI)

Data Model: selleri.names
Terminate, Proposal, Stop, NIL.
sender: buyer, seller.
depth: continuous.
receiver: buyer, seller.
last_msg: Terminate, Proposal, Stop, NIL.
response_msg: CFP, CFMP, Terminate, Accept, Reject, NIL.
item: X, Y.
bprice: min, low, normalminus, normalplus, high, max, NIL.
opp_weight: important, unimportant, NIL.
match_qd: match, unmatch, NIL.

Decision tree for the Seller agent
Data Model: buyeri.names
CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
sender: buyer, seller.
depth: continuous.
receiver: buyer, seller.
last_msg: CFP, CFMP, Terminate, Accept, Reject, Stop, NIL.
response_msg: Terminate, Proposal, NIL.
item: X, Y.
sprice: min, low, normalminus, normalplus, high, max, NIL.
opp_reputation: bad, good, NIL.
weight_item: important, unimportant, NIL.
match_qd: match, unmatch, NIL.

Decision tree for the Buyer agent