2. Introduction
• A resource can be logical, such as a
shared file, or physical, such as a CPU.
• The set of available resources in a
distributed system acts like a single
virtual system.
3. Cont…
• Resource manager:
– Controls the assignment of resources to
processes.
– Routes the processes to suitable nodes of
the system in such a manner that resource
usage, response time, network congestion,
and scheduling overhead are optimized.
4. Techniques
Task assignment approach:
• Each process submitted by a user for
processing is viewed as a collection of
related tasks.
• Tasks are scheduled to suitable nodes to
improve performance.
5. Cont…
Load-balancing approach:
• All the processes submitted by the users are
distributed among the nodes of the system.
• Equalizes the workload among the nodes.
7. Desirable features of a good
scheduling algorithm
• No a priori knowledge about the processes.
• Dynamic in nature.
• Quick decision-making capability.
• Balanced system performance.
• Stability.
• Scalability.
• Fault tolerance.
• Fairness of service.
8. Task assignment approach
• A process is considered to be composed
of multiple tasks.
• Goal is to find an optimal assignment
policy for the tasks of an individual
process.
9. Cont…
Assumptions:
1. A process has already been split into
pieces called tasks.
2. Amount of computation required by each
task and speed of each processor are
known.
3. The cost of processing each task on
every node of the system is known.
4. The IPC costs between every pair of tasks
are known.
10. Cont…
5. Other constraints, such as the resource
requirements of the tasks and the
available resources at each node, are
also known.
6. Reassignment of the tasks is generally
not possible.
11. Cont…
• Goals:
– Minimization of IPC costs
– Quick turnaround time for the complete process
– A high degree of parallelism
– Efficient utilization of system resources in general
• These goals often conflict with each other.
12. Cont…
• Two task assignment parameters
– Task execution cost and
– Inter-task communication cost
16. Cont…
Serial assignment execution cost (x)
= x11+x21+x31+x42+x52+x62
= 5+2+4+3+2+4 = 20
Serial assignment communication cost (c)
= c14+c15+c16+c24+c25+c26+c34+c35+c36
= 0+0+12+12+3+0+0+11+0 = 38
Total Serial assignment cost
= x + c = 20 + 38 = 58
17. Cont…
Optimal assignment execution cost (x)
= x11+x21+x31+x41+x51+x62
= 5+2+4+6+5+4 = 26
Optimal assignment communication cost (c)
= c16+c26+c36+c46+c56
= 12+0+0+0+0 = 12
Total Optimal assignment cost
= x + c = 26 + 12 = 38
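The arithmetic on the two slides above can be checked with a short script. This is only a sketch: it uses just the x and c entries quoted on these slides (the full cost tables are not reproduced here), and the names `EXEC_COST`, `COMM_COST`, and `total_cost` are illustrative.

```python
# Execution costs x[(task, node)] quoted on the slides (other entries are not given here).
EXEC_COST = {
    (1, 1): 5, (2, 1): 2, (3, 1): 4, (4, 1): 6, (5, 1): 5,
    (4, 2): 3, (5, 2): 2, (6, 2): 4,
}

# Inter-task communication costs c[(i, j)] quoted on the slides.
COMM_COST = {
    (1, 4): 0, (1, 5): 0, (1, 6): 12,
    (2, 4): 12, (2, 5): 3, (2, 6): 0,
    (3, 4): 0, (3, 5): 11, (3, 6): 0,
    (4, 6): 0, (5, 6): 0,
}


def total_cost(assignment):
    """Return (execution, communication, total) cost for a task -> node mapping."""
    execution = sum(EXEC_COST[(task, node)] for task, node in assignment.items())
    # Communication is paid only by task pairs placed on different nodes.
    tasks = sorted(assignment)
    communication = sum(
        COMM_COST[(a, b)]
        for i, a in enumerate(tasks)
        for b in tasks[i + 1:]
        if assignment[a] != assignment[b]
    )
    return execution, communication, execution + communication


serial = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
optimal = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}
print(total_cost(serial))   # (20, 38, 58)
print(total_cost(optimal))  # (26, 12, 38)
```

The optimal assignment pays more in execution cost (26 vs. 20) but cuts communication from 38 to 12, which is why its total is lower.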
18. Load balancing approach
• Load-balancing algorithms are also known as
load-leveling algorithms.
– Based on the intuition of better resource utilization.
• Algorithm tries to balance the total system load
by transparently transferring the workload from
heavily loaded nodes to lightly loaded nodes.
• Goal: maximize the total system throughput.
20. Static vs. Dynamic
• Static algorithms:
– Use only information about the average
behavior of the system, ignoring the current
state of the system.
– Simpler because no need to maintain and
process system state information.
– Do not react to the current system state.
21. Cont…
• Dynamic algorithms:
– React to the system state that changes
dynamically.
– Able to avoid those states with unnecessarily
poor performance.
– More complex than static algorithms.
22. Deterministic vs. Probabilistic
• Both are Static load balancing algorithms.
• Deterministic algorithms:
– Use the information about the properties of
the node and characteristics of the processes.
– Difficult to optimize and cost more to
implement.
23. Cont…
• Probabilistic algorithms:
– Use information regarding static attributes of
the system.
– Easier to implement.
– Suffer from having poor performance.
24. Centralized vs. Distributed
• Centralized algorithm:
– The responsibility of scheduling physically
resides on a single node.
– System state information is collected at a single
node at which all the scheduling decisions are
made.
• Known as the centralized server node.
25. Cont…
– Problem : reliability
• If the centralized server fails, all scheduling in the
system would cease.
– Solution : replicate the server on k+1 nodes if it
is to survive k faults.
26. Cont…
• Distributed algorithms:
– The work involved in making process assignment
decisions is physically distributed among the
various nodes of the system.
– Avoids the bottleneck of collecting state
information at a single node.
– Allows the scheduler to react quickly to dynamic
changes in the state.
27. Cont…
– Algorithm is composed of entities known as local
controllers.
– Each entity is responsible for making scheduling
decisions for the processes of its own node.
28. Co-operative vs. Non-Cooperative
• Non-cooperative algorithms :
• Individual entities act as autonomous entities
and make scheduling decisions independently of
the actions of other entities.
29. Cont…
• Cooperative algorithms:
• Distributed entities cooperate with each other to
make scheduling decisions.
• More complex and involve larger overhead than
non-cooperative.
30. Issues in designing load
balancing algorithms
• Load estimation policy
• Process transfer policy
• State information exchange policy
• Location policy
• Priority assignment policy
• Migration limiting policy
31. Cont…
• Local Process
– A process which is processed at the node
on which it originated.
• Remote Process
– A process which is processed at a node
different from the one on which it
originated.
32. Load Estimation Policies
• Estimation based on parameters like:
1. Total no. of processes on the node at the
time of load estimation.
2. Resource demands of these processes.
3. Instruction mixes of these processes.
4. Architecture and speed of the node’s
processor.
33. Cont…
• Sum of the remaining service times of
all the processes on a node can be a
measure for estimating a node’s
workload.
• Issue: how to estimate the remaining
service time of the processes?
34. Cont…
• Solutions:
1. Memoryless method
• This method assumes that all processes have
the same expected remaining service time,
independent of the time used so far.
• It reduces the load estimation method to that
of total number of processes.
35. Cont…
2. Past repeats
• This method assumes that the remaining
service time of a process is equal to the time
used so far by it.
3. Distribution method
• If the distribution of service times is known,
the associated process’s remaining service
time is the expected remaining time
conditioned by the time already used.
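A minimal sketch of the three estimation methods, under stated assumptions: the function names are illustrative, the memoryless method uses a hypothetical constant `expected_remaining`, and the distribution method approximates the service-time distribution with a list of observed service times.

```python
def memoryless_estimate(used, expected_remaining=1.0):
    """Every process gets the same estimate, whatever time it has used so far."""
    return expected_remaining


def past_repeats_estimate(used):
    """Remaining service time is assumed equal to the time used so far."""
    return used


def distribution_estimate(used, observed_service_times):
    """Mean of (T - used) over observed service times T that exceed `used`."""
    longer = [t for t in observed_service_times if t > used]
    if not longer:
        return 0.0  # assumption: no observed time outlasts `used`
    return sum(t - used for t in longer) / len(longer)


def node_load(used_times, estimator):
    """Node workload as the sum of estimated remaining service times."""
    return sum(estimator(used) for used in used_times)


used_times = [1.0, 3.0, 5.0]
print(node_load(used_times, memoryless_estimate))    # 3.0, i.e. the process count
print(node_load(used_times, past_repeats_estimate))  # 9.0
print(distribution_estimate(3, [2, 4, 6, 8]))        # 3.0
```

With a unit constant, the memoryless load is just the number of processes, which matches the observation on the memoryless slide.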
36. Process Transfer Policies
• Load balancing algorithms use the threshold
policy to decide whether a node is lightly or
heavily loaded.
• The threshold value of a node:
– the limiting value of its workload,
– used to decide whether a node is lightly or heavily
loaded.
37. Cont…
• Methods to determine the threshold
value of a node:
1. Static policy
• Each node has a predefined threshold value
depending on its processing capability.
• This value does not vary with the dynamic
changes in workload at local or remote nodes.
• No exchange of state information among the
nodes to decide this value.
38. Cont…
2. Dynamic policy
• The threshold value of a node is calculated as a
product of the average workload of all the nodes
and a predefined constant (ci).
• Nodes exchange state information by using one of
the state information exchange policies.
• It gives a more realistic value of threshold for each
node.
• Involves overhead in exchange of state information.
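The dynamic threshold can be written as a one-line calculation. The sketch below assumes a node already holds the load values of all nodes (obtained through some state information exchange policy); `dynamic_threshold` and its parameter names are illustrative.

```python
def dynamic_threshold(node_loads, c_i):
    """Threshold of node i = c_i * (average workload of all nodes)."""
    average = sum(node_loads) / len(node_loads)
    return c_i * average


print(dynamic_threshold([2, 4, 6], 1.5))  # 6.0
```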
39. Cont…
• Most load-balancing algorithms use a single
threshold and thus have only overloaded and
underloaded regions.
41. Cont… : Single-threshold policy
• A node accepts new processes (either local or
remote) based on its load.
– Accepts if load is below the threshold value.
– Rejects if load is above the threshold value.
• It makes scheduling algorithms unstable.
42. Cont…
– A node should only transfer one or more of its
processes to another node if such transfers
greatly improve the performance of the rest of
its local processes.
– A node should accept remote processes if its
load is such that the added workload of
processing these incoming processes does not
significantly affect the service to the local ones.
43. Cont… : Double-threshold policy
• Also known as high-low policy.
• Use of two threshold values: high mark
and low mark.
• Three regions :
– Overloaded
– Normal
– Underloaded
44. Cont…
• For overloaded region:
– New local processes are sent to be run
remotely and requests to accept remote
processes are rejected.
• For normal region:
– New local processes run locally and
requests to accept remote processes are
rejected.
45. Cont…
• For underloaded region:
– New local processes run locally and
requests to accept remote processes are
accepted.
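The three regions and their rules can be sketched as follows. The names are illustrative, and the handling of a load exactly equal to a mark (treated here as normal) is an assumption, since the slides do not specify it.

```python
from enum import Enum


class Region(Enum):
    UNDERLOADED = "underloaded"
    NORMAL = "normal"
    OVERLOADED = "overloaded"


def region(load, low_mark, high_mark):
    """Classify a node's load against the low and high marks."""
    if load > high_mark:
        return Region.OVERLOADED
    if load < low_mark:
        return Region.UNDERLOADED
    return Region.NORMAL  # assumption: a load equal to a mark counts as normal


def runs_new_local_process_locally(r):
    """Only an overloaded node sends its new local processes elsewhere."""
    return r is not Region.OVERLOADED


def accepts_remote_process(r):
    """Only an underloaded node accepts remote processes."""
    return r is Region.UNDERLOADED


r = region(load=3, low_mark=2, high_mark=5)
print(r, runs_new_local_process_locally(r), accepts_remote_process(r))
```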
47. Cont… : Threshold
• A destination node is selected at random.
• A check is made to determine
– whether the transfer of the process to that
node would place it in a state that prohibits
it from accepting remote processes.
– If not, the process is transferred to the
selected node, which must execute the
process regardless of its state when the
process actually arrives.
48. Cont…
• If the check indicates that the selected node is
in a state that prohibits it from accepting remote
processes, another node is selected at random
and probed in the same manner.
• A static probe limit Lp is used here.
• The performance with a small probe limit (e.g.
3 or 5) is almost as good as the performance
with a large probe limit (e.g. 20).
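A minimal sketch of the threshold location policy. Several details are assumptions made for illustration: load is a process count, a node may accept a process if its load after the transfer stays at or below `threshold`, nodes are probed without repetition, and `None` means the process stays at its originating node once the probe limit is exhausted.

```python
import random


def threshold_policy(loads, origin, threshold, probe_limit, rng=random):
    """Return the chosen destination node, or None if no probed node qualifies."""
    candidates = [n for n in range(len(loads)) if n != origin]
    # Probe at most probe_limit randomly chosen nodes.
    for node in rng.sample(candidates, min(probe_limit, len(candidates))):
        # Would the transfer leave the node still able to accept remote work?
        if loads[node] + 1 <= threshold:
            return node
    return None


print(threshold_policy([5, 4, 0, 4], origin=0, threshold=4, probe_limit=3))  # 2
```

Because the first acceptable node is taken, the number of probes is often much smaller than the limit, which is consistent with small limits performing almost as well as large ones.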
49. Cont… : Shortest
• Lp distinct nodes are chosen at random and
each is polled in turn to determine its load.
• The process is transferred to the node having
the minimum load value, unless that node’s
load is such that it prohibits the node from
accepting remote processes.
50. Cont…
• If none of the polled nodes can accept the
process, it is executed at its originating node.
• Discontinue probing whenever a node with
zero load is encountered.
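The shortest policy can be sketched in the same style. As assumptions for illustration, load is a process count, a node may accept a process if its load after the transfer stays at or below `threshold`, and `None` means the process runs at its originating node.

```python
import random


def shortest_policy(loads, origin, threshold, probe_limit, rng=random):
    """Poll up to probe_limit distinct nodes and pick the least loaded one."""
    candidates = [n for n in range(len(loads)) if n != origin]
    best = None
    for node in rng.sample(candidates, min(probe_limit, len(candidates))):
        if best is None or loads[node] < loads[best]:
            best = node
        if loads[node] == 0:
            break  # a node with zero load cannot be beaten, so stop polling
    if best is not None and loads[best] + 1 <= threshold:
        return best
    return None  # no polled node can accept the process


print(shortest_policy([5, 3, 1, 4], origin=0, threshold=4, probe_limit=3))  # 2
```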
51. Cont… : Bidding
• Each node in the network is responsible for
two roles : manager and contractor.
• The Manager represents a node having a
process in need of a location to execute.
• The Contractor represents a node that is able
to accept remote processes.
52. Cont…
• To select a node for its processes, a manager
broadcasts a request-for-bids message to all
other nodes in the system.
• The contractors return bids to the manager
node.
• Manager transfers the process to the node
with best bid.
53. Cont…
• Problem:
– A contractor may win many bids from many
other manager nodes and thus becomes
overloaded.
• Solution:
– On choosing best bid, manager node may
send a message to the owner of that bid
and send process on acknowledgement.
54. Cont…
• Both manager and contractor are free
to take decisions.
• Drawback of bidding policy:
– Communication overhead
– Difficult to decide a good pricing policy.
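The manager's side of the bidding policy, including the acknowledgement step, might look like the sketch below. The pricing rule (lower bid is better) and the fallback to the next-best bid when a contractor declines are assumptions; the slides do not fix either. `choose_contractor` and `confirm` are hypothetical names.

```python
def choose_contractor(bids, confirm):
    """Pick a contractor for a process.

    bids: dict mapping contractor -> bid (assumption: lower is better).
    confirm: callable(contractor) -> bool, modelling the message sent to the
             bid owner and its acknowledgement.
    """
    for contractor in sorted(bids, key=bids.get):
        if confirm(contractor):
            return contractor  # transfer the process only after the acknowledgement
    return None  # nobody acknowledged, so keep the process local


bids = {"n1": 3, "n2": 1, "n3": 2}
print(choose_contractor(bids, confirm=lambda c: True))        # n2
print(choose_contractor(bids, confirm=lambda c: c != "n2"))   # n3
```

The second call models a contractor that has already won other bids and declines, which is the overload problem the acknowledgement step is meant to avoid.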
55. Cont… : Pairing
• This policy reduces the variance of loads only
between pairs of nodes of the system.
• Two nodes that differ greatly in load are
temporarily paired with each other.
• The load-balancing operation is carried out
between the nodes belonging to the same
pair.
56. Cont…
• A node only tries to find a partner if it has at
least two processes; otherwise migration from
this node is never reasonable.
• Use of random selection of pair.
• The pair is broken as soon as the process
migration is over.
57. State information exchange
policies
1. Periodic broadcast
2. Broadcast when state changes
3. On-demand exchange
4. Exchange by polling
58. Cont…: Periodic broadcast
• Each node broadcasts its state information
after the elapse of every t units of time.
• Generates heavy traffic.
• Possibility of fruitless messages being
broadcast.
• Poor scalability problem.
59. Cont…: Broadcast when state
changes
• A node broadcasts its state information
only when its state changes.
– When a process arrives at that node or
when a process departs from that node.
– When its state switches from the normal
load region to either the underloaded
region or the overloaded region.
• Works with two-threshold transfer policy.
60. Cont…: On-demand exchange
• A node broadcasts a StateInformationRequest
message when its state switches from the
normal load region to either the underloaded
region or the overloaded region.
• Receiving nodes send their current state to the
requesting node.
• Policy works with two-threshold transfer policy.
61. Cont…
• The status of the requesting node is included
in the StateInformationRequest message.
• If this status is
– Underloaded, only overloaded nodes will
respond to it.
– Overloaded, only underloaded nodes will
respond to it.
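The reply rule for a StateInformationRequest reduces to a small lookup. The status strings and the function name `should_respond` are illustrative.

```python
def should_respond(requester_status, own_status):
    """Reply only if this node is in the region the requester is looking for."""
    wanted = {"underloaded": "overloaded", "overloaded": "underloaded"}
    return wanted.get(requester_status) == own_status


print(should_respond("underloaded", "overloaded"))  # True
print(should_respond("underloaded", "normal"))      # False
```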
62. Cont…: Exchange by polling
• No need for a node to exchange its state
information with all other nodes in the system.
• When a node needs the cooperation of some
other node for load balancing, it can search
for a suitable partner by randomly polling the
other nodes one by one.
64. Cont…
• Selfish:
– Local processes are given higher priority
than remote processes.
– Yields the worst response time
performance of the three policies.
• Poor performance of remote processes.
• Best response time for local processes.
65. Cont…
• Altruistic:
– Remote processes are given higher priority
than local processes.
– Policy has best response time of the three
policies.
66. Cont…
• Intermediate:
– The priority of processes depends on the
number of local processes and the number
of remote processes at the concerned
node.
– If the no. of local processes is greater than or
equal to the no. of remote processes,
priority is given to local processes;
otherwise it is given to remote processes.
– Overall response time performance is
much closer to that of the altruistic policy.
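The three priority assignment rules can be compared side by side in a small function. The policy names follow the slides; the function name `favored_class` and the string return values are illustrative.

```python
def favored_class(policy, n_local, n_remote):
    """Return which class of processes gets higher priority at a node."""
    if policy == "selfish":
        return "local"
    if policy == "altruistic":
        return "remote"
    if policy == "intermediate":
        # Local processes win ties, as stated on the slide.
        return "local" if n_local >= n_remote else "remote"
    raise ValueError(f"unknown policy: {policy}")


print(favored_class("intermediate", n_local=2, n_remote=5))  # remote
```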
67. Migration-limiting policies
• A decision about the total no. of times a
process should be allowed to migrate.
• Two migration-limiting policies:
– Uncontrolled
– Controlled
68. Cont…
• Uncontrolled
– A remote process arriving at a node is treated just
as a process originating at the node.
– A process may be migrated any number of times.
69. Cont…
• Controlled
– To overcome the instability problem of the
uncontrolled policy, most systems treat remote
processes differently from local processes and
use a migration count parameter.
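One way to read the migration count parameter is as an upper bound on how many times a process may move. The sketch below assumes that interpretation; `Process`, `may_migrate`, `migrate`, and `migration_limit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    migration_count: int = 0  # number of times this process has migrated so far


def may_migrate(process, migration_limit):
    """A process may move only while it is below the migration limit."""
    return process.migration_count < migration_limit


def migrate(process, migration_limit):
    """Record one migration if allowed; return whether it happened."""
    if not may_migrate(process, migration_limit):
        return False
    process.migration_count += 1
    return True


p = Process(pid=1)
print(migrate(p, migration_limit=1))  # True
print(migrate(p, migration_limit=1))  # False
```

With a limit of 1, a process that has been transferred once must finish on the node that received it, which rules out the repeated transfers that make the uncontrolled policy unstable.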
70. Load sharing approach
• For the proper utilization of the resources of
a distributed system it is not required to
balance the load on all the nodes.
• It is necessary and sufficient to prevent
nodes from being idle while some other
nodes have more than two processes.
72. Issues in designing load-sharing
algorithms
• Load sharing algorithms do not attempt to balance the
average workload on all the nodes of the system,
rather they only attempt to ensure that no node is idle
when a node is heavily loaded.
• The priority assignment policies and migration-limiting
policies are the same as those for the load-balancing
algorithms. Other policies are described here.
73. Cont… : Load Estimation Policy
• It is sufficient to know whether a node is busy
or idle.
• Methods for estimating load:
– Count the total number of processes on a node.
– Measure CPU utilization.
74. Cont… : Process transfer policies
• All-or-nothing strategy:
– Uses the single-threshold policy with the threshold
value of all the nodes fixed at 1.
– Drawback : Loss of available processing power in
the system.
– Solution : some algorithms use a threshold value of 2
instead of 1.
76. Sender initiated location policy
• Heavily loaded nodes search for lightly loaded
nodes to which work may be transferred.
• When a node’s load exceeds the threshold
value, it either broadcasts a message or randomly
probes the other nodes one by one to find a lightly
loaded node.
77. Cont…
• In the broadcasting method, the presence or
absence of a suitable receiver node is known as
soon as the sender node receives reply messages
from the other nodes.
78. Cont…
• In the random probing method, the probing
continues until either a suitable node is found or
the no. of probes reaches a static probe limit, Lp.
• A fixed probe limit scales better than the
broadcast method.
79. Receiver-initiated location policy
• Lightly loaded nodes search for heavily loaded nodes
from which work may be transferred.
• When a node’s load falls below the threshold value,
it either broadcasts a message indicating its
willingness to receive processes or randomly probes
the other nodes one by one to find a heavily loaded
node.
• In the broadcast method, a suitable node is found as
soon as the receiver node receives reply messages
from the other nodes.
80. Cont…
• In the random probing method, the probing
continues until either a suitable node is found
or the no. of probes reaches a static probe
limit, Lp.
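Receiver-initiated random probing can be sketched as follows. As assumptions for illustration, load is a process count, a node counts as heavily loaded when its load exceeds `threshold`, and nodes are probed without repetition; `receiver_probe` is a hypothetical name.

```python
import random


def receiver_probe(loads, receiver, threshold, probe_limit, rng=random):
    """Return a heavily loaded node to take work from, or None if none is found."""
    candidates = [n for n in range(len(loads)) if n != receiver]
    # Stop at the first suitable node or after probe_limit probes.
    for node in rng.sample(candidates, min(probe_limit, len(candidates))):
        if loads[node] > threshold:
            return node
    return None


print(receiver_probe([0, 1, 5, 1], receiver=0, threshold=2, probe_limit=3))  # 2
```

The sender-initiated variant has the same shape with the comparison reversed: an overloaded node probes for a node whose load is below the threshold.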
81. Cont…
• Both sender-initiated and receiver-initiated
policies offer substantial performance
advantages over the situation in which no
load sharing is attempted.
• Sender-initiated policies are preferable at
light to moderate system loads, while
receiver-initiated policies are preferable at
high system loads.
82. Cont…
• If the cost of process transfer under receiver-
initiated policies is significantly greater than
under the sender-initiated policies due to the
preemptive transfer of processes, sender-
initiated policies provide uniformly better
performance.