DISTRIBUTED OPERATING
SYSTEMS
Sandeep Kumar Poonia
CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS
 Time ordering and clock synchronization
 Leader election
 Mutual exclusion
 Distributed transactions
 Deadlock detection
THE IMPORTANCE OF SYNCHRONIZATION
 Because various components of a distributed
system must cooperate and exchange information,
synchronization is a necessity.
 Various components of the system must agree on
the timing and ordering of events. Imagine a
banking system that did not track the timing and
ordering of financial transactions. Similar chaos
would ensue if distributed systems were not
synchronized.
 Constraints, both implicit and explicit, are therefore
enforced to ensure synchronization of components.
CLOCK SYNCHRONIZATION
 As in non-distributed systems, the knowledge
of “when events occur” is necessary.
 However, clock synchronization is often more
difficult in distributed systems because there
is no ideal time source, and because
distributed algorithms must sometimes be
used.
 Distributed algorithms must overcome:
 Scattering of information
 Local, rather than global, decision-making
CLOCK SYNCHRONIZATION
 Time is unambiguous in centralized systems
 System clock keeps time, all entities use this for
time
 Distributed systems: each node has own
system clock
 Crystal-based clocks are less accurate (about 1 part per
million)
 Problem: An event that occurred after another
may be assigned an earlier time
LACK OF GLOBAL TIME IN DS
 It is impossible to guarantee that
physical clocks run at the same
frequency
 Lack of global time, can cause problems
 Example: UNIX make
 Edit output.c at a client
 output.o is at a server (compile at server)
 Client machine clock can be lagging behind
the server machine clock
LACK OF GLOBAL TIME – EXAMPLE
When each machine has its own clock, an
event that occurred after another event may
nevertheless be assigned an earlier time.
LOGICAL CLOCKS
 For many problems, internal consistency of
clocks is important
 Absolute time is less important
 Use logical clocks
 Key idea:
 Clock synchronization need not be absolute
 If two machines do not interact, no need to
synchronize them
 More importantly, processes need to agree on
the order in which events occur rather than the
time at which they occurred
EVENT ORDERING
 Problem: define a total ordering of all events that
occur in a system
 Events in a single processor machine are totally
ordered
 In a distributed system:
 No global clock, local clocks may be unsynchronized
 Can not order events on different machines using local
times
 Key idea [Lamport ]
 Processes exchange messages
 Message must be sent before received
 Send/receive used to order events (and synchronize clocks)
LOGICAL CLOCKS
 Often, it is not necessary for a computer to know the exact
time, only relative time. This is known as “logical time”.
 Logical time is not based on timing but on the ordering of
events.
 Logical clocks can only advance forward, not in reverse.
 Non-interacting processes cannot share a logical clock.
 Computers generally obtain logical time using interrupts to
update a software clock. The more interrupts (the more
frequently time is updated), the higher the overhead.
LAMPORT’S LOGICAL CLOCK
SYNCHRONIZATION ALGORITHM
 The most common logical clock synchronization algorithm
for distributed systems is Lamport's Algorithm. It is used in
situations where ordering is important but global time is not
required.
 Based on the “happens-before” relation:
 Event A “happens-before” Event B (A→B) when all
processes involved in a distributed system agree that
event A occurred first, and B subsequently occurred.
 This DOES NOT mean that Event A actually occurred
before Event B in absolute clock time.
LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION
ALGORITHM
 A distributed system can use the “happens-before”
relation when:
 Events A and B are observed by the same
process, or by multiple processes with the same
global clock
 Event A is the sending of a message and
Event B is the receiving of that message, since a
message cannot be received before it is sent
 If two events do not communicate via messages,
they are considered concurrent – because order
cannot be determined and it does not matter.
Concurrent events can be ignored.
LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION
ALGORITHM (CONT.)
 In the previous examples, if a → b, then C(a) < C(b)
 If two events are concurrent, they may be assigned C(a) = C(b)
 Two events can share a clock value only if they occur on different
systems, because on a single system the clock ticks between successive
events, and every message transfer between two systems takes at least
one clock tick.
 In Lamport's Algorithm, logical clock values for events may
be changed, but always by moving the clock forward. Time
values can never be decreased.
 An additional refinement in the algorithm is often used:
 If Event A and Event B are concurrent, i.e. C(a) = C(b), some unique
property of the processes associated with these events can be used
to choose a winner. This establishes a total ordering of all events.
 Process ID is often used as the tiebreaker.
LAMPORT’S LOGICAL CLOCK
SYNCHRONIZATION ALGORITHM (CONT.)
 Lamport's Algorithm can thus be used in distributed
systems to ensure synchronization:
 A logical clock is implemented in each node in
the system.
 Each node can determine the order in which
events have occurred, from that node's own point
of view.
 The logical clock of one node does not need to
have any relation to real time or to any other
node in the system.
EVENT ORDERING USING HB
 Goal: define the notion of time of an event such
that
 If A → B then C(A) < C(B)
 If A and B are concurrent, then C(A) <, = or > C(B)
 Solution:
 Each processor maintains a logical clock LCi
 Whenever an event occurs locally at i, LCi = LCi + 1
 When i sends a message to j, piggyback LCi
 When j receives message from i
 If LCj < LCi then LCj = LCi +1 else do nothing
 Claim: this algorithm meets the above goals
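To make the rules above concrete, here is a minimal sketch in Python; the class `LamportClock` and its method names are illustrative, not part of any standard library.

```python
class LamportClock:
    """Minimal sketch of a Lamport logical clock (illustrative names)."""

    def __init__(self):
        self.time = 0                 # LCi, the local logical clock

    def local_event(self):
        self.time += 1                # tick on every local event
        return self.time

    def send(self):
        self.time += 1                # sending is itself an event
        return self.time              # timestamp piggybacked on the message

    def receive(self, msg_time):
        # Fast-forward past the sender's timestamp if it is ahead.
        if self.time < msg_time:
            self.time = msg_time + 1
        else:
            self.time += 1            # receiving still counts as an event
        return self.time


# Example: P1 sends to P2; P2's clock jumps past the message timestamp.
p1, p2 = LamportClock(), LamportClock()
t_msg = p1.send()                     # P1's clock becomes 1
p2.receive(t_msg)                     # P2's clock becomes 2
```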
PROCESS EACH WITH ITS OWN CLOCK
•At time 6, Process 0 sends message A to Process 1
•It arrives at Process 1 at time 16 (it took 10 ticks to make the journey)
•Message B from 1 to 2 takes 16 ticks
•Message C from 2 to 1 leaves at
60 and arrives at 56 - not possible
•Message D from 1 to 0 leaves at
64 and arrives at 54 - not possible
LAMPORT’S ALGORITHM CORRECTS THE CLOCKS
 Use the "happened-before"
relation
 Each message carries the
sending time (as per the sender's
clock)
 When it arrives, the receiver fast
forwards its clock to be one
more than the sending time.
(between every two events,
the clock must tick at least
once)
PHYSICAL CLOCKS
 The instantaneous time difference between two computer clocks is
known as skew; their gradual divergence over time, caused by the clocks
running at slightly different rates, is known as drift. Computer clock
manufacturers specify a maximum drift rate in their products.
 Computer clocks are among the least accurate modern
timepieces.
 Inside every computer is a chip containing a quartz
crystal oscillator that keeps time.
 Average loss of accuracy: roughly 0.86 seconds per day
 This skew is unacceptable for distributed systems. Several
methods are now in use to attempt the synchronization of
physical clocks in distributed systems:
PHYSICAL CLOCKS
 Since the 17th century, time has been measured
astronomically
 Solar Day: interval between two consecutive
transits of the sun
 Solar Second: 1/86,400th of a solar day
PHYSICAL CLOCKS
 1948: atomic clocks are invented
 Accurate clocks are atomic oscillators (accurate to one part in 10^13)
 The BIH defines TAI (International Atomic Time)
 86,400 TAI seconds is now about 3 msec less than a mean solar day
 The BIH solves the problem by introducing a leap second whenever the
discrepancy between TAI and solar time grows to 800 msec
 The resulting time scale is called Universal Coordinated Time (UTC)
 When the BIH announces a leap second, power companies raise their
frequency to 61 Hz (or 51 Hz) for 60 (or 50) seconds, to advance all the
clocks in their distribution area.
PHYSICAL CLOCKS - UTC
Coordinated Universal Time
(UTC) is the international
time standard.
UTC is the current term for
what was commonly
referred to as Greenwich
Mean Time (GMT).
Zero hours UTC is midnight in
Greenwich, England, which
lies on the zero longitudinal
meridian.
UTC is based on a 24-hour
clock.
PHYSICAL CLOCKS
 Most clocks are less accurate (e.g., mechanical watches)
 Computers use crystal-based clocks (about one part per million)
 Results in clock drift
 How do you tell time?
 Use astronomical metrics (solar day)
 Coordinated universal time (UTC) – international standard
based on atomic time
 Add leap seconds to be consistent with astronomical time
 UTC broadcast on radio (satellite and earth)
 Receivers accurate to 0.1 – 10 ms
 Need to synchronize machines with a master or with one
another
CLOCK SYNCHRONIZATION
 Each clock has a maximum drift rate ρ
 1 − ρ ≤ dC/dt ≤ 1 + ρ
 Two clocks may drift apart by up to 2ρt in time t
 To keep the skew below δ, resynchronize at least every
δ/2ρ seconds
 Example: with ρ = 10^-6 and δ = 1 msec, that means
resynchronizing every 500 seconds
CRISTIAN'S ALGORITHM
 Assuming there is one time server with UTC:
 Each node in the distributed system periodically polls the time server.
 Given the server's reply t, the correct time is estimated as
t + (Treq + Treply)/2, where Treq and Treply are the request and reply delays.
 This process is repeated several times and an average is taken.
 The polling machine then attempts to adjust its time.
 Disadvantages:
 Must estimate the message delay between the client and the time
server
 Single point of failure if time server fails
CRISTIAN’S ALGORITHM
Synchronize machines to a
time server with a UTC
receiver
Machine P requests the time from the
server every δ/2ρ seconds
 Receives time t from
server, P sets clock to
t+treply where treply is the
time to send reply to P
 Use (treq+treply)/2 as an
estimate of treply
 Improve accuracy by
making a series of
measurements
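As a rough illustration of this estimate, the following Python sketch polls a time source several times and keeps the reading with the smallest round trip; `get_server_time()` is a stand-in for the actual request to the time server.

```python
import time

def get_server_time():
    """Placeholder: ask the UTC time server for its current time."""
    return time.time()    # stand-in; a real client would make an RPC here

def cristian_estimate():
    t0 = time.monotonic()              # client clock when the request is sent
    server_time = get_server_time()
    t1 = time.monotonic()              # client clock when the reply arrives
    round_trip = t1 - t0
    # Assume the reply took roughly half the round trip to come back.
    estimated_now = server_time + round_trip / 2
    return estimated_now, round_trip

# Repeat several times and keep the estimate with the smallest round trip,
# since the fastest reply is likely the most accurate.
best_estimate, best_rtt = min((cristian_estimate() for _ in range(5)),
                              key=lambda e: e[1])
```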
PROBLEM WITH CRISTIAN’S ALGORITHM
Major Problem
 Time must never run
backward
 If sender's clock is
fast, CUTC will be
smaller than the
sender's current value
of C
Minor Problem
 It takes nonzero time
for the time server's
reply
 This delay may be large
and vary with network
load
SOLUTION
Major Problem
 Control the clock
 Suppose the timer is set
to generate 100 interrupts/sec
 Normally each interrupt
add 10 msec to the time
 To slow down add only 9
msec
 To advance add 11 msec to
the time
Minor Problem
 Measure it
 Make a series of
measurements for accuracy
 Discard the measurements
that exceed the threshold
value
 The message that came
back fastest can be taken to
be the most accurate.
BERKELEY ALGORITHM
 Used in systems without UTC receiver
 Keep clocks synchronized with one another
 One computer is the master, the others are slaves
 Master periodically polls slaves for their times
 Averages the times and returns adjustments to the slaves
 Communication delays are compensated for as in Cristian's
algorithm
 Failure of master => election of a new master
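A small sketch of the master's averaging step, assuming the slave readings have already been collected and delay-compensated; the function name and the example values are hypothetical.

```python
def berkeley_adjustments(master_time, slave_times):
    """Given the master's clock and delay-compensated slave readings,
    return the offset each machine (master first) should apply."""
    all_times = [master_time] + list(slave_times)
    average = sum(all_times) / len(all_times)
    # Each machine is told how much to adjust, not the absolute time,
    # so the delay of the answer message matters less.
    return [average - t for t in all_times]

# Hypothetical readings, in minutes: master at 180, slaves at 205 and 170.
print(berkeley_adjustments(180, [205, 170]))   # -> [5.0, -20.0, 15.0]
```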
BERKELEY ALGORITHM
a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
DECENTRALIZED AVERAGING ALGORITHM
 Each machine on the distributed system has a daemon
without UTC.
 Periodically, at an agreed-upon fixed time, each machine
broadcasts its local time.
 Each machine calculates the correct time by averaging
all results.
NETWORK TIME PROTOCOL (NTP)
 Enables clients across the Internet to be
synchronized accurately to UTC.
 Overcomes large and variable message delays
 Employs statistical techniques for filtering, based on past
quality of servers and several other measures
 Can survive lengthy losses of connectivity:
 Redundant servers
 Redundant paths to servers
 Provides protection against malicious interference
through authentication techniques
NETWORK TIME PROTOCOL (NTP) (CONT.)
 Uses a hierarchy of servers located across the Internet.
Primary servers are directly connected to a UTC time
source.
NETWORK TIME PROTOCOL (NTP) (CONT.)
 NTP has three modes:
 Multicast Mode:
 Suitable for user workstations on a LAN
 One or more servers periodically multicasts the time to other
machines on the network.
 Procedure Call Mode:
 Similar to Cristian's Algorithm
 Provides higher accuracy than Multicast Mode because delays
are compensated.
 Symmetric Mode:
 Pairs of servers exchange pairs of timing messages that contain
time stamps of recent message events.
 The most accurate, but also the most expensive mode
Although NTP is quite advanced, there is still a drift of 20-35 milliseconds!!!
MORE PROBLEMS
 Causality
 Vector timestamps
 Global state and termination detection
 Election algorithms
LOGICAL CLOCKS
 For many DS algorithms, associating
an event to an absolute real time is
not essential, we only need to know
an unambiguous order of events
 Lamport's timestamps
 Vector timestamps
LOGICAL CLOCKS (CONT.)
 Synchronization based on “relative time”.
 “relative time” may not relate to the “real
time”.
 Example: Unix make (Is output.c updated after the
generation of output.o?)
 What's important is that the processes in
the Distributed System agree on the
ordering in which certain events occur.
 Such “clocks” are referred to as Logical
Clocks.
EXAMPLE: WHY ORDER MATTERS?
 Replicated accounts in Jaipur(JP) and Bikaner(BN)
 Two updates occur at the same time
 Current balance: $1,000
 Update1: Add $100 at BN; Update2: Add interest of 1% at
JP
 If the replicas apply the updates in different orders, BN ends with
(1,000 + 100) × 1.01 = $1,111 while JP ends with 1,000 × 1.01 + 100 =
$1,110. Whoops, inconsistent states!
LAMPORT ALGORITHM
 Clock synchronization does not have to be
exact
 Synchronization not needed if there is no
interaction between machines
 Synchronization only needed when machines
communicate
 i.e. must only agree on ordering of interacting
events
LAMPORT’S “HAPPENS-BEFORE” PARTIAL
ORDER
 Given two events e & e`, e < e` if:
1. Same process: e <i e`, for some
process Pi
2. Same message: e = send(m) and
e`=receive(m) for some message m
3. Transitivity: there is an event e* such
that e < e* and e* < e`
CONCURRENT EVENTS
 Given two events e & e`:
 If neither e < e` nor e`< e, then e || e`
[Figure: events a, b, c, d, e, f on processes P1, P2, P3 plotted against real time, with messages m1 and m2 exchanged between the processes.]
LAMPORT LOGICAL CLOCKS
 Substitute synchronized clocks with a global
ordering of events
 ei < ej ⇒ LC(ei) < LC(ej)
 LCi is a local clock: contains increasing values
 each process i has own LCi
 Increment LCi on each event occurrence
 within same process i, if ej occurs before ek
 LCi(ej) < LCi(ek)
 If es is a send event and er receives that send,
then
 LCi(es) < LCj(er)
LAMPORT ALGORITHM
 Each process increments local clock
between any two successive events
 Message contains a timestamp
 Upon receiving a message, if received
timestamp is ahead, receiver fast forward
its clock to be one more than sending
time
LAMPORT ALGORITHM (CONT.)
 Timestamp
 Each event is given a timestamp t
 If es is a send message m from pi, then
t=LCi(es)
 When pj receives m, set LCj value as follows
 If t < LCj, increment LCj by one
 Message regarded as next event on j
 If t ≥ LCj, set LCj to t+1
LAMPORT’S ALGORITHM ANALYSIS (1)
 Claim: ei < ej ⇒ LC(ei) < LC(ej)
 Proof: by induction on the length of the
sequence of events relating to ei and ej
[Figure: the same events a to g on P1, P2, P3, now labelled with the Lamport clock values (1 to 5) assigned by the algorithm; messages m1 and m2.]
LAMPORT’S ALGORITHM ANALYSIS (2)
 LC(ei) < LC(ej) ⇒ ei < ej ?
 Claim: if LC(ei) < LC(ej), then it is not
necessarily true that ei < ej
[Figure: the same execution with Lamport clock values; some concurrent events still receive ordered clock values, so LC(ei) < LC(ej) does not imply ei < ej.]
TOTAL ORDERING OF EVENTS
 Happens before is only a partial order
 Make the timestamp of an event e of
process Pi be: (LC(e),i)
 (a,b) < (c,d) iff a < c, or a = c and b < d
APPLICATION: TOTALLY-ORDERED MULTICASTING
 Message is timestamped with sender's
logical time
 Message is multicast (including sender itself)
 When message is received
 It is put into local queue
 Ordered according to timestamp
 Multicast acknowledgement
 Message is delivered to applications only
when
 It is at head of queue
 It has been acknowledged by all involved
processes
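A simplified, single-node sketch of the queueing rule described above, in Python. It models only the local hold-back queue; the multicasting and acknowledgement traffic is assumed to happen elsewhere, and the heap key plays the role of the (Lamport timestamp, process id) total order. Names are illustrative.

```python
import heapq

class TotalOrderQueue:
    """Hold multicast messages until they are at the head of the local
    queue and acknowledged by all involved processes (sketch only)."""

    def __init__(self, all_procs):
        self.all_procs = set(all_procs)
        self.queue = []                       # heap of (lamport_ts, sender, msg)
        self.acks = {}                        # (ts, sender) -> set of ackers

    def on_multicast(self, ts, sender, msg):
        heapq.heappush(self.queue, (ts, sender, msg))
        self.acks.setdefault((ts, sender), set())

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)

    def deliverable(self):
        """Deliver messages from the head while they are fully acknowledged."""
        out = []
        while self.queue:
            ts, sender, msg = self.queue[0]
            if self.acks.get((ts, sender), set()) >= self.all_procs:
                heapq.heappop(self.queue)     # head of queue, fully acked
                out.append(msg)
            else:
                break                         # head not yet acked: wait
        return out
```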
APPLICATION: TOTALLY-ORDERED MULTICASTING
 Update 1 is time-stamped and multicast. Added to local queues.
 Update 2 is time-stamped and multicast. Added to local queues.
 Acknowledgements for Update 2 sent/received. Update 2 can now be
processed.
 Acknowledgements for Update 1 sent/received. Update 1 can now be
processed.
 (Note: all queues are the same, as the timestamps have been used to
ensure the “happens-before” relation holds.)
LIMITATION OF LAMPORT’S ALGORITHM
 ei < ej ⇒ LC(ei) < LC(ej)
 However, LC(ei) < LC(ej) does not imply ei <
ej
 for instance, (1,1) < (1,3), but events a and e are
concurrent
[Figure: the same events a to g labelled with (Lamport clock, process id) pairs (1,1), (2,1), (3,2), (5,3), (4,2), (1,3), (2,3); messages m1 and m2.]
VECTOR TIMESTAMPS
 Pi's clock is a vector VTi[]
 VTi[i] = number of events Pi has
stamped
 VTi[j] = what Pi thinks the number of
events Pj has stamped is (i ≠ j)
VECTOR TIMESTAMPS (CONT.)
 Initialization
 the vector timestamp for each process is
initialized to (0,0,…,0)
 Local event
 when an event occurs on process Pi, VTi[i] ←
VTi[i] + 1
 e.g., on process 3, (1,2,1,3) → (1,2,2,3)
 Message passing
 when Pi sends a message to Pj, the message
has timestamp t[]=VTi[]
 when Pj receives the message, it sets VTj[k] to
max (VTj[k],t[k]), for k = 1, 2, …, N
 e.g., P2 receives a message with timestamp (3,2,4)
and P2's timestamp is (3,4,3), then P2 adjusts its
timestamp to (3,4,4)
VECTOR TIMESTAMPS (CONT.)
COMPARING VECTORS
 VT1 = VT2 iff VT1[i] = VT2[i] for all i
 VT1 ≤ VT2 iff VT1[i] ≤ VT2[i] for all i
 VT1 < VT2 iff VT1 ≤ VT2 & VT1 ≠ VT2
 for instance, (1, 2, 2) < (1, 3, 2)
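The update and comparison rules can be sketched directly in Python; `VectorClock` and its method names are illustrative. This sketch also counts a receive as a local event, as in the vector-clock summary later in the deck.

```python
class VectorClock:
    """Sketch of a vector clock for process `pid` among `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                     # VT[0..n-1]

    def local_event(self):
        self.v[self.pid] += 1

    def send(self):
        self.v[self.pid] += 1
        return list(self.v)                  # timestamp piggybacked on message

    def receive(self, ts):
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.pid] += 1                # the receive counts as an event here

    @staticmethod
    def leq(a, b):                           # a <= b componentwise
        return all(x <= y for x, y in zip(a, b))

    @staticmethod
    def happened_before(a, b):               # a < b iff a <= b and a != b
        return VectorClock.leq(a, b) and a != b
```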
VECTOR TIMESTAMP ANALYSIS
 Claim: e < e' iff e.VT < e'.VT
[Figure: the same events a to g labelled with vector timestamps [1,0,0], [2,0,0], [2,1,0], [2,2,0], [2,2,3], [0,0,1], [0,0,2]; messages m1 and m2.]
APPLICATION: CAUSALLY-ORDERED MULTICASTING
 For ordered delivery, we also need…
 Multicast msgs (reliable but may be out-of-order)
 Vi[i] is only incremented when sending
 When k gets a msg from j, with timestamp ts,
the msg is buffered until:
 1: ts[j] = Vk[j] + 1
 (this is the next timestamp that k is expecting from j)
 2: ts[i] ≤ Vk[i] for all i ≠ j
 (k has seen all msgs that were seen by j when j sent the
msg)
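A sketch of the buffering test just described; `can_deliver` checks the two conditions for a message from process j (0-based index) carrying vector timestamp ts, against receiver k's vector Vk. Names and the example values are illustrative.

```python
def can_deliver(ts, sender_j, Vk):
    """Deliver a multicast from process j (timestamp ts) at process k only
    if it is the next message expected from j and k has already seen
    everything j had seen when it sent (sketch)."""
    next_from_sender = ts[sender_j] == Vk[sender_j] + 1
    seen_everything_else = all(ts[i] <= Vk[i]
                               for i in range(len(ts)) if i != sender_j)
    return next_from_sender and seen_everything_else

# Example: P2 with V = [0,0,0] gets the reply r from P3 (index 2) carrying
# timestamp [1,0,1]; it is buffered until P1's original post is delivered.
print(can_deliver([1, 0, 1], 2, [0, 0, 0]))   # False -> buffer
print(can_deliver([1, 0, 1], 2, [1, 0, 0]))   # True  -> deliver
```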
CAUSALLY-ORDERED MULTICASTING
[Figure: P1 multicasts "Post a" with timestamp [1,0,0]; P3 receives it and multicasts "r: Reply a" with timestamp [1,0,1]. Message a arrives at P2 before the reply r from P3 does, so both are delivered in causal order.]
CAUSALLY-ORDERED MULTICASTING (CONT.)
[Figure: here the message a arrives at P2 after the reply r from P3; the reply (timestamp [1,0,1]) is buffered and only delivered ("Deliver r") once a ([1,0,0]) has arrived and been delivered.]
ORDERED COMMUNICATION
 Totally ordered multicast
 Use Lamport timestamps
 Causally ordered multicast
 Use vector timestamps
VECTOR CLOCKS
 Each process i maintains a vector Vi
 Vi[i] : number of events that have occurred at i
 Vi[j] : number of events i knows have occurred at process
j
 Update vector clocks as follows
 Local event: increment Vi[i]
 Send a message: piggyback the entire vector Vi
 Receipt of a message at j: Vj[k] = max( Vj[k], Vi[k] ) for all k
 Receiver is told how many events the sender knows
occurred at another process k
 Also Vj[j] = Vj[j] + 1 (the receive itself is an event at j)
GLOBAL STATE
 Global state of a distributed system
 Local state of each process
 Messages sent but not received (state of the queues)
 Many applications need to know the state of the
system
 Failure recovery, distributed deadlock detection
 Problem: how can you figure out the state of a
distributed system?
 Each process is independent
 No global clock or synchronization
 Distributed snapshot: a consistent global state
GLOBAL STATE (1)
a) A consistent cut
b) An inconsistent cut
DISTRIBUTED SNAPSHOT ALGORITHM
 Assume each process communicates with
another process using unidirectional point-to-
point channels (e.g., TCP connections)
 Any process can initiate the algorithm
 Checkpoint local state
 Send marker on every outgoing channel
 On receiving a marker
 If this is the first marker: checkpoint the local state, send a
marker on every outgoing channel, and start saving incoming
messages on all other channels
 On a subsequent marker on a channel: stop saving
messages for that channel (what was saved is its state)
DISTRIBUTED SNAPSHOT
 A process finishes when
 It receives a marker on each incoming channel
and processes them all
 State: local state plus state of all channels
 Send state to initiator
 Any process can initiate snapshot
 Multiple snapshots may be in progress
 Each is separate, and each is distinguished by tagging
the marker with the initiator ID (and sequence
number)
SNAPSHOT ALGORITHM EXAMPLE
a) Organization of a process and channels for a distributed
snapshot
SNAPSHOT ALGORITHM EXAMPLE
b) Process Q receives a marker for the first time and records its
local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes
recording the state of the incoming channel
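The marker handling described above can be sketched per process roughly as follows (a Chandy-Lamport style snapshot). The class and its attributes are illustrative, and sending is modelled simply by queueing markers in `to_send`.

```python
class SnapshotProcess:
    """Marker handling for one process in a distributed snapshot (sketch)."""

    MARKER = object()

    def __init__(self, incoming, outgoing, local_state_fn):
        self.incoming = list(incoming)        # ids of incoming channels
        self.outgoing = list(outgoing)        # ids of outgoing channels
        self.local_state_fn = local_state_fn  # callable returning local state
        self.local_snapshot = None
        self.open_channels = {}               # channel -> messages saved so far
        self.channel_state = {}               # channel -> final recorded messages
        self.to_send = []                     # (channel, MARKER) pairs to transmit

    def _record(self):
        self.local_snapshot = self.local_state_fn()          # checkpoint local state
        self.to_send = [(ch, self.MARKER) for ch in self.outgoing]
        self.open_channels = {ch: [] for ch in self.incoming}

    def initiate(self):
        self._record()

    def on_message(self, channel, msg):
        if msg is self.MARKER:
            if self.local_snapshot is None:                   # first marker seen
                self._record()
            # A marker closes this channel: its state is what was saved so far.
            self.channel_state[channel] = self.open_channels.pop(channel, [])
        elif channel in self.open_channels:
            self.open_channels[channel].append(msg)           # in-flight message

    def done(self):
        # Finished once a marker has arrived on every incoming channel.
        return self.local_snapshot is not None and not self.open_channels
```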
TERMINATION DETECTION
 Detecting the end of a distributed computation
 Notation: let sender be predecessor, receiver be successor
 Two types of markers: Done and Continue
 After finishing its part of the snapshot, process Q sends a
Done or a Continue to its predecessor
 Send a Done only when
 All of Q's successors send a Done
 Q has not received any message since it check-pointed its local state
and received a marker on all incoming channels
 Else send a Continue
 Computation has terminated if the initiator receives Done
messages from everyone
DISTRIBUTED SYNCHRONIZATION
 Distributed system with multiple processes may
need to share data or access shared data
structures
 Use critical sections with mutual exclusion
 Single process with multiple threads
 Semaphores, locks, monitors
 How do you do this for multiple processes in a
distributed system?
 Processes may be running on different machines
 Solution: lock mechanism for a distributed
environment
 Can be centralized or distributed
CENTRALIZED MUTUAL EXCLUSION
 Assume processes are numbered
 One process is elected coordinator (highest ID
process)
 Every process needs to check with coordinator
before entering the critical section
 To obtain exclusive access: send request, await
reply
 To release: send release message
 Coordinator:
 Receive request: if available and queue empty, send
grant; if not, queue request
 Receive release: remove next request from queue and
send grant
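A minimal sketch of the coordinator's side of this protocol in Python; `LockCoordinator` and the "GRANT" string are illustrative, and message transport is assumed to happen elsewhere.

```python
from collections import deque

class LockCoordinator:
    """Sketch of the coordinator's logic for centralized mutual exclusion."""

    def __init__(self):
        self.holder = None          # process currently in the critical section
        self.waiting = deque()      # queued requests

    def on_request(self, pid):
        if self.holder is None:
            self.holder = pid
            return ("GRANT", pid)   # lock free: grant immediately
        self.waiting.append(pid)    # otherwise queue the request (no reply yet)
        return None

    def on_release(self, pid):
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return ("GRANT", self.holder) if self.holder is not None else None
```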
MUTUAL EXCLUSION:
A CENTRALIZED ALGORITHM
a) Process 1 asks the coordinator for permission to enter a
critical region. Permission is granted
b) Process 2 then asks permission to enter the same critical
region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the
coordinator, which then replies to 2
PROPERTIES
 Simulates centralized lock using blocking calls
 Fair: requests are granted the lock in the order they were
received
 Simple: three messages per use of a critical section
(request, grant, release)
 Shortcomings:
 Single point of failure
 How do you detect a dead coordinator?
 A process cannot distinguish "lock in use" from a dead
coordinator
 No response from coordinator in either case
 Performance bottleneck in large distributed systems
DISTRIBUTED ALGORITHM
 [Ricart and Agrawala]: needs 2(n-1) messages
 Based on event ordering and time stamps
 Process k enters critical section as follows
 Generate new time stamp TSk = TSk+1
 Send request(k, TSk) to all other n-1 processes
 Wait until reply(j) received from all other processes
 Enter critical section
 Upon receiving a request message, process j
 Sends a reply if there is no contention
 If already in the critical section, does not reply and queues the request
 If it also wants to enter, compares TSj with TSk and sends a reply if
TSk < TSj, else queues the request
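The reply/defer decision a process makes on receiving a request can be sketched as follows; the state names and function signature are illustrative.

```python
def on_request(my_state, my_ts, my_pid, req_ts, req_pid):
    """Decide how process `my_pid` answers a critical-section request
    (Ricart-Agrawala style, as summarized above).
    my_state is 'RELEASED', 'WANTED' or 'HELD'."""
    if my_state == "HELD":
        return "DEFER"                       # queue the request, reply later
    if my_state == "WANTED":
        # Both want the lock: lower (timestamp, pid) wins.
        if (req_ts, req_pid) < (my_ts, my_pid):
            return "REPLY"                   # requester has priority
        return "DEFER"
    return "REPLY"                           # not interested: reply at once

# Example: two processes both in WANTED state with timestamps 8 and 12.
print(on_request("WANTED", 12, 1, 8, 0))     # REPLY  (requester's 8 < our 12)
print(on_request("WANTED", 8, 0, 12, 1))     # DEFER  (our 8 < requester's 12)
```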
A DISTRIBUTED ALGORITHM
a) Two processes want to enter the same critical region at the same
moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter
the critical region.
PROPERTIES
 Fully decentralized
 N points of failure!
 All processes are involved in all decisions
 Any overloaded process can become a
bottleneck
ELECTION ALGORITHMS
 Many distributed algorithms need one process
to act as coordinator
 Doesn't matter which process does the job, just
need to pick one
 Election algorithms: technique to pick a unique
coordinator (aka leader election)
 Examples: take over the role of a failed
process, pick a master in Berkeley clock
synchronization algorithm
 Types of election algorithms: Bully and Ring
algorithms
BULLY ALGORITHM
 Each process has a unique numerical ID
 Processes know the IDs and addresses of every other
process
 Communication is assumed reliable
 Key Idea: select process with highest ID
 Process initiates election if it just recovered from
failure or if coordinator failed
 3 message types: election, OK, I won
 Several processes can initiate an election
simultaneously
 Need consistent result
 O(n²) messages required with n processes
BULLY ALGORITHM DETAILS
 Any process P can initiate an election
 P sends Election messages to all processes with
higher IDs and awaits OK messages
 If no OK messages arrive, P becomes coordinator and
sends I won messages to all processes with lower IDs
 If it receives an OK, it drops out and waits for an I
won
 If a process receives an Election msg, it returns an
OK and starts an election of its own
 If a process receives an I won, it treats the sender as
coordinator
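A rough sketch of one election round from a single process's point of view; `is_alive(pid)` stands in for sending an ELECTION message to pid and waiting (with a timeout) for an OK.

```python
def start_election(my_id, all_ids, is_alive):
    """One round of the bully algorithm, seen from process `my_id`."""
    higher = [pid for pid in all_ids if pid > my_id]
    if not any(is_alive(pid) for pid in higher):
        # No OK received from any higher-numbered process: announce victory.
        return ("I WON", my_id)
    # Some higher-numbered process answered; it will run its own election.
    return ("WAIT FOR COORDINATOR", None)

# Example: processes 0-7, with 7 crashed; process 4 starts an election.
alive = {0, 1, 2, 3, 4, 5, 6}
print(start_election(4, range(8), lambda pid: pid in alive))
# -> ('WAIT FOR COORDINATOR', None); eventually 6 wins and announces itself.
```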
BULLY ALGORITHM EXAMPLE
 The bully election algorithm
 Process 4 holds an election
 Processes 5 and 6 respond, telling 4 to stop
 Now 5 and 6 each hold an election
BULLY ALGORITHM EXAMPLE
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
LAST CLASS
 Vector timestamps
 Global state
 Distributed Snapshot
 Election algorithms
TODAY: STILL MORE CANONICAL
PROBLEMS
 Election algorithms
 Bully algorithm
 Ring algorithm
 Distributed synchronization and mutual
exclusion
 Distributed transactions
RING-BASED ELECTION
 Processes have unique Ids and arranged in a logical ring
 Each process knows its neighbors
 Select process with highest ID
 Begin election if just recovered or coordinator has failed
 Send Election to closest downstream node that is alive
 Sequentially poll each successor until a live node is found
 Each process tags its ID on the message
 Initiator picks node with highest ID and sends a coordinator
message
 Multiple elections can be in progress
 Wastes network bandwidth but does no harm
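A sketch of one election pass around the ring, assuming the ring ordering and the set of live processes are known; a real implementation forwards the message node by node instead of computing the result in one place.

```python
def ring_election(ring, initiator, alive):
    """The ELECTION message circulates once, collecting the ids of live
    processes; the initiator then picks the highest id as coordinator."""
    collected = []
    n = len(ring)
    start = ring.index(initiator)
    for step in range(n):                      # walk once around the ring
        pid = ring[(start + step) % n]
        if pid in alive:                       # dead nodes are skipped
            collected.append(pid)
    coordinator = max(collected)
    return coordinator, collected

# Example: ring 0..6 with process 3 crashed; process 2 initiates.
print(ring_election(list(range(7)), 2, alive={0, 1, 2, 4, 5, 6}))
# -> (6, [2, 4, 5, 6, 0, 1])
```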
A RING ALGORITHM
 Election algorithm using a ring.
COMPARISON
 Assume n processes and one election in
progress
 Bully algorithm
 Worst case: initiator is node with lowest ID
 Triggers n-2 elections at higher ranked nodes: O(n²)
msgs
 Best case: immediate election: n-2 messages
 Ring
 2 (n-1) messages always
A TOKEN RING ALGORITHM
a) An unordered group of processes on a network.
b) A logical ring constructed in software.
• Use a token to arbitrate access to critical section
• Must wait for token before entering CS
• Pass the token to neighbor once done or if not interested
• Detecting token loss is non-trivial
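A toy sketch of the token-passing rule; `wants_cs` stands in for each process's local decision, and passing is simulated by returning the next holder.

```python
def token_ring_step(holder, wants_cs, ring):
    """One step of token-based mutual exclusion on a logical ring (sketch):
    the holder either enters its critical section or passes the token on."""
    if wants_cs(holder):
        return holder, "ENTER_CS"              # use the token; pass it when done
    nxt = ring[(ring.index(holder) + 1) % len(ring)]
    return nxt, "PASS_TOKEN"                   # not interested: pass to neighbor

# Example: ring of 4 processes, only process 2 wants the critical section.
ring = [0, 1, 2, 3]
holder, action = 0, None
while action != "ENTER_CS":
    holder, action = token_ring_step(holder, lambda p: p == 2, ring)
print(holder, action)                          # -> 2 ENTER_CS
```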
COMPARISON
 A comparison of three mutual exclusion
algorithms.
Algorithm    | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized  | 3                       | 2                                     | Coordinator crash
Distributed  | 2 (n – 1)               | 2 (n – 1)                             | Crash of any process
Token ring   | 1 to ∞                  | 0 to n – 1                            | Lost token, process crash
TRANSACTIONS
Transactions provide higher
level mechanism for atomicity
of processing in distributed
systems
 Have their origins in databases
Banking example: Three
accounts A:$100, B:$200,
C:$300
 Client 1: transfer $4 from A to
B
 Client 2: transfer $3 from C to
B
 Result can be inconsistent
unless certain properties are
enforced
Interleaved execution (lost update):
Client 1: Read A: $100
Client 1: Write A: $96
Client 2: Read C: $300
Client 2: Write C: $297
Client 1: Read B: $200
Client 2: Read B: $200
Client 2: Write B: $203
Client 1: Write B: $204
B ends at $204 instead of $207: one update is lost.
ACID PROPERTIES
Atomic: all or nothing
Consistent: transaction takes
system from one consistent
state to another
Isolated: Intermediate effects
are not visible to other
transactions (serializable)
Durable: Changes are
permanent once transaction
completes (commits)
Serializable execution:
Client 1: Read A: $100
Client 1: Write A: $96
Client 1: Read B: $200
Client 1: Write B: $204
Client 2: Read C: $300
Client 2: Write C: $297
Client 2: Read B: $204
Client 2: Write B: $207

Contenu connexe

Tendances

8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating SystemsDr Sandeep Kumar Poonia
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memoryAshish Kumar
 
Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Sri Prasanna
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System Harshita Ved
 
Distributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock DetectionDistributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock DetectionSHIKHA GAUTAM
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented CommunicationDilum Bandara
 
Vector clock algorithm
Vector clock algorithmVector clock algorithm
Vector clock algorithmS. Anbu
 
Design issues of dos
Design issues of dosDesign issues of dos
Design issues of dosvanamali_vanu
 
2. Distributed Systems Hardware & Software concepts
2. Distributed Systems Hardware & Software concepts2. Distributed Systems Hardware & Software concepts
2. Distributed Systems Hardware & Software conceptsPrajakta Rane
 
Mutual exclusion in distributed systems
Mutual exclusion in distributed systemsMutual exclusion in distributed systems
Mutual exclusion in distributed systemsAJAY KHARAT
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsAya Mahmoud
 
Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Abdul Aslam
 
Synchronization Pradeep K Sinha
Synchronization Pradeep K SinhaSynchronization Pradeep K Sinha
Synchronization Pradeep K SinhaJawwad Rafiq
 
resource management
  resource management  resource management
resource managementAshish Kumar
 

Tendances (20)

8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
 
Stream oriented communication
Stream oriented communicationStream oriented communication
Stream oriented communication
 
Naming in Distributed System
Naming in Distributed SystemNaming in Distributed System
Naming in Distributed System
 
Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System
 
Distributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock DetectionDistributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock Detection
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
Resource management
Resource managementResource management
Resource management
 
Vector clock algorithm
Vector clock algorithmVector clock algorithm
Vector clock algorithm
 
Design issues of dos
Design issues of dosDesign issues of dos
Design issues of dos
 
CS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMSCS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMS
 
Distributed deadlock
Distributed deadlockDistributed deadlock
Distributed deadlock
 
2. Distributed Systems Hardware & Software concepts
2. Distributed Systems Hardware & Software concepts2. Distributed Systems Hardware & Software concepts
2. Distributed Systems Hardware & Software concepts
 
Mutual exclusion in distributed systems
Mutual exclusion in distributed systemsMutual exclusion in distributed systems
Mutual exclusion in distributed systems
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systems
 
Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??
 
Synchronization Pradeep K Sinha
Synchronization Pradeep K SinhaSynchronization Pradeep K Sinha
Synchronization Pradeep K Sinha
 
resource management
  resource management  resource management
resource management
 
Synch
SynchSynch
Synch
 

Similaire à 6.Distributed Operating Systems

Chapter 5-Synchronozation.ppt
Chapter 5-Synchronozation.pptChapter 5-Synchronozation.ppt
Chapter 5-Synchronozation.pptsirajmohammed35
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical ClocksDilum Bandara
 
Chapter 6-Synchronozation2.ppt
Chapter 6-Synchronozation2.pptChapter 6-Synchronozation2.ppt
Chapter 6-Synchronozation2.pptMeymunaMohammed1
 
Lesson 05 - Time in Distrributed System.pptx
Lesson 05 - Time in Distrributed System.pptxLesson 05 - Time in Distrributed System.pptx
Lesson 05 - Time in Distrributed System.pptxLagamaPasala
 
Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)Sri Prasanna
 
Unit iii-Synchronization
Unit iii-SynchronizationUnit iii-Synchronization
Unit iii-SynchronizationDhivyaa C.R
 
CS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed SystemsCS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed SystemsNandakumar P
 
Parallel and Distributed Computing Chapter 13
Parallel and Distributed Computing Chapter 13Parallel and Distributed Computing Chapter 13
Parallel and Distributed Computing Chapter 13AbdullahMunir32
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsIRJET Journal
 

Similaire à 6.Distributed Operating Systems (20)

3. syncro. in distributed system
3. syncro. in distributed system3. syncro. in distributed system
3. syncro. in distributed system
 
Clock.pdf
Clock.pdfClock.pdf
Clock.pdf
 
Chap 5
Chap 5Chap 5
Chap 5
 
Chapter 5-Synchronozation.ppt
Chapter 5-Synchronozation.pptChapter 5-Synchronozation.ppt
Chapter 5-Synchronozation.ppt
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical Clocks
 
Chapter 6-Synchronozation2.ppt
Chapter 6-Synchronozation2.pptChapter 6-Synchronozation2.ppt
Chapter 6-Synchronozation2.ppt
 
Synchronization
SynchronizationSynchronization
Synchronization
 
Shoaib
ShoaibShoaib
Shoaib
 
Chapter 6 synchronization
Chapter 6 synchronizationChapter 6 synchronization
Chapter 6 synchronization
 
Clock synchronization
Clock synchronizationClock synchronization
Clock synchronization
 
Lesson 05 - Time in Distrributed System.pptx
Lesson 05 - Time in Distrributed System.pptxLesson 05 - Time in Distrributed System.pptx
Lesson 05 - Time in Distrributed System.pptx
 
Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)
 
Unit iii-Synchronization
Unit iii-SynchronizationUnit iii-Synchronization
Unit iii-Synchronization
 
Shoaib
ShoaibShoaib
Shoaib
 
Ds ppt imp.
Ds ppt imp.Ds ppt imp.
Ds ppt imp.
 
CS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed SystemsCS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed Systems
 
Parallel and Distributed Computing Chapter 13
Parallel and Distributed Computing Chapter 13Parallel and Distributed Computing Chapter 13
Parallel and Distributed Computing Chapter 13
 
L12.FA20.ppt
L12.FA20.pptL12.FA20.ppt
L12.FA20.ppt
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed Systems
 

Plus de Dr Sandeep Kumar Poonia

An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmDr Sandeep Kumar Poonia
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithmDr Sandeep Kumar Poonia
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmDr Sandeep Kumar Poonia
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmDr Sandeep Kumar Poonia
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsDr Sandeep Kumar Poonia
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmDr Sandeep Kumar Poonia
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm Dr Sandeep Kumar Poonia
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Dr Sandeep Kumar Poonia
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Dr Sandeep Kumar Poonia
 

Plus de Dr Sandeep Kumar Poonia (20)

Soft computing
Soft computingSoft computing
Soft computing
 
An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithm
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithm
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithm
 
RMABC
RMABCRMABC
RMABC
 
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithm
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithm
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithm
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithm
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm
 
A new approach of program slicing
A new approach of program slicingA new approach of program slicing
A new approach of program slicing
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...
 
Enhanced abc algo for tsp
Enhanced abc algo for tspEnhanced abc algo for tsp
Enhanced abc algo for tsp
 
Database aggregation using metadata
Database aggregation using metadataDatabase aggregation using metadata
Database aggregation using metadata
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...
 
Lecture28 tsp
Lecture28 tspLecture28 tsp
Lecture28 tsp
 
Lecture27 linear programming
Lecture27 linear programmingLecture27 linear programming
Lecture27 linear programming
 
Lecture26
Lecture26Lecture26
Lecture26
 

Dernier

Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 

Dernier (20)

Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 

6.Distributed Operating Systems

  • 2. CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS  Time ordering and clock synchronization  Leader election  Mutual exclusion  Distributed transactions  Deadlock detection
  • 3. THE IMPORTANCE OF SYNCHRONIZATION  Because various components of a distributed system must cooperate and exchange information, synchronization is a necessity.  Various components of the system must agree on the timing and ordering of events. Imagine a banking system that did not track the timing and ordering of financial transactions. Similar chaos would ensure if distributed systems were not synchronized.  Constraints, both implicit and explicit, are therefore enforced to ensure synchronization of components.
  • 4. CLOCK SYNCHRONIZATION  As in non-distributed systems, the knowledge of “when events occur” is necessary.  However, clock synchronization is often more difficult in distributed systems because there is no ideal time source, and because distributed algorithms must sometimes be used.  Distributed algorithms must overcome:  Scattering of information  Local, rather than global, decision-making
  • 5. CLOCK SYNCHRONIZATION  Time is unambiguous in centralized systems  System clock keeps time, all entities use this for time  Distributed systems: each node has own system clock  Crystal-based clocks are less accurate (1 part in million)  Problem: An event that occurred after another may be assigned an earlier time
  • 6. LACK OF GLOBAL TIME IN DS  It is impossible to guarantee that physical clocks run at the same frequency  Lack of global time, can cause problems  Example: UNIX make  Edit output.c at a client  output.o is at a server (compile at server)  Client machine clock can be lagging behind the server machine clock
  • 7. LACK OF GLOBAL TIME – EXAMPLE When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
  • 8. LOGICAL CLOCKS  For many problems, internal consistency of clocks is important  Absolute time is less important  Use logical clocks  Key idea:  Clock synchronization need not be absolute  If two machines do not interact, no need to synchronize them  More importantly, processes need to agree on the order in which events occur rather than the time at which they occurred
  • 9. EVENT ORDERING  Problem: define a total ordering of all events that occur in a system  Events in a single processor machine are totally ordered  In a distributed system:  No global clock, local clocks may be unsynchronized  Can not order events on different machines using local times  Key idea [Lamport ]  Processes exchange messages  Message must be sent before received  Send/receive used to order events (and synchronize
  • 10. LOGICAL CLOCKS  Often, it is not necessary for a computer to know the exact time, only relative time. This is known as “logical time”.  Logical time is not based on timing but on the ordering of events.  Logical clocks can only advance forward, not in reverse.  Non-interacting processes cannot share a logical clock.  Computers generally obtain logical time using interrupts to update a software clock. The more interrupts (the more frequently time is updated), the higher the overhead.
  • 11. LAMPORT'S LOGICAL CLOCK SYNCHRONIZATION ALGORITHM ▪ The most common logical clock synchronization algorithm for distributed systems is Lamport's Algorithm. It is used in situations where ordering is important but global time is not required. ▪ Based on the "happens-before" relation: ▪ Event A "happens-before" Event B (A→B) when all processes involved in a distributed system agree that event A occurred first, and B subsequently occurred. ▪ This DOES NOT mean that Event A actually occurred before Event B in absolute clock time.
  • 12. LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION ALGORITHM  A distributed system can use the “happens-before” relation when:  Events A and B are observed by the same process, or by multiple processes with the same global clock  Event A acknowledges sending a message and Event B acknowledges receiving it, since a message cannot be received before it is sent  If two events do not communicate via messages, they are considered concurrent – because order cannot be determined and it does not matter. Concurrent events can be ignored.
  • 13. LAMPORT'S LOGICAL CLOCK SYNCHRONIZATION ALGORITHM (CONT.) ▪ In the previous examples, Clock C(a) < C(b) ▪ If they are concurrent, C(a) = C(b) ▪ Concurrent events can only occur on the same system, because every message transfer between two systems takes at least one clock tick. ▪ In Lamport's Algorithm, logical clock values for events may be changed, but always by moving the clock forward. Time values can never be decreased. ▪ An additional refinement in the algorithm is often used: ▪ If Event A and Event B are concurrent, i.e., C(a) = C(b), some unique property of the processes associated with these events can be used to choose a winner. This establishes a total ordering of all events. ▪ Process ID is often used as the tiebreaker.
  • 14. LAMPORT'S LOGICAL CLOCK SYNCHRONIZATION ALGORITHM (CONT.) ▪ Lamport's Algorithm can thus be used in distributed systems to ensure synchronization: ▪ A logical clock is implemented in each node in the system. ▪ Each node can determine the order in which events have occurred from that node's own point of view. ▪ The logical clock of one node does not need to have any relation to real time or to any other node in the system.
  • 15. EVENT ORDERING USING HB ▪ Goal: define the notion of the time of an event such that ▪ If A → B then C(A) < C(B) ▪ If A and B are concurrent, then C(A) <, = or > C(B) ▪ Solution: ▪ Each processor maintains a logical clock LCi ▪ Whenever an event occurs locally at i, LCi = LCi + 1 ▪ When i sends a message to j, piggyback LCi ▪ When j receives a message from i: if LCj < LCi then LCj = LCi + 1, else do nothing ▪ Claim: this algorithm meets the above goals
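A minimal Python sketch of these rules, using the common max(local, received) + 1 formulation of the receive step (class and method names are illustrative, not taken from the slides):

class LamportClock:
    """Logical clock for one process, following the rules above."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1                      # tick on every local event
        return self.time

    def send_event(self):
        self.time += 1                      # sending is itself an event
        return self.time                    # timestamp piggybacked on the message

    def receive_event(self, msg_timestamp):
        # Fast-forward so the receive is later than the send it matches.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time

# Example: P stamps its send with t; Q fast-forwards past t on receipt.
p, q = LamportClock(), LamportClock()
t = p.send_event()
q.receive_event(t)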
  • 16. PROCESSES, EACH WITH ITS OWN CLOCK • At time 6, Process 0 sends message A to Process 1 • It arrives at Process 1 at 16 (it took 10 ticks to make the journey) • Message B from 1 to 2 takes 16 ticks • Message C from 2 to 1 leaves at 60 and arrives at 56 - not possible • Message D from 1 to 0 leaves at 64 and arrives at 54 - not possible
  • 17. LAMPORT'S ALGORITHM CORRECTS THE CLOCKS ▪ Use the 'happens-before' relation ▪ Each message carries the sending time (as per the sender's clock) ▪ When it arrives, the receiver fast-forwards its clock to be one more than the sending time (between every two events, the clock must tick at least once)
  • 18. PHYSICAL CLOCKS ▪ The instantaneous difference between two clocks is known as skew; the gradual divergence of clocks that run at slightly different rates is known as drift. Computer clock manufacturers specify a maximum drift rate for their products. ▪ Computer clocks are among the least accurate modern timepieces. ▪ Inside every computer, a chip built around a quartz crystal oscillator keeps the time. ▪ Average loss of accuracy: about 0.86 seconds per day, i.e., roughly 25 seconds per month. ▪ This skew is unacceptable for distributed systems. Several methods are now in use to attempt the synchronization of physical clocks in distributed systems:
  • 19. PHYSICAL CLOCKS ▪ Since the 17th century, time has been measured astronomically ▪ Solar Day: the interval between two consecutive transits of the sun ▪ Solar Second: 1/86400th of a solar day
  • 20. PHYSICAL CLOCKS ▪ 1948: atomic clocks are invented ▪ Accurate clocks are atomic oscillators (accurate to one part in 10^13) ▪ The BIH defines TAI (International Atomic Time) ▪ 86,400 TAI seconds is now about 3 msec less than a mean solar day ▪ BIH solves the problem by introducing leap seconds whenever the discrepancy between TAI and solar time grows to 800 msec ▪ The corrected time scale is called Coordinated Universal Time (UTC) ▪ When BIH announces a leap second, the power companies raise their frequency to 61 Hz (or 51 Hz) for 60 (or 50) seconds, to advance all the clocks in their distribution area.
  • 21. PHYSICAL CLOCKS - UTC Coordinated Universal Time (UTC) is the international time standard. UTC is the current term for what was commonly referred to as Greenwich Mean Time (GMT). Zero hours UTC is midnight in Greenwich, England, which lies on the zero longitudinal meridian. UTC is based on a 24-hour clock.
  • 22. PHYSICAL CLOCKS ▪ Most clocks are less accurate (e.g., mechanical watches) ▪ Computers use crystal-based clocks (accurate to about one part in a million) ▪ Results in clock drift ▪ How do you tell time? ▪ Use astronomical metrics (solar day) ▪ Coordinated Universal Time (UTC) – international standard based on atomic time ▪ Add leap seconds to be consistent with astronomical time ▪ UTC broadcast on radio (satellite and earth) ▪ Receivers accurate to 0.1 – 10 ms ▪ Need to synchronize machines with a master or with one another
  • 23. CLOCK SYNCHRONIZATION ▪ Each clock has a maximum drift rate ρ: 1 − ρ ≤ dC/dt ≤ 1 + ρ ▪ Two clocks may drift apart by 2ρt in time t ▪ To limit the relative skew to δ, resynchronize at least every δ/2ρ seconds
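A quick worked example using the drift figure quoted earlier for crystal clocks (δ = 1 ms is an illustrative target, not a value from the slides): with ρ = 10⁻⁶ (one part in a million), two clocks can drift apart by 2ρt = 2 × 10⁻⁶ · t seconds after t seconds, so to keep the skew below δ = 1 ms they must resynchronize at least every δ/2ρ = 0.001 / (2 × 10⁻⁶) = 500 seconds.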
  • 24. CRISTIAN'S ALGORITHM ▪ Assuming there is one time server with UTC: ▪ Each node in the distributed system periodically polls the time server. ▪ On receiving the server's time t, the client estimates the current time as t + (Treq + Treply)/2, i.e., the server's time plus roughly half the measured round-trip delay. ▪ This process is repeated several times and an average is taken. ▪ The polling machine then attempts to adjust its time. ▪ Disadvantages: ▪ Must attempt to take the delay between the client and the time server into account ▪ Single point of failure if the time server fails
  • 25. CRISTIAN'S ALGORITHM ▪ Synchronize machines to a time server with a UTC receiver ▪ Machine P requests the time from the server periodically (at least every δ/2ρ seconds) ▪ On receiving time t from the server, P sets its clock to t + treply, where treply is the time taken for the reply to reach P ▪ Use (treq + treply)/2 as an estimate of treply ▪ Improve accuracy by making a series of measurements
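A minimal Python sketch of the client side, with the transport abstracted behind an ask_server callable (a placeholder, not a real API); best_estimate follows the later advice of trusting the fastest reply:

import time

def cristian_estimate(ask_server):
    """One probe: estimate the server's current time, compensating
    for roughly half of the measured round trip."""
    t0 = time.monotonic()
    server_time = ask_server()            # server replies with its clock value t
    rtt = time.monotonic() - t0
    return server_time + rtt / 2

def best_estimate(ask_server, samples=5):
    """Probe several times and keep the estimate from the fastest reply,
    which suffered the least queueing delay."""
    probes = []
    for _ in range(samples):
        t0 = time.monotonic()
        server_time = ask_server()
        rtt = time.monotonic() - t0
        probes.append((rtt, server_time + rtt / 2))
    return min(probes)[1]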
  • 26. PROBLEM WITH CRISTIAN'S ALGORITHM Major Problem ▪ Time must never run backward ▪ If the sender's clock is fast, C_UTC will be smaller than the sender's current value of C Minor Problem ▪ It takes nonzero time for the time server's reply to arrive ▪ This delay may be large and varies with network load
  • 27. SOLUTION Major Problem ▪ Control the clock rate ▪ Suppose the timer is set to generate 100 interrupts/sec ▪ Normally each interrupt adds 10 msec to the time ▪ To slow the clock down, add only 9 msec ▪ To advance it, add 11 msec to the time Minor Problem ▪ Measure it ▪ Make a series of measurements for accuracy ▪ Discard measurements that exceed a threshold value ▪ The message that came back fastest can be taken to be the most accurate.
  • 28. BERKELEY ALGORITHM ▪ Used in systems without a UTC receiver ▪ Keeps clocks synchronized with one another ▪ One computer is the master, the others are slaves ▪ Master periodically polls slaves for their times ▪ Averages the times and returns differences to the slaves ▪ Communication delays are compensated for as in Cristian's algorithm ▪ Failure of the master => election of a new master
  • 29. BERKELEY ALGORITHM a) The time daemon asks all the other machines for their clock values b) The machines answer c) The time daemon tells everyone how to adjust their clock
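A sketch of the master's averaging step once the slaves' clock values have been collected (names and numbers are illustrative; a real implementation would also compensate for round-trip delay when polling):

def berkeley_adjustments(master_time, slave_times):
    """Return the signed adjustment each machine should apply to its clock.
    slave_times maps machine name -> the clock value it reported (seconds)."""
    clocks = {"master": master_time, **slave_times}
    average = sum(clocks.values()) / len(clocks)
    # Each machine is told how far to move, not an absolute time, so a fast
    # clock can be slowed down gradually instead of being set backwards.
    return {name: average - value for name, value in clocks.items()}

# Illustrative numbers: master at 1000 s, slaves at 990 s and 1025 s
# -> average 1005 s, adjustments {'master': +5, 'A': +15, 'B': -20}
print(berkeley_adjustments(1000.0, {"A": 990.0, "B": 1025.0}))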
  • 30. DECENTRALIZED AVERAGING ALGORITHM ▪ Each machine on the distributed system has a daemon without UTC. ▪ Periodically, at an agreed-upon fixed time, each machine broadcasts its local time. ▪ Each machine calculates the correct time by averaging all results.
  • 31. NETWORK TIME PROTOCOL (NTP) ▪ Enables clients across the Internet to be synchronized accurately to UTC. ▪ Overcomes large and variable message delays ▪ Employs statistical techniques for filtering, based on past quality of servers and several other measures ▪ Can survive lengthy losses of connectivity: ▪ Redundant servers ▪ Redundant paths to servers ▪ Provides protection against malicious interference through authentication techniques
  • 32. NETWORK TIME PROTOCOL (NTP) (CONT.) ▪ Uses a hierarchy of servers located across the Internet. Primary servers are directly connected to a UTC time source.
  • 33. NETWORK TIME PROTOCOL (NTP) (CONT.) ▪ NTP has three modes: ▪ Multicast Mode: ▪ Suitable for user workstations on a LAN ▪ One or more servers periodically multicasts the time to other machines on the network. ▪ Procedure Call Mode: ▪ Similar to Cristian's Algorithm ▪ Provides higher accuracy than Multicast Mode because delays are compensated. ▪ Symmetric Mode: ▪ Pairs of servers exchange pairs of timing messages that contain time stamps of recent message events. ▪ The most accurate, but also the most expensive mode. ▪ Although NTP is quite advanced, there is still a drift of 20-35 milliseconds!!!
  • 34. MORE PROBLEMS  Causality  Vector timestamps  Global state and termination detection  Election algorithms
  • 35. LOGICAL CLOCKS  For many DS algorithms, associating an event to an absolute real time is not essential, we only need to know an unambiguous order of events  Lamport's timestamps  Vector timestamps
  • 36. LOGICAL CLOCKS (CONT.) ▪ Synchronization based on "relative time". ▪ "Relative time" may not relate to the "real time". ▪ Example: Unix make (Is output.c updated after the generation of output.o?) ▪ What's important is that the processes in the Distributed System agree on the ordering in which certain events occur. ▪ Such "clocks" are referred to as Logical Clocks.
  • 37. EXAMPLE: WHY ORDER MATTERS?  Replicated accounts in Jaipur(JP) and Bikaner(BN)  Two updates occur at the same time  Current balance: $1,000  Update1: Add $100 at BN; Update2: Add interest of 1% at JP  Whoops, inconsistent states!
  • 38. LAMPORT ALGORITHM  Clock synchronization does not have to be exact  Synchronization not needed if there is no interaction between machines  Synchronization only needed when machines communicate  i.e. must only agree on ordering of interacting events
  • 39. LAMPORT’S “HAPPENS-BEFORE” PARTIAL ORDER  Given two events e & e`, e < e` if: 1. Same process: e <i e`, for some process Pi 2. Same message: e = send(m) and e`=receive(m) for some message m 3. Transitivity: there is an event e* such that e < e* and e* < e`
  • 40. CONCURRENT EVENTS ▪ Given two events e & e`: ▪ If neither e < e` nor e` < e, then e || e`  [Figure: events a–f on processes P1–P3 with messages m1 and m2, plotted against real time]
  • 41. LAMPORT LOGICAL CLOCKS ▪ Substitute synchronized clocks with a global ordering of events ▪ ei < ej ⇒ LC(ei) < LC(ej) ▪ LCi is a local clock: contains increasing values ▪ each process i has its own LCi ▪ Increment LCi on each event occurrence ▪ within the same process i, if ej occurs before ek then LCi(ej) < LCi(ek) ▪ If es is a send event and er receives that send, then LCi(es) < LCj(er)
  • 42. LAMPORT ALGORITHM  Each process increments local clock between any two successive events  Message contains a timestamp  Upon receiving a message, if received timestamp is ahead, receiver fast forward its clock to be one more than sending time
  • 43. LAMPORT ALGORITHM (CONT.)  Timestamp  Each event is given a timestamp t  If es is a send message m from pi, then t=LCi(es)  When pj receives m, set LCj value as follows  If t < LCj, increment LCj by one  Message regarded as next event on j  If t ≥ LCj, set LCj to t+1
  • 44. LAMPORT'S ALGORITHM ANALYSIS (1) ▪ Claim: ei < ej ⇒ LC(ei) < LC(ej) ▪ Proof: by induction on the length of the sequence of events relating to ei and ej  [Figure: events a–g on P1–P3 with messages m1, m2 and their Lamport clock values]
  • 45. LAMPORT'S ALGORITHM ANALYSIS (2) ▪ LC(ei) < LC(ej) ⇒ ei < ej ? ▪ Claim: if LC(ei) < LC(ej), then it is not necessarily true that ei < ej  [Figure: the same event diagram with Lamport clock values; two concurrent events can still carry ordered clock values]
  • 46. TOTAL ORDERING OF EVENTS  Happens before is only a partial order  Make the timestamp of an event e of process Pi be: (LC(e),i)  (a,b) < (c,d) iff a < c, or a = c and b < d
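In code this is just lexicographic comparison of (clock value, process id) pairs, which Python tuples give for free; a tiny illustration:

# (Lamport clock, process id) pairs compare lexicographically,
# which is exactly the total order defined above.
assert (5, 2) < (7, 1)      # smaller clock value wins
assert (5, 1) < (5, 2)      # equal clocks: the lower process id breaks the tie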
  • 47. APPLICATION: TOTALLY-ORDERED MULTICASTING ▪ Message is timestamped with sender's logical time ▪ Message is multicast (including sender itself) ▪ When message is received ▪ It is put into local queue ▪ Ordered according to timestamp ▪ Multicast acknowledgement ▪ Message is delivered to applications only when ▪ It is at head of queue ▪ It has been acknowledged by all involved processes
  • 48. APPLICATION: TOTALLY-ORDERED MULTICASTING  Update 1 is time-stamped and multicast. Added to local queues.  Update 2 is time-stamped and multicast. Added to local queues.  Acknowledgements for Update 2 sent/received. Update 2 can now be processed.  Acknowledgements for Update 1 sent/received. Update 1 can now be processed.  (Note: all queues are the same, as the timestamps have been used to ensure the “happens-before” relation holds.)
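A condensed sketch of the receiving side's queue logic, assuming reliable multicast of both updates and acknowledgements (transport and the Lamport clock itself are left out; names are illustrative):

import heapq

class TotalOrderQueue:
    """Holds multicast updates until they may be delivered in timestamp order."""
    def __init__(self, all_processes):
        self.queue = []                  # min-heap of (lamport_ts, sender, msg)
        self.acks = {}                   # (lamport_ts, sender) -> set of ackers
        self.group = set(all_processes)

    def on_message(self, ts, sender, msg):
        heapq.heappush(self.queue, (ts, sender, msg))
        self.acks.setdefault((ts, sender), set())

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)

    def deliverable(self):
        """Pop and return messages that are at the head of the queue AND
        acknowledged by every process in the group."""
        ready = []
        while self.queue:
            ts, sender, msg = self.queue[0]
            if self.acks.get((ts, sender), set()) >= self.group:
                heapq.heappop(self.queue)
                ready.append(msg)
            else:
                break
        return ready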
  • 49. LIMITATION OF LAMPORT'S ALGORITHM ▪ ei < ej ⇒ LC(ei) < LC(ej) ▪ However, LC(ei) < LC(ej) does not imply ei < ej ▪ for instance, (1,1) < (1,3), but events a and e are concurrent  [Figure: events a–g on P1–P3 with (clock, process id) timestamps (1,1), (2,1), (3,2), (4,2), (5,3), (1,3), (2,3)]
  • 50. VECTOR TIMESTAMPS ▪ Pi's clock is a vector VTi[] ▪ VTi[i] = number of events Pi has stamped ▪ VTi[j] = what Pi thinks is the number of events Pj has stamped (i ≠ j)
  • 51. VECTOR TIMESTAMPS (CONT.) ▪ Initialization ▪ the vector timestamp for each process is initialized to (0,0,…,0) ▪ Local event ▪ when an event occurs on process Pi, VTi[i] ← VTi[i] + 1 ▪ e.g., on processor 3, (1,2,1,3) → (1,2,2,3)
  • 52. VECTOR TIMESTAMPS (CONT.) ▪ Message passing ▪ when Pi sends a message to Pj, the message has timestamp t[] = VTi[] ▪ when Pj receives the message, it sets VTj[k] to max(VTj[k], t[k]), for k = 1, 2, …, N ▪ e.g., P2 receives a message with timestamp (3,2,4) and P2's timestamp is (3,4,3), then P2 adjusts its timestamp to (3,4,4)
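A small Python sketch of these rules (the receive step does only the component-wise maximum, as on this slide; some formulations additionally bump the receiver's own entry, as a later slide does):

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n                   # initialised to (0, 0, ..., 0)

    def local_event(self):
        self.vt[self.pid] += 1              # VTi[i] <- VTi[i] + 1

    def send(self):
        self.local_event()                  # sending is treated as a local event here
        return list(self.vt)                # timestamp t[] = VTi[] piggybacked

    def receive(self, t):
        # Component-wise maximum of the local vector and the message timestamp.
        self.vt = [max(a, b) for a, b in zip(self.vt, t)]

# The slide's example: P2 at (3,4,3) receives (3,2,4) and ends up at (3,4,4).
p2 = VectorClock(pid=1, n=3)
p2.vt = [3, 4, 3]
p2.receive([3, 2, 4])
print(p2.vt)                                # [3, 4, 4]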
  • 53. COMPARING VECTORS ▪ VT1 = VT2 iff VT1[i] = VT2[i] for all i ▪ VT1 ≤ VT2 iff VT1[i] ≤ VT2[i] for all i ▪ VT1 < VT2 iff VT1 ≤ VT2 & VT1 ≠ VT2 ▪ for instance, (1, 2, 2) < (1, 3, 2)
  • 54. VECTOR TIMESTAMP ANALYSIS ▪ Claim: e < e' iff e.VT < e'.VT  [Figure: events a–g on P1–P3 with vector timestamps [1,0,0], [2,0,0], [2,1,0], [2,2,0], [2,2,3], [0,0,1], [0,0,2]]
  • 55. APPLICATION: CAUSALLY-ORDERED MULTICASTING  For ordered delivery, we also need…  Multicast msgs (reliable but may be out-of-order)  Vi[i] is only incremented when sending  When k gets a msg from j, with timestamp ts, the msg is buffered until:  1: ts[j] = Vk[j] + 1  (this is the next timestamp that k is expecting from j)  2: ts[i] ≤ Vk[i] for all i ≠ j  (k has seen all msgs that were seen by j when j sent the msg)
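That buffering test can be written as a small pure function over vectors (bookkeeping only, no networking; names are illustrative):

def can_deliver(ts, sender, local_vt):
    """True if a message stamped ts from process `sender` is causally
    deliverable at a process whose vector clock is local_vt."""
    # 1: it is the next message expected from the sender
    if ts[sender] != local_vt[sender] + 1:
        return False
    # 2: the receiver has already seen everything the sender had seen
    return all(ts[i] <= local_vt[i] for i in range(len(ts)) if i != sender)

# With the figures on the next slides: at P2 (clock [0,0,0]) the reply
# r = [1,0,1] from P3 is buffered because ts[0] = 1 > 0 (P2 has not yet
# seen post a), while the post a = [1,0,0] from P1 is deliverable.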
  • 56. CAUSALLY-ORDERED MULTICASTING  [Figure: P1 multicasts Post a with vector timestamp [1,0,0]; P3 multicasts r: Reply a with timestamp [1,0,1].] Message a arrives at P2 before the reply r from P3 does.
  • 57. CAUSALLY-ORDERED MULTICASTING (CONT.)  [Figure: the same exchange, but r: Reply a ([1,0,1]) reaches P2 before Post a ([1,0,0]), so r is buffered and delivered only after a arrives.] The message a arrives at P2 after the reply from P3; the reply is not delivered right away.
  • 58. ORDERED COMMUNICATION  Totally ordered multicast  Use Lamport timestamps  Causally ordered multicast  Use vector timestamps
  • 59. VECTOR CLOCKS ▪ Each process i maintains a vector Vi ▪ Vi[i] : number of events that have occurred at i ▪ Vi[j] : number of events i knows have occurred at process j ▪ Update vector clocks as follows ▪ Local event: increment Vi[i] ▪ Send a message: piggyback the entire vector V ▪ Receipt of a message at j from i: Vj[k] = max(Vj[k], Vi[k]) ▪ The receiver is told how many events the sender knows occurred at another process k ▪ Also Vj[j] = Vj[j] + 1 (the receive itself counts as an event at j)
  • 60. GLOBAL STATE  Global state of a distributed system  Local state of each process  Messages sent but not received (state of the queues)  Many applications need to know the state of the system  Failure recovery, distributed deadlock detection  Problem: how can you figure out the state of a distributed system?  Each process is independent  No global clock or synchronization  Distributed snapshot: a consistent global state
  • 61. GLOBAL STATE (1) a) A consistent cut b) An inconsistent cut
  • 62. DISTRIBUTED SNAPSHOT ALGORITHM ▪ Assume each process communicates with another process using unidirectional point-to-point channels (e.g., TCP connections) ▪ Any process can initiate the algorithm ▪ Checkpoint local state ▪ Send marker on every outgoing channel ▪ On receiving a marker ▪ Checkpoint state if it is the first marker and send marker on outgoing channels; save messages on all other channels until: ▪ Subsequent marker on a channel: stop saving state for that channel
  • 63. DISTRIBUTED SNAPSHOT ▪ A process finishes when ▪ It receives a marker on each incoming channel and processes them all ▪ State: local state plus state of all channels ▪ Send state to initiator ▪ Any process can initiate snapshot ▪ Multiple snapshots may be in progress ▪ Each is separate, and each is distinguished by tagging the marker with the initiator ID (and sequence number) ▪ [Figure: processes A, B, C exchanging markers M]
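A compressed Python sketch of the per-process marker handling described above, under the stated assumptions (channels are plain identifiers and marker sending is abstracted behind a callable; this is an illustrative sketch, not a complete snapshot implementation):

class SnapshotProcess:
    """Marker handling for one process in the distributed snapshot."""
    def __init__(self, incoming, outgoing, send_marker):
        self.incoming = set(incoming)        # ids of channels we receive on
        self.outgoing = list(outgoing)       # ids of channels we send on
        self.send_marker = send_marker       # callable(channel_id)
        self.local_state = None
        self.channel_state = {}              # channel -> in-flight messages recorded
        self.recording = set()               # channels still being recorded

    def start_snapshot(self, current_state):
        self.local_state = current_state     # checkpoint local state
        self.channel_state = {c: [] for c in self.incoming}
        self.recording = set(self.incoming)
        for c in self.outgoing:              # marker on every outgoing channel
            self.send_marker(c)

    def on_marker(self, channel, current_state):
        if self.local_state is None:         # first marker seen: checkpoint now
            self.start_snapshot(current_state)
        self.recording.discard(channel)      # stop saving state for this channel
        return not self.recording            # True once every channel is closed

    def on_message(self, channel, msg):
        if channel in self.recording:        # in-flight message: part of channel state
            self.channel_state[channel].append(msg)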
  • 64. SNAPSHOT ALGORITHM EXAMPLE a) Organization of a process and channels for a distributed snapshot
  • 65. SNAPSHOT ALGORITHM EXAMPLE b) Process Q receives a marker for the first time and records its local state c) Q records all incoming messages d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
  • 66. TERMINATION DETECTION ▪ Detecting the end of a distributed computation ▪ Notation: let sender be predecessor, receiver be successor ▪ Two types of markers: Done and Continue ▪ After finishing its part of the snapshot, process Q sends a Done or a Continue to its predecessor ▪ Send a Done only when ▪ All of Q's successors send a Done ▪ Q has not received any message since it check-pointed its local state and received a marker on all incoming channels ▪ Else send a Continue ▪ Computation has terminated if the initiator receives Done messages from everyone
  • 67. DISTRIBUTED SYNCHRONIZATION  Distributed system with multiple processes may need to share data or access shared data structures  Use critical sections with mutual exclusion  Single process with multiple threads  Semaphores, locks, monitors  How do you do this for multiple processes in a distributed system?  Processes may be running on different machines  Solution: lock mechanism for a distributed environment  Can be centralized or distributed
  • 68. CENTRALIZED MUTUAL EXCLUSION  Assume processes are numbered  One process is elected coordinator (highest ID process)  Every process needs to check with coordinator before entering the critical section  To obtain exclusive access: send request, await reply  To release: send release message  Coordinator:  Receive request: if available and queue empty, send grant; if not, queue request  Receive release: remove next request from queue and send grant
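A minimal sketch of the coordinator's bookkeeping (message transport is abstracted behind a send_grant callable; names are illustrative):

from collections import deque

class Coordinator:
    """Grant / queue / release logic of the central lock coordinator."""
    def __init__(self, send_grant):
        self.send_grant = send_grant       # callable(process_id)
        self.holder = None                 # who currently holds the lock
        self.waiting = deque()             # FIFO queue of blocked requesters

    def on_request(self, pid):
        if self.holder is None:
            self.holder = pid
            self.send_grant(pid)           # lock free: grant immediately
        else:
            self.waiting.append(pid)       # lock busy: queue, send no reply

    def on_release(self, pid):
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            self.send_grant(self.holder)   # hand the lock to the next waiter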
  • 69. MUTUAL EXCLUSION: A CENTRALIZED ALGORITHM a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply. c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2
  • 70. PROPERTIES ▪ Simulates a centralized lock using blocking calls ▪ Fair: requests are granted the lock in the order they were received ▪ Simple: three messages per use of a critical section (request, grant, release) ▪ Shortcomings: ▪ Single point of failure ▪ How do you detect a dead coordinator? ▪ A process cannot distinguish "lock in use" from a dead coordinator ▪ No response from the coordinator in either case ▪ Performance bottleneck in large distributed systems
  • 71. DISTRIBUTED ALGORITHM ▪ [Ricart and Agrawala]: needs 2(n-1) messages ▪ Based on event ordering and time stamps ▪ Process k enters the critical section as follows ▪ Generate new time stamp TSk = TSk + 1 ▪ Send request(k, TSk) to all other n-1 processes ▪ Wait until reply(j) is received from all other processes ▪ Enter critical section ▪ Upon receiving a request message, process j ▪ Sends reply if there is no contention ▪ If already in the critical section, does not reply, queues the request ▪ If it wants to enter, compares TSj with TSk and sends reply if TSk < TSj, else queues the request
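The receiver-side decision can be written as a small pure function (a sketch; the state names and the (timestamp, process id) tie-break are assumptions consistent with the total ordering used earlier):

def on_request(state, my_ts, my_pid, req_ts, req_pid):
    """Decide whether to reply at once or defer an incoming request.
    state is 'RELEASED', 'WANTED' or 'HELD'; timestamps are Lamport clocks."""
    if state == 'HELD':
        return 'defer'                                   # queue until we release
    if state == 'WANTED' and (my_ts, my_pid) < (req_ts, req_pid):
        return 'defer'                                   # our own request is older: we win
    return 'reply'                                       # not competing: reply immediately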
  • 72. A DISTRIBUTED ALGORITHM a) Two processes want to enter the same critical region at the same moment. b) Process 0 has the lowest timestamp, so it wins. c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
  • 73. PROPERTIES  Fully decentralized  N points of failure!  All processes are involved in all decisions  Any overloaded process can become a bottleneck
  • 74. ELECTION ALGORITHMS ▪ Many distributed algorithms need one process to act as coordinator ▪ Doesn't matter which process does the job, just need to pick one ▪ Election algorithms: technique to pick a unique coordinator (aka leader election) ▪ Examples: take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm ▪ Types of election algorithms: Bully and Ring algorithms
  • 75. BULLY ALGORITHM ▪ Each process has a unique numerical ID ▪ Processes know the Ids and address of every other process ▪ Communication is assumed reliable ▪ Key Idea: select process with highest ID ▪ Process initiates election if it just recovered from failure or if coordinator failed ▪ 3 message types: election, OK, I won ▪ Several processes can initiate an election simultaneously ▪ Need consistent result ▪ O(n²) messages required with n processes
  • 76. BULLY ALGORITHM DETAILS ▪ Any process P can initiate an election ▪ P sends Election messages to all processes with higher Ids and awaits OK messages ▪ If no OK messages arrive, P becomes coordinator and sends I won messages to all processes with lower Ids ▪ If it receives an OK, it drops out and waits for an I won ▪ If a process receives an Election msg, it returns an OK and starts an election of its own ▪ If a process receives an I won, it treats the sender as coordinator
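A sketch of the initiation step from one process's point of view (the three callables stand in for real message passing and are hypothetical helpers, not part of the algorithm's specification):

def start_election(my_id, all_ids, send_election, wait_for_ok, announce_victory):
    """One bully-election round initiated by process my_id."""
    higher = [p for p in all_ids if p > my_id]
    for p in higher:
        send_election(p)                   # challenge every higher-numbered process
    if not higher or not wait_for_ok():    # nobody higher, or nobody answered OK
        announce_victory(my_id)            # broadcast "I won"
        return my_id
    return None                            # someone higher took over; await its "I won"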
  • 77. BULLY ALGORITHM EXAMPLE  The bully election algorithm  Process 4 holds an election  Process 5 and 6 respond, telling 4 to stop  Now 5 and 6 each hold an election
  • 78. BULLY ALGORITHM EXAMPLE d) Process 6 tells 5 to stop e) Process 6 wins and tells everyone
  • 79. LAST CLASS  Vector timestamps  Global state  Distributed Snapshot  Election algorithms
  • 80. TODAY: STILL MORE CANONICAL PROBLEMS  Election algorithms  Bully algorithm  Ring algorithm  Distributed synchronization and mutual exclusion  Distributed transactions
  • 81. ELECTION ALGORITHMS ▪ Many distributed algorithms need one process to act as coordinator ▪ Doesn't matter which process does the job, just need to pick one ▪ Election algorithms: technique to pick a unique coordinator (aka leader election) ▪ Examples: take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm ▪ Types of election algorithms: Bully and Ring
  • 82. BULLY ALGORITHM  Each process has a unique numerical ID  Processes know the Ids and address of every other process  Communication is assumed reliable  Key Idea: select process with highest ID  Process initiates election if it just recovered from failure or if coordinator failed  3 message types: election, OK, I won  Several processes can initiate an election simultaneously
  • 83. BULLY ALGORITHM DETAILS ▪ Any process P can initiate an election ▪ P sends Election messages to all processes with higher Ids and awaits OK messages ▪ If no OK messages arrive, P becomes coordinator and sends I won messages to all processes with lower Ids ▪ If it receives an OK, it drops out and waits for an I won ▪ If a process receives an Election msg, it returns an OK and starts an election of its own ▪ If a process receives an I won, it treats the sender as coordinator
  • 84. BULLY ALGORITHM EXAMPLE  The bully election algorithm  Process 4 holds an election  Process 5 and 6 respond, telling 4 to stop  Now 5 and 6 each hold an election
  • 85. BULLY ALGORITHM EXAMPLE d) Process 6 tells 5 to stop e) Process 6 wins and tells everyone
  • 86. RING-BASED ELECTION ▪ Processes have unique Ids and are arranged in a logical ring ▪ Each process knows its neighbors ▪ Select the process with highest ID ▪ Begin an election if just recovered or the coordinator has failed ▪ Send Election to the closest downstream node that is alive ▪ Sequentially poll each successor until a live node is found ▪ Each process tags its ID on the message ▪ The initiator picks the node with the highest ID and sends a coordinator message ▪ Multiple elections can be in progress ▪ Wastes network bandwidth but does no harm
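A toy simulation of one election circulating around a ring of live processes (failure detection and real messaging are omitted; it only illustrates the id collection and winner selection):

def ring_election(ring, initiator_index):
    """Simulate one Election message travelling once around the ring.
    ring lists the live process ids in ring order."""
    collected = []
    n = len(ring)
    for step in range(n):
        collected.append(ring[(initiator_index + step) % n])   # each node tags its id
    winner = max(collected)               # highest id becomes the coordinator
    return winner                         # then circulated in a Coordinator message

# ring_election([6, 3, 17, 24, 9], initiator_index=1)  ->  24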
  • 87. A RING ALGORITHM  Election algorithm using a ring.
  • 88. COMPARISON ▪ Assume n processes and one election in progress ▪ Bully algorithm ▪ Worst case: initiator is the node with the lowest ID ▪ Triggers n-2 elections at higher-ranked nodes: O(n²) msgs ▪ Best case: immediate election: n-2 messages ▪ Ring ▪ 2(n-1) messages always
  • 89. A TOKEN RING ALGORITHM a) An unordered group of processes on a network. b) A logical ring constructed in software. • Use a token to arbitrate access to the critical section • Must wait for the token before entering the CS • Pass the token to the neighbor once done or if not interested • Detecting token loss is non-trivial
  • 90. COMPARISON ▪ A comparison of three mutual exclusion algorithms.
  Algorithm    | Messages per entry/exit | Delay before entry (in message times) | Problems
  Centralized  | 3                       | 2                                     | Coordinator crash
  Distributed  | 2(n-1)                  | 2(n-1)                                | Crash of any process
  Token ring   | 1 to ∞                  | 0 to n-1                              | Lost token, process crash
  • 91. TRANSACTIONS ▪ Transactions provide a higher-level mechanism for atomicity of processing in distributed systems ▪ Have their origins in databases ▪ Banking example: three accounts A: $100, B: $200, C: $300 ▪ Client 1: transfer $4 from A to B ▪ Client 2: transfer $3 from C to B ▪ The result can be inconsistent unless certain properties are enforced. Interleaved schedule:
  Client 1           Client 2
  Read A: $100
  Write A: $96
                     Read C: $300
                     Write C: $297
  Read B: $200       Read B: $200
  Write B: $204      Write B: $203
  Both clients update B from the $200 they read, so whichever write happens last overwrites the other and one of the transfers is lost.
  • 92. ACID PROPERTIES ▪ Atomic: all or nothing ▪ Consistent: a transaction takes the system from one consistent state to another ▪ Isolated: immediate effects are not visible to other transactions (serializable) ▪ Durable: changes are permanent once the transaction completes (commits) ▪ A serialized schedule of the same two transfers gives a consistent result:
  Client 1           Client 2
  Read A: $100
  Write A: $96
  Read B: $200
  Write B: $204
                     Read C: $300
                     Write C: $297
                     Read B: $204
                     Write B: $207
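A single-machine toy illustration of the isolation point, with a lock standing in for full transaction machinery (this is not a distributed transaction protocol): serialising the two transfers yields the consistent $207 balance shown in the second schedule.

import threading

accounts = {"A": 100, "B": 200, "C": 300}
lock = threading.Lock()

def transfer(src, dst, amount):
    # Holding the lock across the whole read-modify-write makes each transfer
    # atomic and isolated, so concurrent transfers cannot lose updates.
    with lock:
        accounts[src] -= amount
        accounts[dst] += amount

t1 = threading.Thread(target=transfer, args=("A", "B", 4))   # client 1
t2 = threading.Thread(target=transfer, args=("C", "B", 3))   # client 2
t1.start(); t2.start(); t1.join(); t2.join()
print(accounts)   # {'A': 96, 'B': 207, 'C': 297} -- the serializable outcome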