SlideShare une entreprise Scribd logo
1  sur  51
Télécharger pour lire hors ligne
Chapter 4: Global State and Snapshot Recording
                                  Algorithms

                                          Ajay Kshemkalyani and Mukesh Singhal

                                     Distributed Computing: Principles, Algorithms, and Systems


                                                    Cambridge University Press




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   1 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Introduction



            Recording the global state of a distributed system on-the-fly is an important
            paradigm.
            The lack of globally shared memory, global clock and unpredictable message
            delays in a distributed system make this problem non-trivial.
            This chapter first defines consistent global states and discusses issues to be
            addressed to compute consistent distributed snapshots.
            Then several algorithms to determine on-the-fly such snapshots are presented
            for several types of networks.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   2 / 51
Distributed Computing: Principles, Algorithms, and Systems



   System model


            The system consists of a collection of n processes p1 , p2 , ..., pn that are
            connected by channels.
            There are no globally shared memory and physical global clock and processes
            communicate by passing messages through communication channels.
            Cij denotes the channel from process pi to process pj and its state is denoted
            by SCij .
            The actions performed by a process are modeled as three types of events:
            Internal events,the message send event and the message receive event.
            For a message mij that is sent by process pi to process pj , let send(mij ) and
            rec(mij ) denote its send and receive events.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   3 / 51
Distributed Computing: Principles, Algorithms, and Systems



   System model


            At any instant, the state of process pi , denoted by LSi , is a result of the
            sequence of all the events executed by pi till that instant.
            For an event e and a process state LSi , e∈LSi iff e belongs to the sequence
            of events that have taken process pi to state LSi .
            For an event e and a process state LSi , e∈LSi iff e does not belong to the
            sequence of events that have taken process pi to state LSi .
            For a channel Cij , the following set of messages can be defined based on the
            local states of the processes pi and pj

            Transit: transit(LSi , LSj ) = {mij |send(mij ) ∈ LSi                                      rec(mij ) ∈ LSj }




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                             CUP 2008   4 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Models of communication


    Recall, there are three models of communication: FIFO, non-FIFO, and Co.
            In FIFO model, each channel acts as a first-in first-out message queue and
            thus, message ordering is preserved by a channel.
            In non-FIFO model, a channel acts like a set in which the sender process adds
            messages and the receiver process removes messages from it in a random
            order.
            A system that supports causal delivery of messages satisfies the following
            property: “For any two messages mij and mkj , if send(mij ) −→ send(mkj ),
            then rec(mij ) −→ rec(mkj )”.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   5 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Consistent global state



            The global state of a distributed system is a collection of the local states of
            the processes and the channels.
            Notationally, global state GS is defined as,
                                       GS = { i LSi ,                                    i ,j SCij     }
            A global state GS is a consistent global state iff it satisfies the following two
            conditions :
                      C1: send(mij )∈LSi ⇒ mij ∈SCij ⊕ rec(mij )∈LSj . (⊕ is Ex-OR
                           operator.)
                      C2: send(mij )∈LSi ⇒ mij ∈SCij ∧ rec(mij )∈LSj .




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                     CUP 2008   6 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Interpretation in terms of cuts


            A cut in a space-time diagram is a line joining an arbitrary point on each
            process line that slices the space-time diagram into a PAST and a FUTURE.
            A consistent global state corresponds to a cut in which every message
            received in the PAST of the cut was sent in the PAST of that cut.
            Such a cut is known as a consistent cut.
            For example, consider the space-time diagram for the computation illustrated
            in Figure 4.1.
            Cut C1 is inconsistent because message m1 is flowing from the FUTURE to
            the PAST.
            Cut C2 is consistent and message m4 must be captured in the state of
            channel C21 .




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   7 / 51
Distributed Computing: Principles, Algorithms, and Systems




                                                 C                                    C2
                                                  1
                                        e1
                                         1
                                                       e2
                                                        1
                                                                                               e3
                                                                                               1
                                                                                                        4
                                                                                                       e1
                         p
                             1                              m1
                                             1
                                            e2                   e22    e23          e4
                                                                                      2   m                 m
                                                                                                             5
                         p2                                                                4
                                        1                   m2
                                    e             e2
                                                   3
                                                                    3
                                                                   e3                            4
                                                                                                e3
                                                                                                                   5
                                                                                                                  e3
                         p3             3

                                                                        m
                                              1
                                             e4                          3     e42
                         p4

                                                                                                                 time


                                  Figure 4.1: An Interpretation in Terms of a Cut.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                                  CUP 2008   8 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Issues in recording a global state


    The following two issues need to be addressed:
               I1: How to distinguish between the messages to be recorded in the
                   snapshot from those not to be recorded.

                           -Any message that is sent by a process before recording its
                           snapshot, must be recorded in the global snapshot (from C1).
                           -Any message that is sent by a process after recording its snapshot,
                           must not be recorded in the global snapshot (from C2).
                       I2: How to determine the instant when a process takes its snapshot.

                              -A process pj must record its snapshot before processing a message
                              mij that was sent by process pi after recording its snapshot.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   9 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Snapshot algorithms for FIFO channels


    Chandy-Lamport algorithm
            The Chandy-Lamport algorithm uses a control message, called a marker
            whose role in a FIFO system is to separate messages in the channels.
            After a site has recorded its snapshot, it sends a marker, along all of its
            outgoing channels before sending out any more messages.
            A marker separates the messages in the channel into those to be included in
            the snapshot from those not to be recorded in the snapshot.
            A process must record its snapshot no later than when it receives a marker on
            any of its incoming channels.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   10 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Chandy-Lamport algorithm


            The algorithm can be initiated by any process by executing the “Marker
            Sending Rule” by which it records its local state and sends a marker on each
            outgoing channel.
            A process executes the “Marker Receiving Rule” on receiving a marker. If the
            process has not yet recorded its local state, it records the state of the channel
            on which the marker is received as empty and executes the “Marker Sending
            Rule” to record its local state.
            The algorithm terminates after each process has received a marker on all of
            its incoming channels.
            All the local snapshots get disseminated to all other processes and all the
            processes can determine the global state.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   11 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Chandy-Lamport algorithm

     Marker Sending Rule for process i
      1 Process i records its state.
         2    For each outgoing channel C on which a marker
              has not been sent, i sends a marker along C
              before i sends further messages along C.
     Marker Receiving Rule for process j
     On receiving a marker along channel C:
         if j has not recorded its state then
                 Record the state of C as the empty set
                 Follow the “Marker Sending Rule”
         else
                 Record the state of C as the set of messages
                 received along C after j’s state was recorded
                 and before j received the marker along C


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   12 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Correctness and Complexity

    Correctness
        Due to FIFO property of channels, it follows that no message sent after the
        marker on that channel is recorded in the channel state. Thus, condition C2
        is satisfied.
        When a process pj receives message mij that precedes the marker on channel
        Cij , it acts as follows: If process pj has not taken its snapshot yet, then it
        includes mij in its recorded snapshot. Otherwise, it records mij in the state of
        the channel Cij . Thus, condition C1 is satisfied.
    Complexity
            The recording part of a single instance of the algorithm requires O(e)
            messages and O(d) time, where e is the number of edges in the network and
            d is the diameter of the network.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   13 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Properties of the recorded global state


            The recorded global state may not correspond to any of the global states that
            occurred during the computation.
            This happens because a process can change its state asynchronously before
            the markers it sent are received by other sites and the other sites record their
            states.
                ◮    But the system could have passed through the recorded global states in some
                     equivalent executions.
                ◮    The recorded global state is a valid state in an equivalent execution and if a
                     stable property (i.e., a property that persists) holds in the system before the
                     snapshot algorithm begins, it holds in the recorded global snapshot.
                ◮    Therefore, a recorded global state is useful in detecting stable properties.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   14 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Spezialetti-Kearns algorithm
    There are two phases in obtaining a global snapshot: locally recording the
    snapshot at every process and distributing the resultant global snapshot to all the
    initiators.
    Efficient snapshot recording
            In the Spezialetti-Kearns algorithm, a markers carries the identifier of the
            initiator of the algorithm. Each process has a variable master to keep track of
            the initiator of the algorithm.
            A key notion used by the optimizations is that of a region in the system. A
            region encompasses all the processes whose master field contains the
            identifier of the same initiator.
            When the initiator’s identifier in a marker received along a channel is different
            from the value in the master variable, the sender of the marker lies in a
            different region.
            The identifier of the concurrent initiator is recorded in a local variable
            id-border -set.


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   15 / 51
Distributed Computing: Principles, Algorithms, and Systems




            The state of the channel is recorded just as in the Chandy-Lamport algorithm
            (including those that cross a border between regions).
            Snapshot recording at a process is complete after it has received a marker
            along each of its channels.
            After every process has recorded its snapshot, the system is partitioned into
            as many regions as the number of concurrent initiations of the algorithm.
            Variable id-border -set at a process contains the identifiers of the neighboring
            regions.
    Efficient dissemination of the recorded snapshot
            In the snapshot recording phase, a forest of spanning trees is implicitly
            created in the system. The initiator of the algorithm is the root of a spanning
            tree and all processes in its region belong to its spanning tree.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   16 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Efficient dissemination of the recorded snapshot

            If pi receives its first marker from pj then process pj is the parent of process
            pi in the spanning tree.
            When an intermediate process in a spanning tree has received the recorded
            states from all its child processes and has recorded the states of all incoming
            channels, it forwards its locally recorded state and the locally recorded states
            of all its descendent processes to its parent.
            When the initiator receives the locally recorded states of all its descendents
            from its children processes, it assembles the snapshot for all the processes in
            its region and the channels incident on these processes.
            The initiator exchanges the snapshot of its region with the initiators in
            adjacent regions in rounds.
            The message complexity of snapshot recording is O(e) irrespective of the
            number of concurrent initiations of the algorithm. The message complexity of
            assembling and disseminating the snapshot is O(rn2 ) where r is the number
            of concurrent initiations.


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   17 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Snapshot algorithms for non-FIFO channels




            In a non-FIFO system, a marker cannot be used to delineate messages into
            those to be recorded in the global state from those not to be recorded in the
            global state.
            In a non-FIFO system, either some degree of inhibition or piggybacking of
            control information on computation messages to capture out-of-sequence
            messages.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   18 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Lai-Yang algorithm


    The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by
    using a coloring scheme on computation messages that works as follows:
        1   Every process is initially white and turns red while taking a snapshot. The
            equivalent of the “Marker Sending Rule” is executed when a process turns
            red.
        2   Every message sent by a white (red) process is colored white (red).
        3   Thus, a white (red) message is a message that was sent before (after) the
            sender of that message recorded its local snapshot.
        4   Every white process takes its snapshot at its convenience, but no later than
            the instant it receives a red message.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   19 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Lai-Yang algorithm



        4   Every white process records a history of all white messages sent or received
            by it along each channel.
        5   When a process turns red, it sends these histories along with its snapshot to
            the initiator process that collects the global snapshot.
        6   The initiator process evaluates transit(LSi , LSj ) to compute the state of a
            channel Cij as given below:
            SCij = white messages sent by pi on Cij − white messages received by pj on
            Cij
            = {send(mij )|send(mij ) ∈ LSi } − {rec(mij )|rec(mij ) ∈ LSj }.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   20 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Mattern’s algorithm
    Mattern’s algorithm is based on vector clocks and assumes a single initiator
    process and works as follows:
        1   The initiator “ticks” its local clock and selects a future vector time s at
            which it would like a global snapshot to be recorded. It then broadcasts this
            time s and freezes all activity until it receives all acknowledgements of the
            receipt of this broadcast.
        2   When a process receives the broadcast, it remembers the value s and returns
            an acknowledgement to the initiator.
        3   After having received an acknowledgement from every process, the initiator
            increases its vector clock to s and broadcasts a dummy message to all
            processes.
        4   The receipt of this dummy message forces each recipient to increase its clock
            to a value ≥ s if not already ≥ s.
        5   Each process takes a local snapshot and sends it to the initiator when (just
            before) its clock increases from a value less than s to a value ≥ s.
        6   The state of Cij is all messages sent along Cij , whose timestamp is smaller
            than s and which are received by pj after recording LSj .
A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   21 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Mattern’s algorithm


            A termination detection scheme for non-FIFO channels is required to detect
            that no white messages are in transit.
            One of the following schemes can be used for termination detection:
    First method:
            Each process i keeps a counter cntri that indicates the difference between the
            number of white messages it has sent and received before recording its
            snapshot.
            It reports this value to the initiator process along with its snapshot and
            forwards all white messages, it receives henceforth, to the initiator.
            Snapshot collection terminates when the initiator has received                             i   cntri
            number of forwarded white messages.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                     CUP 2008   22 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Mattern’s algorithm



    Second method:
            Each red message sent by a process carries a piggybacked value of the number
            of white messages sent on that channel before the local state recording.
            Each process keeps a counter for the number of white messages received on
            each channel.
            A process can detect termination of recording the states of incoming channels
            when it receives as many white messages on each channel as the value
            piggybacked on red messages received on that channel.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   23 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Snapshots in a causal delivery system

            The causal message delivery property CO provides a built-in message
            synchronization to control and computation messages.
            Two global snapshot recording algorithms, namely, Acharya-Badrinath and
            Alagar-Venkatesan exist that assume that the underlying system supports
            causal message delivery.
            In both these algorithms recording of process state is identical and proceed as
            follows :
            An initiator process broadcasts a token, denoted as token, to every process
            including itself.
            Let the copy of the token received by process pi be denoted tokeni .
            A process pi records its local snapshot LSi when it receives tokeni and sends
            the recorded snapshot to the initiator.
            The algorithm terminates when the initiator receives the snapshot recorded
            by each process.


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   24 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Snapshots in a causal delivery system

    Correctness
    For any two processes pi and pj , the following property is satisfied:

                                            send(mij ) ∈ LSi ⇒ rec(mij ) ∈ LSj .

            This is due to the causal ordering property of the underlying system as
            explained next.
                ◮    Let a message mij be such that rec(tokeni ) −→ send(mij ).
                ◮    Then send(tokenj ) −→ send(mij ) and the underlying causal ordering property
                     ensures that rec(tokenj ), at which instant process pj records LSj , happens
                     before rec(mij ).
                ◮    Thus, mij whose send is not recorded in LSi , is not recorded as received in LSj .
            Channel state recording is different in these two algorithms and is discussed
            next.



A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   25 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Channel state recording in Acharya-Badrinath algorithm



            Each process pi maintains arrays SENTi [1, ...N] and RECDi [1, ..., N].
            SENTi [j] is the number of messages sent by process pi to process pj .
            RECDi [j] is the number of messages received by process pi from process pj .
            Channel states are recorded as follows:
            When a process pi records its local snapshot LSi on the receipt of tokeni , it
            includes arrays RECDi and SENTi in its local state before sending the
            snapshot to the initiator.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   26 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Channel state recording in Acharya-Badrinath algorithm


    When the algorithm terminates, the initiator determines the state of channels as
    follows:
            The state of each channel from the initiator to each process is empty.
            The state of channel from process pi to process pj is the set of messages
            whose sequence numbers are given by {RECDj [i] + 1, . . . , SENTi [j]}.
    Complexity:
            This algorithm requires 2n messages and 2 time units for recording and
            assembling the snapshot, where one time unit is required for the delivery of a
            message.
            If the contents of messages in channels state are required, the algorithm
            requires 2n messages and 2 time units additionally.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   27 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Channel state recording in Alagar-Venkatesan algorithm


            A message is referred to as old if the send of the message causally precedes
            the send of the token.
            Otherwise, the message is referred to as new.
    In Alagar-Venkatesan algorithm channel states are recorded as follows:
        1   When a process receives the token, it takes its snapshot, initializes the state
            of all channels to empty, and returns Done message to the initiator. Now
            onwards, a process includes a message received on a channel in the channel
            state only if it is an old message.
        2   After the initiator has received Done message from all processes, it
            broadcasts a Terminate message.
        3   A process stops the snapshot algorithm after receiving a Terminate message.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   28 / 51
Distributed Computing: Principles, Algorithms, and Systems




                Algorithms                    Features
                Chandy-         Baseline algorithm. Requires FIFO channels. O(e) messages
                Lamport         to record snapshot and O(d) time.
                Spezialetti-    supports concurrent initiators, efficient assembly
                Kearns          and distribution of a snapshot. Assumes bidirectional channels.
                                O(e) messages to record, O(rn2 ) messages to assemble and
                                distribute snapshot.
           Lai-Yang             Works for non-FIFO channels. Markers piggybacked
                                on computation messages. Message history required
                                to compute channel states.
           Li et al.            Small message history
                                needed as channel states are computed incrementally.
           Mattern              No message history required. Termination detection
                                required to compute channel states.
           Acharya-             Requires causal delivery support, Centralized computation of channel
           Badrinath            states, Channel message contents need not be known.
                                Requires 2n messages, 2 time units.
           Alagar-Venkatesan    Requires causal delivery support, Distributed computation of channel
                                states. Requires 3n messages, 3 time units, small messages.
   n = # processes, u = # edges on which messages were sent after previous snapshot, e = # channels, d is the
                             diameter of the network, r = # concurrent initiators.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   29 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Necessary and sufficient conditions for consistent global
   snapshots

            Many applications require that local process states are periodically recorded
            and analyzed during execution or post martem.
            A saved intermediate state of a process during its execution is called a local
            checkpoint of the process.
            A consistent snapshot consists of a set of local states that occurred
            concurrently or had a potential to occur simultaneously.
            Processes take checkpoints asynchronously. The i th (i ≥ 0) checkpoint of
            process pp is assigned the sequence number i and is denoted by Cp,i .
            We assume that each process takes an initial checkpoint before execution
            begins and takes a virtual checkpoint after execution ends.
            The i th checkpoint interval of process pp consists of all the computation
            performed between its (i − 1)th and i th checkpoints (and includes the
            (i − 1)th checkpoint but not i th ).


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   30 / 51
Distributed Computing: Principles, Algorithms, and Systems




    Even if two local checkpoints do not have a causal path between them, they may
    not belong to the same consistent global snapshot.

    An Example:
    Consider the execution shown in the Figure 4.2.
        Although neither of the checkpoints C1,1 and C3,2 happened before the other,
        they cannot be grouped together with a checkpoint on process p2 to form a
        consistent global snapshot.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   31 / 51
Distributed Computing: Principles, Algorithms, and Systems




                          C 1,0                     C 1,1                                              C 1,2
              p1
                                       m1                                       m3                                          m5
                          C 2,0                     C 2,1                               C 2,2                   C 2,3
              p2
                                          m                                m4                                  m6
                          C 3,0               2     C 3,1                            C3,2                               C 3,3
              p3

                                                                                       Legend:                      A checkpoint


                                        Figure 4.2: An Illustration of zigzag paths.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                                      CUP 2008   32 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Necessary and sufficient conditions for consistent global
   snapshots


            To describe the necessary and sufficient conditions for a consistent snapshot,
            Netzer and Xu defined a generalization of the Lamport’s happen before
            relation, called a zigzag path.
            A zigzag path between two checkpoints is a causal path, however, a zigzag
            path allows a message to be sent before the previous one in the path is
            received.
            In the Figure 4.2 although a causal path does not exist from C1,1 to C3,2 , a
            zigzag path does exist from C1,1 to C3,2 .
            This zigzag path means that no consistent snapshot exists in this execution
            that contains both C1,1 and C3,2 .




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   33 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Zigzag paths and consistent global snapshots


    Definition
    A zigzag path exists from a checkpoint Cx,i to a checkpoint Cy,j iff there exists
    messages m1 , m2 , ...mn (n≥1) such that
        1   m1 is sent by process px after Cx,i .
        2   If mk (1≤k≤n) is received by process pz , then mk+1 is sent by pz in the same
            or a later checkpoint interval (although mk+1 may be sent before or after mk
            is received), and
        3   mn is received by process py before Cy,j .

    Example:
    In the Figure 4.2 a zigzag path exists from C1,1 to C3,2 due to messages m3 and
    m4 .




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   34 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Zigzag cycle
    Definition
    A checkpoint C is involved in a zigzag cycle iff there is a zigzag path from C to
    itself.
    In Figure 4.3, C2,1 is on a zigzag cycle formed by messages m1 and m2 .


                                 C 1,0                                                           C 1,1                   C1,2
                       p1
                                                                                                            m3
                                               m1
                                 C 2,0                 C 2,1                   m2     C 2,2                      C 2,3
                       p
                        2                                                                      m
                                                                                                 4
                                C 3,0                               C 3,1                            C3,2
                       p
                        3




          Figure 4.3: A zigzag cycle, inconsistent snapshot, and consistent snapshot.


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                                          CUP 2008   35 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Difference between a zigzag path and a causal path


            A causal path exists from a checkpoint A to another checkpoint B iff there is
            chain of messages starting after A and ending before B such that each
            message is sent after the previous one in the chain is received.
            A zigzag path consists of such a message chain, however, a message in the
            chain can be sent before the previous one in the chain is received, as long as
            the send and receive are in the same checkpoint interval.
            Thus a causal path is always a zigzag path, but a zigzag path need not be a
            causal path.
            Another difference between a zigzag path and a causal path is that a zigzag
            path can form a cycle but a causal path never forms a cycle.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   36 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Consistent global snapshots


    Netzer and Xu proved that if no zigzag path (or cycle) exists between any two
    checkpoints from a set S of checkpoints, then a consistent snapshot can be
    formed that includes the set S of checkpoints and vice versa.
        The absence of a causal path between checkpoints in a snapshot corresponds
        to the necessary condition for a consistent snapshot, and the absence of a
        zigzag path between checkpoints in a snapshot corresponds to the necessary
        and sufficient conditions for a consistent snapshot.
            A set of checkpoints S can be extended to a consistent snapshot if and only if
            no checkpoint in S has a zigzag path to any other checkpoint in S.
            A checkpoint can be a part of a consistent snapshot if and only if it is not
            invloved in a Z-cycle.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   37 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Finding consistent global snapshots in a distributed
   computation

            Discuss how individual local checkpoints can be combined with those from
            other processes to form global snapshots that are consistent.
            Manivannan-Netzer-Singhal analyzed the set of all consistent snapshots that
            can be built from a set of checkpoints S.

    Definition
    Let A, B be individual checkpoints and R, S be sets of checkpoints. Let                                   be a
    relation defined over checkpoints and sets of checkpoints such that
        1   A        B iff a Z-path exists from A to B,
        2   A        S iff a Z-path exists from A to some member of S,
        3   S        A iff a Z-path exists from some member of S to A, and
        4   R        S iff a Z-path exists from some member of R to some member of S.



A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   38 / 51
Distributed Computing: Principles, Algorithms, and Systems




    S     S defines that no Z-path (including Z-cycle) exists from any member of S to
    any other member of S and implies that checkpoints in S are all from different
    processes.
    In this notations, the results of Netzer and Xu can be expressed as follows:
    Theorem 4.1:
    A set of checkpoints S can be extended to a consistent global snapshot if and
    only if S    S.
    Corollary 4.1
    A checkpoint C can be part of a consistent global snapshot if and only if it is not
    involved in a Z-cycle.
    Corollary 4.2
    A set of checkpoints S is a consistent global snapshot if and only if S   S and
    |S| = N, where N is the number of processes.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   39 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Finding consistent global snapshots

    Given a set S of checkpoints such that S      S, we first discuss what checkpoints
    from other processes can be combined with S to build a consistent global
    snapshot.
    First Observation:
         None of the checkpoints that have a Z-path to or from any of the
         checkpoints in S can be used.
         Only those checkpoints that have no Z-paths to or from any of the
         checkpoints in S are candidates for inclusion in the consistent snapshot.
         We call the set of all such candidates the Z-cone of S and all checkpoints
         that have no causal path to or from any checkpoint in S the C-cone of S.
            A causal path is always Z-path, the Z-cone of S is a subset of the C-cone of
            S for an arbitrary S as shown in the Figure 4.4




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   40 / 51
Distributed Computing: Principles, Algorithms, and Systems




                                                                               S




                    Edge of C − cone                                Edges of Z − cone                      Edge of C − cone




                                       Z − paths to S            Z − unordered with S            Z − paths from S
                                                                      ( Z − cone)
                    Casual paths to S                       Casually unordered with S                     Casual paths from S
                                                                      (C − cone)


      Figure 4.4: The Z-cone and the C-cone associated with a set of checkpoints S.


A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                                     CUP 2008   41 / 51
Distributed Computing: Principles, Algorithms, and Systems




    Second Observation:
    Although candidates for building a consistent snapshot from S must lie in the
    Z-cone of S, not all checkpoints in the Z-cone can form a consistent snapshot
    with S. If a checkpoint in the Z-cone is involved in a Z-cycle, then it cannot be
    part of a consistent snapshot.

    Definition
    Let S be a set of checkpoints such that S                                        S. Then, for each process pq , the
         q
    set Suseful is defined as
                          q
                         Suseful = {Cq,i | (S                    Cq,i ) ∧ (Cq,i             S) ∧ (Cq,i   Cq,i )}.

    In addition, we define
                                                                                    q
                                                             Suseful =             Suseful .
                                                                               q




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                              CUP 2008   42 / 51
Distributed Computing: Principles, Algorithms, and Systems




    Lemma 4.1
    “Let S be a set of checkpoints such that S    S. Let Cq,i be any checkpoint of
    process pq such that Cq,i ∈ S. Then S ∪ {Cq,i } can be extended to a consistent
    snapshot if and only if Cq,i ∈ Suseful .”

    Lemma 4.1 states that if we are given a set S such that S    S, we are
    guaranteed that any single checkpoint from Suseful can belong to a consistent
    global snapshot that also contains S.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   43 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Third observation

            Although none of the checkpoints in Suseful has a Z-path to or from any
            checkpoint in S, Z-paths may exist between members of Suseful .
            One final constraint is placed on the set T we choose from Suseful to build a
            consistent snapshot from S: checkpoints in T must have no Z-paths between
            them. Furthermore, since S     S, from Theorem 4.1 at least one such T
            must exist.
    Theorem 4.2:
    Let S be a set of checkpoints such that S  S and let T be any set of
    checkpoints such that S ∩ T = ∅. Then, S ∪ T is a consistent global snapshot if
    and only if
        1   T ⊆ Suseful ,
        2   T   T , and
        3   |S ∪ T | = N.



A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   44 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Manivannan-Netzer-Singhal algorithm for enumerating
   consistent snapshots
    An algorithm due to Manivannan-Netzer-Singhal that explicitly computes all
    consistent snapshots that include a given set a set of checkpoints S.
    1:          ComputeAllCgs(S) {
    2:               let G = ∅
    3:               if S    S then
    4:                       let AllProcs be the set of all processes not represented in S
    5:                       ComputeAllCgsFrom(S, AllProcs)
    6:               return G
    7:          }
    8:          ComputeAllCgsFrom(T , ProcSet) {
    9:               if (ProcSet = ∅) then
    10:                      G = G ∪ {T }
    11:              else
    12:                      let pq be any process in ProcSet
                                                         q
    13:                      for each checkpoint C ∈ Tuseful do
    14:                              ComputeAllCgsFrom(T ∪ {C }, ProcSet  {pq })
    15:         }

A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   45 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Manivannan-Netzer-Singhal algorithm



    The algorithm restricts its selection of checkpoints to those within the Z-cone of
    S and it checks for the presence of Z-cycles within the Z-cone.
    Correctness:
    The following theorem argues the correctness of the algorithm.
    Theorem 4.3:
    Let S be a set of checkpoints and G be the set returned by ComputeAllCgs(S). If
    S     S, then T ∈ G if and only if T is a consistent snapshot containing S. That
    is, G contains exactly the consistent snapshots that contain S.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   46 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Finding Z-paths in a distributed computation

    Discuss a method for determining the existence of Z-paths between checkpoints in
    a distributed computation that has terminated or has stopped execution, using the
    rollback-dependency graph (R-graph).

    Definition
    The rollback-dependency graph of a distributed computation is a directed graph
    G = (V , E ), where the vertices V are the checkpoints of the distributed
    computation and an edge (Cp,i , Cq,j ) from checkpoint Cp,i to checkpoint Cq,j
    belongs to E if
      1  p = q and j = i + 1, or
        2   p = q and a message m sent from the i th checkpoint interval of pp is
            received by pq in its j th checkpoint interval (i, j > 0).




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   47 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Example of an R-graph



            Figure 4.6 shows shows the R-graph of the computation shown in Figure 4.5.
            In Figure 4.6, C1,3 , C2,3 , and C3,3 represent the volatile checkpoints, the
            checkpoints representing the last state the process attained before
            terminating.
            We denote the fact that there is a path from C to D in the R-graph by
               rd
            C     D. It only denotes the existence of a path; it does not specify any
            particular path.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   48 / 51
Distributed Computing: Principles, Algorithms, and Systems




                        C 1,0            C 1,1                                 C 1,2
                  p1
                                                         m1            m3
                        C 2,0                                 C 2,1                    m4 C
                                                                                            2,2
                  p2
                                                 m                                                     m5          m6
                        C 3,0                        2                C 3,1                                 C3,2
                  p3


                                          Figure 4.5: A distributed computation.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                              CUP 2008   49 / 51
Distributed Computing: Principles, Algorithms, and Systems




             C                  C 1,1                                 C                           C
              1,0                                                         1,2                          1,3




             C 2,0                        C                                                              C 2,3
                                              2,1
                                                                                                                         Volatile
                                                                                   C 2,2                                  checkpoints




             C
              3,0

                                                    C 3,1                                       C 3,2            C 3,3


                           Figure 4.6: The R-graph of the computation in Figure 4.5.




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                                    CUP 2008      50 / 51
Distributed Computing: Principles, Algorithms, and Systems



   Example of an R-graph

    The following theorem establishes the correspondence between the paths in the
    R-graph and the Z-paths between checkpoints.
    Theorem:
    Let G = (V , E ) be the R-graph of a distributed computation. Then, for any two
    checkpoints Cp,i and Cq,j , Cp,i  Cq,j if and only if
        1   p = q and i < j, or
                         rd
        2   Cp,i +1           Cq,j in G (note that in this case p could still be equal to q).
    Examples:
       In Figure 4.5, a zigzag path exists from C1,1 to C3,1 because in the
                                                          rd
       corresponding R-graph, shown in Figure 4.6, C1,2      C3,1 .
            Likewise, C2,1 is on a Z-cycle because in the corresponding R-graph, shown in
                              rd
            Figure 4.6, C2,2     C2,1 .




A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms
                                                     G                                                 CUP 2008   51 / 51

Contenu connexe

Tendances

Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed systemSunita Sahu
 
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts – ...
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts –  ...WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts –  ...
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts – ...ArunChokkalingam
 
Distributed File Systems
Distributed File Systems Distributed File Systems
Distributed File Systems Maurvi04
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memoryAshish Kumar
 
Routing algorithm
Routing algorithmRouting algorithm
Routing algorithmBushra M
 
Congestion avoidance in TCP
Congestion avoidance in TCPCongestion avoidance in TCP
Congestion avoidance in TCPselvakumar_b1985
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layerNitesh Singh
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
Distributed system architecture
Distributed system architectureDistributed system architecture
Distributed system architectureYisal Khan
 
Lamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusionLamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusionNeelamani Samal
 
Block Cipher and its Design Principles
Block Cipher and its Design PrinciplesBlock Cipher and its Design Principles
Block Cipher and its Design PrinciplesSHUBHA CHATURVEDI
 
System models in distributed system
System models in distributed systemSystem models in distributed system
System models in distributed systemishapadhy
 

Tendances (20)

Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
 
Peer to Peer services and File systems
Peer to Peer services and File systemsPeer to Peer services and File systems
Peer to Peer services and File systems
 
11. dfs
11. dfs11. dfs
11. dfs
 
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts – ...
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts –  ...WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts –  ...
WSN NETWORK -MAC PROTOCOLS - Low Duty Cycle Protocols And Wakeup Concepts – ...
 
Distributed File Systems
Distributed File Systems Distributed File Systems
Distributed File Systems
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
 
Routing algorithm
Routing algorithmRouting algorithm
Routing algorithm
 
Transport layer
Transport layer Transport layer
Transport layer
 
Congestion avoidance in TCP
Congestion avoidance in TCPCongestion avoidance in TCP
Congestion avoidance in TCP
 
Audio and Video Compression
Audio and Video CompressionAudio and Video Compression
Audio and Video Compression
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layer
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
Mobile computing (Wireless) Medium Access Control (MAC)
Mobile computing (Wireless) Medium Access Control (MAC)Mobile computing (Wireless) Medium Access Control (MAC)
Mobile computing (Wireless) Medium Access Control (MAC)
 
Distributed Systems Naming
Distributed Systems NamingDistributed Systems Naming
Distributed Systems Naming
 
Distributed system architecture
Distributed system architectureDistributed system architecture
Distributed system architecture
 
Lamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusionLamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusion
 
Block Cipher and its Design Principles
Block Cipher and its Design PrinciplesBlock Cipher and its Design Principles
Block Cipher and its Design Principles
 
System models in distributed system
System models in distributed systemSystem models in distributed system
System models in distributed system
 

En vedette

Global state recording in Distributed Systems
Global state recording in Distributed SystemsGlobal state recording in Distributed Systems
Global state recording in Distributed SystemsArsnet
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsZbigniew Jerzak
 
Distributed computing time
Distributed computing timeDistributed computing time
Distributed computing timeDeepak John
 
Flexible Symmetric Global Snapshot
Flexible Symmetric Global Snapshot Flexible Symmetric Global Snapshot
Flexible Symmetric Global Snapshot Ashutosh Jaiswal
 
Distributed Snapshots
Distributed SnapshotsDistributed Snapshots
Distributed Snapshotsawesomesos
 
Mp Os Survey
Mp Os SurveyMp Os Survey
Mp Os Surveyallankliu
 
Multiprocessor structures
Multiprocessor structuresMultiprocessor structures
Multiprocessor structuresShareb Ismaeel
 
Computer Architecture: A quantitative approach - Cap4 - Section 3
Computer Architecture: A quantitative approach - Cap4 - Section 3Computer Architecture: A quantitative approach - Cap4 - Section 3
Computer Architecture: A quantitative approach - Cap4 - Section 3Marcelo Arbore
 
Computer Architecture: A quantitative approach - Cap4 - Section 8
Computer Architecture: A quantitative approach - Cap4 - Section 8Computer Architecture: A quantitative approach - Cap4 - Section 8
Computer Architecture: A quantitative approach - Cap4 - Section 8Marcelo Arbore
 
Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Marcelo Arbore
 
Computer Architecture: A quantitative approach - Cap4 - Section 6
Computer Architecture: A quantitative approach - Cap4 - Section 6Computer Architecture: A quantitative approach - Cap4 - Section 6
Computer Architecture: A quantitative approach - Cap4 - Section 6Marcelo Arbore
 
network filesystem briefs
network filesystem briefsnetwork filesystem briefs
network filesystem briefsbergwolf
 
Computer Architecture: A quantitative approach - Cap4 - Section 5
Computer Architecture: A quantitative approach - Cap4 - Section 5Computer Architecture: A quantitative approach - Cap4 - Section 5
Computer Architecture: A quantitative approach - Cap4 - Section 5Marcelo Arbore
 
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...Nur Atiqah Mohd Rosli
 
Posix threads(asha)
Posix threads(asha)Posix threads(asha)
Posix threads(asha)Nagarajan
 
LDAP - Lightweight Directory Access Protocol
LDAP - Lightweight Directory Access ProtocolLDAP - Lightweight Directory Access Protocol
LDAP - Lightweight Directory Access ProtocolS. Hasnain Raza
 
Computer Architecture: A quantitative approach - Cap4 - Section 2
Computer Architecture: A quantitative approach - Cap4 - Section 2Computer Architecture: A quantitative approach - Cap4 - Section 2
Computer Architecture: A quantitative approach - Cap4 - Section 2Marcelo Arbore
 

En vedette (20)

Global state recording in Distributed Systems
Global state recording in Distributed SystemsGlobal state recording in Distributed Systems
Global state recording in Distributed Systems
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed Systems
 
Distributed computing time
Distributed computing timeDistributed computing time
Distributed computing time
 
LDAP
LDAPLDAP
LDAP
 
Flexible Symmetric Global Snapshot
Flexible Symmetric Global Snapshot Flexible Symmetric Global Snapshot
Flexible Symmetric Global Snapshot
 
Distributed Snapshots
Distributed SnapshotsDistributed Snapshots
Distributed Snapshots
 
Mp Os Survey
Mp Os SurveyMp Os Survey
Mp Os Survey
 
Multiprocessor structures
Multiprocessor structuresMultiprocessor structures
Multiprocessor structures
 
Computer Architecture: A quantitative approach - Cap4 - Section 3
Computer Architecture: A quantitative approach - Cap4 - Section 3Computer Architecture: A quantitative approach - Cap4 - Section 3
Computer Architecture: A quantitative approach - Cap4 - Section 3
 
Computer Architecture: A quantitative approach - Cap4 - Section 8
Computer Architecture: A quantitative approach - Cap4 - Section 8Computer Architecture: A quantitative approach - Cap4 - Section 8
Computer Architecture: A quantitative approach - Cap4 - Section 8
 
Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1
 
Computer Architecture: A quantitative approach - Cap4 - Section 6
Computer Architecture: A quantitative approach - Cap4 - Section 6Computer Architecture: A quantitative approach - Cap4 - Section 6
Computer Architecture: A quantitative approach - Cap4 - Section 6
 
network filesystem briefs
network filesystem briefsnetwork filesystem briefs
network filesystem briefs
 
Computer Architecture: A quantitative approach - Cap4 - Section 5
Computer Architecture: A quantitative approach - Cap4 - Section 5Computer Architecture: A quantitative approach - Cap4 - Section 5
Computer Architecture: A quantitative approach - Cap4 - Section 5
 
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...
BITS 1213 - OPERATING SYSTEM (PROCESS,THREAD,SYMMETRIC MULTIPROCESSOR,MICROKE...
 
Posix threads(asha)
Posix threads(asha)Posix threads(asha)
Posix threads(asha)
 
AD & LDAP
AD & LDAPAD & LDAP
AD & LDAP
 
LDAP - Lightweight Directory Access Protocol
LDAP - Lightweight Directory Access ProtocolLDAP - Lightweight Directory Access Protocol
LDAP - Lightweight Directory Access Protocol
 
Computer Architecture: A quantitative approach - Cap4 - Section 2
Computer Architecture: A quantitative approach - Cap4 - Section 2Computer Architecture: A quantitative approach - Cap4 - Section 2
Computer Architecture: A quantitative approach - Cap4 - Section 2
 
Coda file system tahir
Coda file system   tahirCoda file system   tahir
Coda file system tahir
 

Similaire à Day 2 global_state_and_snapshot_algorithms

IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET Journal
 
A NOVEL METHOD FOR PETRI NET MODELING
A NOVEL METHOD FOR PETRI NET MODELINGA NOVEL METHOD FOR PETRI NET MODELING
A NOVEL METHOD FOR PETRI NET MODELINGcscpconf
 
Detection and identification of cheaters in (t, n) secret
Detection and identification of cheaters in (t, n) secretDetection and identification of cheaters in (t, n) secret
Detection and identification of cheaters in (t, n) secretParag Tamhane
 
Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Austin Benson
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMKyuri Kim
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONcsandit
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Analysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictionAnalysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictioniaemedu
 
Analysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controlAnalysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controliaemedu
 
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIComprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIijtsrd
 
Global analysis of nonlinear dynamics
Global analysis of nonlinear dynamicsGlobal analysis of nonlinear dynamics
Global analysis of nonlinear dynamicsSpringer
 
Behavioural analysis of a complex manufacturing system having queue in the ma...
Behavioural analysis of a complex manufacturing system having queue in the ma...Behavioural analysis of a complex manufacturing system having queue in the ma...
Behavioural analysis of a complex manufacturing system having queue in the ma...Alexander Decker
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Filip Krikava
 
On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction cscpconf
 

Similaire à Day 2 global_state_and_snapshot_algorithms (20)

Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
 
A NOVEL METHOD FOR PETRI NET MODELING
A NOVEL METHOD FOR PETRI NET MODELINGA NOVEL METHOD FOR PETRI NET MODELING
A NOVEL METHOD FOR PETRI NET MODELING
 
70
7070
70
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Detection and identification of cheaters in (t, n) secret
Detection and identification of cheaters in (t, n) secretDetection and identification of cheaters in (t, n) secret
Detection and identification of cheaters in (t, n) secret
 
Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)
 
Tesis-Maestria-Presentacion-SIturriaga
Tesis-Maestria-Presentacion-SIturriagaTesis-Maestria-Presentacion-SIturriaga
Tesis-Maestria-Presentacion-SIturriaga
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTM
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Analysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictionAnalysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restriction
 
Analysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controlAnalysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive control
 
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIComprehensive Performance Evaluation on Multiplication of Matrices using MPI
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
 
Global analysis of nonlinear dynamics
Global analysis of nonlinear dynamicsGlobal analysis of nonlinear dynamics
Global analysis of nonlinear dynamics
 
Behavioural analysis of a complex manufacturing system having queue in the ma...
Behavioural analysis of a complex manufacturing system having queue in the ma...Behavioural analysis of a complex manufacturing system having queue in the ma...
Behavioural analysis of a complex manufacturing system having queue in the ma...
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
 
On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Day 2 global_state_and_snapshot_algorithms

  • 1. Chapter 4: Global State and Snapshot Recording Algorithms Ajay Kshemkalyani and Mukesh Singhal Distributed Computing: Principles, Algorithms, and Systems Cambridge University Press A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 1 / 51
  • 2. Distributed Computing: Principles, Algorithms, and Systems Introduction Recording the global state of a distributed system on-the-fly is an important paradigm. The lack of globally shared memory, global clock and unpredictable message delays in a distributed system make this problem non-trivial. This chapter first defines consistent global states and discusses issues to be addressed to compute consistent distributed snapshots. Then several algorithms to determine on-the-fly such snapshots are presented for several types of networks. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 2 / 51
  • 3. Distributed Computing: Principles, Algorithms, and Systems System model The system consists of a collection of n processes p1 , p2 , ..., pn that are connected by channels. There are no globally shared memory and physical global clock and processes communicate by passing messages through communication channels. Cij denotes the channel from process pi to process pj and its state is denoted by SCij . The actions performed by a process are modeled as three types of events: Internal events,the message send event and the message receive event. For a message mij that is sent by process pi to process pj , let send(mij ) and rec(mij ) denote its send and receive events. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 3 / 51
  • 4. Distributed Computing: Principles, Algorithms, and Systems System model At any instant, the state of process pi , denoted by LSi , is a result of the sequence of all the events executed by pi till that instant. For an event e and a process state LSi , e∈LSi iff e belongs to the sequence of events that have taken process pi to state LSi . For an event e and a process state LSi , e∈LSi iff e does not belong to the sequence of events that have taken process pi to state LSi . For a channel Cij , the following set of messages can be defined based on the local states of the processes pi and pj Transit: transit(LSi , LSj ) = {mij |send(mij ) ∈ LSi rec(mij ) ∈ LSj } A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 4 / 51
  • 5. Distributed Computing: Principles, Algorithms, and Systems Models of communication Recall, there are three models of communication: FIFO, non-FIFO, and Co. In FIFO model, each channel acts as a first-in first-out message queue and thus, message ordering is preserved by a channel. In non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order. A system that supports causal delivery of messages satisfies the following property: “For any two messages mij and mkj , if send(mij ) −→ send(mkj ), then rec(mij ) −→ rec(mkj )”. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 5 / 51
  • 6. Distributed Computing: Principles, Algorithms, and Systems Consistent global state The global state of a distributed system is a collection of the local states of the processes and the channels. Notationally, global state GS is defined as, GS = { i LSi , i ,j SCij } A global state GS is a consistent global state iff it satisfies the following two conditions : C1: send(mij )∈LSi ⇒ mij ∈SCij ⊕ rec(mij )∈LSj . (⊕ is Ex-OR operator.) C2: send(mij )∈LSi ⇒ mij ∈SCij ∧ rec(mij )∈LSj . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 6 / 51
  • 7. Distributed Computing: Principles, Algorithms, and Systems Interpretation in terms of cuts A cut in a space-time diagram is a line joining an arbitrary point on each process line that slices the space-time diagram into a PAST and a FUTURE. A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut. For example, consider the space-time diagram for the computation illustrated in Figure 4.1. Cut C1 is inconsistent because message m1 is flowing from the FUTURE to the PAST. Cut C2 is consistent and message m4 must be captured in the state of channel C21 . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 7 / 51
  • 8. Distributed Computing: Principles, Algorithms, and Systems C C2 1 e1 1 e2 1 e3 1 4 e1 p 1 m1 1 e2 e22 e23 e4 2 m m 5 p2 4 1 m2 e e2 3 3 e3 4 e3 5 e3 p3 3 m 1 e4 3 e42 p4 time Figure 4.1: An Interpretation in Terms of a Cut. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 8 / 51
  • 9. Distributed Computing: Principles, Algorithms, and Systems Issues in recording a global state The following two issues need to be addressed: I1: How to distinguish between the messages to be recorded in the snapshot from those not to be recorded. -Any message that is sent by a process before recording its snapshot, must be recorded in the global snapshot (from C1). -Any message that is sent by a process after recording its snapshot, must not be recorded in the global snapshot (from C2). I2: How to determine the instant when a process takes its snapshot. -A process pj must record its snapshot before processing a message mij that was sent by process pi after recording its snapshot. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 9 / 51
  • 10. Distributed Computing: Principles, Algorithms, and Systems Snapshot algorithms for FIFO channels Chandy-Lamport algorithm The Chandy-Lamport algorithm uses a control message, called a marker whose role in a FIFO system is to separate messages in the channels. After a site has recorded its snapshot, it sends a marker, along all of its outgoing channels before sending out any more messages. A marker separates the messages in the channel into those to be included in the snapshot from those not to be recorded in the snapshot. A process must record its snapshot no later than when it receives a marker on any of its incoming channels. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 10 / 51
  • 11. Distributed Computing: Principles, Algorithms, and Systems Chandy-Lamport algorithm The algorithm can be initiated by any process by executing the “Marker Sending Rule” by which it records its local state and sends a marker on each outgoing channel. A process executes the “Marker Receiving Rule” on receiving a marker. If the process has not yet recorded its local state, it records the state of the channel on which the marker is received as empty and executes the “Marker Sending Rule” to record its local state. The algorithm terminates after each process has received a marker on all of its incoming channels. All the local snapshots get disseminated to all other processes and all the processes can determine the global state. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 11 / 51
  • 12. Distributed Computing: Principles, Algorithms, and Systems Chandy-Lamport algorithm Marker Sending Rule for process i 1 Process i records its state. 2 For each outgoing channel C on which a marker has not been sent, i sends a marker along C before i sends further messages along C. Marker Receiving Rule for process j On receiving a marker along channel C: if j has not recorded its state then Record the state of C as the empty set Follow the “Marker Sending Rule” else Record the state of C as the set of messages received along C after j’s state was recorded and before j received the marker along C A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 12 / 51
  • 13. Distributed Computing: Principles, Algorithms, and Systems Correctness and Complexity Correctness Due to FIFO property of channels, it follows that no message sent after the marker on that channel is recorded in the channel state. Thus, condition C2 is satisfied. When a process pj receives message mij that precedes the marker on channel Cij , it acts as follows: If process pj has not taken its snapshot yet, then it includes mij in its recorded snapshot. Otherwise, it records mij in the state of the channel Cij . Thus, condition C1 is satisfied. Complexity The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 13 / 51
  • 14. Distributed Computing: Principles, Algorithms, and Systems Properties of the recorded global state The recorded global state may not correspond to any of the global states that occurred during the computation. This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states. ◮ But the system could have passed through the recorded global states in some equivalent executions. ◮ The recorded global state is a valid state in an equivalent execution and if a stable property (i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds in the recorded global snapshot. ◮ Therefore, a recorded global state is useful in detecting stable properties. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 14 / 51
  • 15. Distributed Computing: Principles, Algorithms, and Systems Spezialetti-Kearns algorithm There are two phases in obtaining a global snapshot: locally recording the snapshot at every process and distributing the resultant global snapshot to all the initiators. Efficient snapshot recording In the Spezialetti-Kearns algorithm, a markers carries the identifier of the initiator of the algorithm. Each process has a variable master to keep track of the initiator of the algorithm. A key notion used by the optimizations is that of a region in the system. A region encompasses all the processes whose master field contains the identifier of the same initiator. When the initiator’s identifier in a marker received along a channel is different from the value in the master variable, the sender of the marker lies in a different region. The identifier of the concurrent initiator is recorded in a local variable id-border -set. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 15 / 51
  • 16. Distributed Computing: Principles, Algorithms, and Systems The state of the channel is recorded just as in the Chandy-Lamport algorithm (including those that cross a border between regions). Snapshot recording at a process is complete after it has received a marker along each of its channels. After every process has recorded its snapshot, the system is partitioned into as many regions as the number of concurrent initiations of the algorithm. Variable id-border -set at a process contains the identifiers of the neighboring regions. Efficient dissemination of the recorded snapshot In the snapshot recording phase, a forest of spanning trees is implicitly created in the system. The initiator of the algorithm is the root of a spanning tree and all processes in its region belong to its spanning tree. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 16 / 51
  • 17. Distributed Computing: Principles, Algorithms, and Systems Efficient dissemination of the recorded snapshot If pi receives its first marker from pj then process pj is the parent of process pi in the spanning tree. When an intermediate process in a spanning tree has received the recorded states from all its child processes and has recorded the states of all incoming channels, it forwards its locally recorded state and the locally recorded states of all its descendent processes to its parent. When the initiator receives the locally recorded states of all its descendents from its children processes, it assembles the snapshot for all the processes in its region and the channels incident on these processes. The initiator exchanges the snapshot of its region with the initiators in adjacent regions in rounds. The message complexity of snapshot recording is O(e) irrespective of the number of concurrent initiations of the algorithm. The message complexity of assembling and disseminating the snapshot is O(rn2 ) where r is the number of concurrent initiations. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 17 / 51
  • 18. Distributed Computing: Principles, Algorithms, and Systems Snapshot algorithms for non-FIFO channels In a non-FIFO system, a marker cannot be used to delineate messages into those to be recorded in the global state from those not to be recorded in the global state. In a non-FIFO system, either some degree of inhibition or piggybacking of control information on computation messages to capture out-of-sequence messages. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 18 / 51
  • 19. Distributed Computing: Principles, Algorithms, and Systems Lai-Yang algorithm The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by using a coloring scheme on computation messages that works as follows: 1 Every process is initially white and turns red while taking a snapshot. The equivalent of the “Marker Sending Rule” is executed when a process turns red. 2 Every message sent by a white (red) process is colored white (red). 3 Thus, a white (red) message is a message that was sent before (after) the sender of that message recorded its local snapshot. 4 Every white process takes its snapshot at its convenience, but no later than the instant it receives a red message. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 19 / 51
  • 20. Distributed Computing: Principles, Algorithms, and Systems Lai-Yang algorithm 4 Every white process records a history of all white messages sent or received by it along each channel. 5 When a process turns red, it sends these histories along with its snapshot to the initiator process that collects the global snapshot. 6 The initiator process evaluates transit(LSi , LSj ) to compute the state of a channel Cij as given below: SCij = white messages sent by pi on Cij − white messages received by pj on Cij = {send(mij )|send(mij ) ∈ LSi } − {rec(mij )|rec(mij ) ∈ LSj }. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 20 / 51
  • 21. Distributed Computing: Principles, Algorithms, and Systems Mattern’s algorithm Mattern’s algorithm is based on vector clocks and assumes a single initiator process and works as follows: 1 The initiator “ticks” its local clock and selects a future vector time s at which it would like a global snapshot to be recorded. It then broadcasts this time s and freezes all activity until it receives all acknowledgements of the receipt of this broadcast. 2 When a process receives the broadcast, it remembers the value s and returns an acknowledgement to the initiator. 3 After having received an acknowledgement from every process, the initiator increases its vector clock to s and broadcasts a dummy message to all processes. 4 The receipt of this dummy message forces each recipient to increase its clock to a value ≥ s if not already ≥ s. 5 Each process takes a local snapshot and sends it to the initiator when (just before) its clock increases from a value less than s to a value ≥ s. 6 The state of Cij is all messages sent along Cij , whose timestamp is smaller than s and which are received by pj after recording LSj . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 21 / 51
  • 22. Distributed Computing: Principles, Algorithms, and Systems Mattern’s algorithm A termination detection scheme for non-FIFO channels is required to detect that no white messages are in transit. One of the following schemes can be used for termination detection: First method: Each process i keeps a counter cntri that indicates the difference between the number of white messages it has sent and received before recording its snapshot. It reports this value to the initiator process along with its snapshot and forwards all white messages, it receives henceforth, to the initiator. Snapshot collection terminates when the initiator has received i cntri number of forwarded white messages. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 22 / 51
  • 23. Distributed Computing: Principles, Algorithms, and Systems Mattern’s algorithm Second method: Each red message sent by a process carries a piggybacked value of the number of white messages sent on that channel before the local state recording. Each process keeps a counter for the number of white messages received on each channel. A process can detect termination of recording the states of incoming channels when it receives as many white messages on each channel as the value piggybacked on red messages received on that channel. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 23 / 51
  • 24. Distributed Computing: Principles, Algorithms, and Systems Snapshots in a causal delivery system The causal message delivery property CO provides a built-in message synchronization to control and computation messages. Two global snapshot recording algorithms, namely, Acharya-Badrinath and Alagar-Venkatesan exist that assume that the underlying system supports causal message delivery. In both these algorithms recording of process state is identical and proceed as follows : An initiator process broadcasts a token, denoted as token, to every process including itself. Let the copy of the token received by process pi be denoted tokeni . A process pi records its local snapshot LSi when it receives tokeni and sends the recorded snapshot to the initiator. The algorithm terminates when the initiator receives the snapshot recorded by each process. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 24 / 51
  • 25. Distributed Computing: Principles, Algorithms, and Systems Snapshots in a causal delivery system Correctness For any two processes pi and pj , the following property is satisfied: send(mij ) ∈ LSi ⇒ rec(mij ) ∈ LSj . This is due to the causal ordering property of the underlying system as explained next. ◮ Let a message mij be such that rec(tokeni ) −→ send(mij ). ◮ Then send(tokenj ) −→ send(mij ) and the underlying causal ordering property ensures that rec(tokenj ), at which instant process pj records LSj , happens before rec(mij ). ◮ Thus, mij whose send is not recorded in LSi , is not recorded as received in LSj . Channel state recording is different in these two algorithms and is discussed next. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 25 / 51
  • 26. Distributed Computing: Principles, Algorithms, and Systems Channel state recording in Acharya-Badrinath algorithm Each process pi maintains arrays SENTi [1, ...N] and RECDi [1, ..., N]. SENTi [j] is the number of messages sent by process pi to process pj . RECDi [j] is the number of messages received by process pi from process pj . Channel states are recorded as follows: When a process pi records its local snapshot LSi on the receipt of tokeni , it includes arrays RECDi and SENTi in its local state before sending the snapshot to the initiator. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 26 / 51
  • 27. Distributed Computing: Principles, Algorithms, and Systems Channel state recording in Acharya-Badrinath algorithm When the algorithm terminates, the initiator determines the state of channels as follows: The state of each channel from the initiator to each process is empty. The state of channel from process pi to process pj is the set of messages whose sequence numbers are given by {RECDj [i] + 1, . . . , SENTi [j]}. Complexity: This algorithm requires 2n messages and 2 time units for recording and assembling the snapshot, where one time unit is required for the delivery of a message. If the contents of messages in channels state are required, the algorithm requires 2n messages and 2 time units additionally. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 27 / 51
  • 28. Distributed Computing: Principles, Algorithms, and Systems Channel state recording in Alagar-Venkatesan algorithm A message is referred to as old if the send of the message causally precedes the send of the token. Otherwise, the message is referred to as new. In Alagar-Venkatesan algorithm channel states are recorded as follows: 1 When a process receives the token, it takes its snapshot, initializes the state of all channels to empty, and returns Done message to the initiator. Now onwards, a process includes a message received on a channel in the channel state only if it is an old message. 2 After the initiator has received Done message from all processes, it broadcasts a Terminate message. 3 A process stops the snapshot algorithm after receiving a Terminate message. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 28 / 51
  • 29. Distributed Computing: Principles, Algorithms, and Systems Algorithms Features Chandy- Baseline algorithm. Requires FIFO channels. O(e) messages Lamport to record snapshot and O(d) time. Spezialetti- supports concurrent initiators, efficient assembly Kearns and distribution of a snapshot. Assumes bidirectional channels. O(e) messages to record, O(rn2 ) messages to assemble and distribute snapshot. Lai-Yang Works for non-FIFO channels. Markers piggybacked on computation messages. Message history required to compute channel states. Li et al. Small message history needed as channel states are computed incrementally. Mattern No message history required. Termination detection required to compute channel states. Acharya- Requires causal delivery support, Centralized computation of channel Badrinath states, Channel message contents need not be known. Requires 2n messages, 2 time units. Alagar-Venkatesan Requires causal delivery support, Distributed computation of channel states. Requires 3n messages, 3 time units, small messages. n = # processes, u = # edges on which messages were sent after previous snapshot, e = # channels, d is the diameter of the network, r = # concurrent initiators. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 29 / 51
  • 30. Distributed Computing: Principles, Algorithms, and Systems Necessary and sufficient conditions for consistent global snapshots Many applications require that local process states are periodically recorded and analyzed during execution or post martem. A saved intermediate state of a process during its execution is called a local checkpoint of the process. A consistent snapshot consists of a set of local states that occurred concurrently or had a potential to occur simultaneously. Processes take checkpoints asynchronously. The i th (i ≥ 0) checkpoint of process pp is assigned the sequence number i and is denoted by Cp,i . We assume that each process takes an initial checkpoint before execution begins and takes a virtual checkpoint after execution ends. The i th checkpoint interval of process pp consists of all the computation performed between its (i − 1)th and i th checkpoints (and includes the (i − 1)th checkpoint but not i th ). A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 30 / 51
  • 31. Distributed Computing: Principles, Algorithms, and Systems Even if two local checkpoints do not have a causal path between them, they may not belong to the same consistent global snapshot. An Example: Consider the execution shown in the Figure 4.2. Although neither of the checkpoints C1,1 and C3,2 happened before the other, they cannot be grouped together with a checkpoint on process p2 to form a consistent global snapshot. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 31 / 51
  • 32. Distributed Computing: Principles, Algorithms, and Systems C 1,0 C 1,1 C 1,2 p1 m1 m3 m5 C 2,0 C 2,1 C 2,2 C 2,3 p2 m m4 m6 C 3,0 2 C 3,1 C3,2 C 3,3 p3 Legend: A checkpoint Figure 4.2: An Illustration of zigzag paths. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 32 / 51
  • 33. Distributed Computing: Principles, Algorithms, and Systems Necessary and sufficient conditions for consistent global snapshots To describe the necessary and sufficient conditions for a consistent snapshot, Netzer and Xu defined a generalization of the Lamport’s happen before relation, called a zigzag path. A zigzag path between two checkpoints is a causal path, however, a zigzag path allows a message to be sent before the previous one in the path is received. In the Figure 4.2 although a causal path does not exist from C1,1 to C3,2 , a zigzag path does exist from C1,1 to C3,2 . This zigzag path means that no consistent snapshot exists in this execution that contains both C1,1 and C3,2 . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 33 / 51
  • 34. Distributed Computing: Principles, Algorithms, and Systems Zigzag paths and consistent global snapshots Definition A zigzag path exists from a checkpoint Cx,i to a checkpoint Cy,j iff there exists messages m1 , m2 , ...mn (n≥1) such that 1 m1 is sent by process px after Cx,i . 2 If mk (1≤k≤n) is received by process pz , then mk+1 is sent by pz in the same or a later checkpoint interval (although mk+1 may be sent before or after mk is received), and 3 mn is received by process py before Cy,j . Example: In the Figure 4.2 a zigzag path exists from C1,1 to C3,2 due to messages m3 and m4 . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 34 / 51
  • 35. Distributed Computing: Principles, Algorithms, and Systems Zigzag cycle Definition A checkpoint C is involved in a zigzag cycle iff there is a zigzag path from C to itself. In Figure 4.3, C2,1 is on a zigzag cycle formed by messages m1 and m2 . C 1,0 C 1,1 C1,2 p1 m3 m1 C 2,0 C 2,1 m2 C 2,2 C 2,3 p 2 m 4 C 3,0 C 3,1 C3,2 p 3 Figure 4.3: A zigzag cycle, inconsistent snapshot, and consistent snapshot. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 35 / 51
  • 36. Distributed Computing: Principles, Algorithms, and Systems Difference between a zigzag path and a causal path A causal path exists from a checkpoint A to another checkpoint B iff there is chain of messages starting after A and ending before B such that each message is sent after the previous one in the chain is received. A zigzag path consists of such a message chain, however, a message in the chain can be sent before the previous one in the chain is received, as long as the send and receive are in the same checkpoint interval. Thus a causal path is always a zigzag path, but a zigzag path need not be a causal path. Another difference between a zigzag path and a causal path is that a zigzag path can form a cycle but a causal path never forms a cycle. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 36 / 51
  • 37. Distributed Computing: Principles, Algorithms, and Systems Consistent global snapshots Netzer and Xu proved that if no zigzag path (or cycle) exists between any two checkpoints from a set S of checkpoints, then a consistent snapshot can be formed that includes the set S of checkpoints and vice versa. The absence of a causal path between checkpoints in a snapshot corresponds to the necessary condition for a consistent snapshot, and the absence of a zigzag path between checkpoints in a snapshot corresponds to the necessary and sufficient conditions for a consistent snapshot. A set of checkpoints S can be extended to a consistent snapshot if and only if no checkpoint in S has a zigzag path to any other checkpoint in S. A checkpoint can be a part of a consistent snapshot if and only if it is not invloved in a Z-cycle. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 37 / 51
  • 38. Distributed Computing: Principles, Algorithms, and Systems Finding consistent global snapshots in a distributed computation Discuss how individual local checkpoints can be combined with those from other processes to form global snapshots that are consistent. Manivannan-Netzer-Singhal analyzed the set of all consistent snapshots that can be built from a set of checkpoints S. Definition Let A, B be individual checkpoints and R, S be sets of checkpoints. Let be a relation defined over checkpoints and sets of checkpoints such that 1 A B iff a Z-path exists from A to B, 2 A S iff a Z-path exists from A to some member of S, 3 S A iff a Z-path exists from some member of S to A, and 4 R S iff a Z-path exists from some member of R to some member of S. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 38 / 51
  • 39. Distributed Computing: Principles, Algorithms, and Systems S S defines that no Z-path (including Z-cycle) exists from any member of S to any other member of S and implies that checkpoints in S are all from different processes. In this notations, the results of Netzer and Xu can be expressed as follows: Theorem 4.1: A set of checkpoints S can be extended to a consistent global snapshot if and only if S S. Corollary 4.1 A checkpoint C can be part of a consistent global snapshot if and only if it is not involved in a Z-cycle. Corollary 4.2 A set of checkpoints S is a consistent global snapshot if and only if S S and |S| = N, where N is the number of processes. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 39 / 51
  • 40. Distributed Computing: Principles, Algorithms, and Systems Finding consistent global snapshots Given a set S of checkpoints such that S S, we first discuss what checkpoints from other processes can be combined with S to build a consistent global snapshot. First Observation: None of the checkpoints that have a Z-path to or from any of the checkpoints in S can be used. Only those checkpoints that have no Z-paths to or from any of the checkpoints in S are candidates for inclusion in the consistent snapshot. We call the set of all such candidates the Z-cone of S and all checkpoints that have no causal path to or from any checkpoint in S the C-cone of S. A causal path is always Z-path, the Z-cone of S is a subset of the C-cone of S for an arbitrary S as shown in the Figure 4.4 A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 40 / 51
  • 41. Distributed Computing: Principles, Algorithms, and Systems S Edge of C − cone Edges of Z − cone Edge of C − cone Z − paths to S Z − unordered with S Z − paths from S ( Z − cone) Casual paths to S Casually unordered with S Casual paths from S (C − cone) Figure 4.4: The Z-cone and the C-cone associated with a set of checkpoints S. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 41 / 51
  • 42. Distributed Computing: Principles, Algorithms, and Systems Second Observation: Although candidates for building a consistent snapshot from S must lie in the Z-cone of S, not all checkpoints in the Z-cone can form a consistent snapshot with S. If a checkpoint in the Z-cone is involved in a Z-cycle, then it cannot be part of a consistent snapshot. Definition Let S be a set of checkpoints such that S S. Then, for each process pq , the q set Suseful is defined as q Suseful = {Cq,i | (S Cq,i ) ∧ (Cq,i S) ∧ (Cq,i Cq,i )}. In addition, we define q Suseful = Suseful . q A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 42 / 51
  • 43. Distributed Computing: Principles, Algorithms, and Systems Lemma 4.1 “Let S be a set of checkpoints such that S S. Let Cq,i be any checkpoint of process pq such that Cq,i ∈ S. Then S ∪ {Cq,i } can be extended to a consistent snapshot if and only if Cq,i ∈ Suseful .” Lemma 4.1 states that if we are given a set S such that S S, we are guaranteed that any single checkpoint from Suseful can belong to a consistent global snapshot that also contains S. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 43 / 51
  • 44. Distributed Computing: Principles, Algorithms, and Systems Third observation Although none of the checkpoints in Suseful has a Z-path to or from any checkpoint in S, Z-paths may exist between members of Suseful . One final constraint is placed on the set T we choose from Suseful to build a consistent snapshot from S: checkpoints in T must have no Z-paths between them. Furthermore, since S S, from Theorem 4.1 at least one such T must exist. Theorem 4.2: Let S be a set of checkpoints such that S S and let T be any set of checkpoints such that S ∩ T = ∅. Then, S ∪ T is a consistent global snapshot if and only if 1 T ⊆ Suseful , 2 T T , and 3 |S ∪ T | = N. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 44 / 51
  • 45. Distributed Computing: Principles, Algorithms, and Systems Manivannan-Netzer-Singhal algorithm for enumerating consistent snapshots An algorithm due to Manivannan-Netzer-Singhal that explicitly computes all consistent snapshots that include a given set a set of checkpoints S. 1: ComputeAllCgs(S) { 2: let G = ∅ 3: if S S then 4: let AllProcs be the set of all processes not represented in S 5: ComputeAllCgsFrom(S, AllProcs) 6: return G 7: } 8: ComputeAllCgsFrom(T , ProcSet) { 9: if (ProcSet = ∅) then 10: G = G ∪ {T } 11: else 12: let pq be any process in ProcSet q 13: for each checkpoint C ∈ Tuseful do 14: ComputeAllCgsFrom(T ∪ {C }, ProcSet {pq }) 15: } A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 45 / 51
  • 46. Distributed Computing: Principles, Algorithms, and Systems Manivannan-Netzer-Singhal algorithm The algorithm restricts its selection of checkpoints to those within the Z-cone of S and it checks for the presence of Z-cycles within the Z-cone. Correctness: The following theorem argues the correctness of the algorithm. Theorem 4.3: Let S be a set of checkpoints and G be the set returned by ComputeAllCgs(S). If S S, then T ∈ G if and only if T is a consistent snapshot containing S. That is, G contains exactly the consistent snapshots that contain S. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 46 / 51
  • 47. Distributed Computing: Principles, Algorithms, and Systems Finding Z-paths in a distributed computation Discuss a method for determining the existence of Z-paths between checkpoints in a distributed computation that has terminated or has stopped execution, using the rollback-dependency graph (R-graph). Definition The rollback-dependency graph of a distributed computation is a directed graph G = (V , E ), where the vertices V are the checkpoints of the distributed computation and an edge (Cp,i , Cq,j ) from checkpoint Cp,i to checkpoint Cq,j belongs to E if 1 p = q and j = i + 1, or 2 p = q and a message m sent from the i th checkpoint interval of pp is received by pq in its j th checkpoint interval (i, j > 0). A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 47 / 51
  • 48. Distributed Computing: Principles, Algorithms, and Systems Example of an R-graph Figure 4.6 shows shows the R-graph of the computation shown in Figure 4.5. In Figure 4.6, C1,3 , C2,3 , and C3,3 represent the volatile checkpoints, the checkpoints representing the last state the process attained before terminating. We denote the fact that there is a path from C to D in the R-graph by rd C D. It only denotes the existence of a path; it does not specify any particular path. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 48 / 51
  • 49. Distributed Computing: Principles, Algorithms, and Systems C 1,0 C 1,1 C 1,2 p1 m1 m3 C 2,0 C 2,1 m4 C 2,2 p2 m m5 m6 C 3,0 2 C 3,1 C3,2 p3 Figure 4.5: A distributed computation. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 49 / 51
  • 50. Distributed Computing: Principles, Algorithms, and Systems C C 1,1 C C 1,0 1,2 1,3 C 2,0 C C 2,3 2,1 Volatile C 2,2 checkpoints C 3,0 C 3,1 C 3,2 C 3,3 Figure 4.6: The R-graph of the computation in Figure 4.5. A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 50 / 51
  • 51. Distributed Computing: Principles, Algorithms, and Systems Example of an R-graph The following theorem establishes the correspondence between the paths in the R-graph and the Z-paths between checkpoints. Theorem: Let G = (V , E ) be the R-graph of a distributed computation. Then, for any two checkpoints Cp,i and Cq,j , Cp,i Cq,j if and only if 1 p = q and i < j, or rd 2 Cp,i +1 Cq,j in G (note that in this case p could still be equal to q). Examples: In Figure 4.5, a zigzag path exists from C1,1 to C3,1 because in the rd corresponding R-graph, shown in Figure 4.6, C1,2 C3,1 . Likewise, C2,1 is on a Z-cycle because in the corresponding R-graph, shown in rd Figure 4.6, C2,2 C2,1 . A. Kshemkalyani and M. Singhal (Distributed Computing) lobal State and Snapshot Recording Algorithms G CUP 2008 51 / 51