SlideShare a Scribd company logo
1 of 11
Download to read offline
International Journal of Computer Engineering (IJCET), ISSN 0976 – 6367(Print),
 International Journal of Computer Engineering and Technology
 ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME
and Technology (IJCET), ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online) Volume 1
                                                                     IJCET
Number 1, May - June (2010), pp. 46-56                                 ©IAEME
© IAEME, http://www.iaeme.com/ijcet.html


  A COMPARATIVE ANALYSIS OF MINIMUM-PROCESS
  COORDINATED CHECKPOINTING ALGORITHMS FOR
                   MOBILE DISTRIBUTED SYSTEMS
                                         Preeti gupta
                                     Singhania University
                                    Pacheri Bari, Rajasthan

                                     Parveen kumar
                      Meerut Institute of Engineering & Technology
                         Meerut, Email:pk223475@yahoo.com

                                   Anil kumar solanki
                      Meerut Institute of Engineering & Technology
                                       Meerut, India
 ABSTRACT
        Fault Tolerance Techniques enable systems to perform tasks in the presence of
 faults. A checkpoint is a local state of a process saved on stable storage. While dealing
 with Mobile Distributed systems, we come across some issues like: mobility, low
 bandwidth of wireless channels and lack of stable storage on mobile nodes,
 disconnections, limited battery power and high failure rate of mobile nodes.       These
 issues make traditional check pointing techniques designed for Distributed systems
 unsuitable for Mobile environments. To take a checkpoint, an MH has to transfer a large
 amount of checkpoint data to its local MSS over the wireless network. Since the wireless
 network has low bandwidth and MHs have low computation power, all-process check
 pointing will waste the scarce resources of the mobile system on every checkpoint.
 Minimum-process coordinated check pointing is a preferred approach for mobile
 distributed systems. In this paper, we discuss various existing minimum-process check
 pointing protocols for mobile distributed systems.
 Keywords: Check pointing algorithms; parallel & distributed computing; rollback
 recovery; fault-tolerant systems



                                              46
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


1. INTRODUCTION
          Parallel computing with clusters of workstations is being used extensively as they
are cost-effective and scalable, and are able to meet the demands of high performance
computing. Increase in the number of components in such systems increases the failure
probability. It is, thus, necessary to examine both hardware and software solutions to
ensure fault tolerance of such parallel computers. To provide fault tolerance, it is
essential to understand the nature of the faults that occur in these systems. There are
mainly two kinds of faults: permanent and transient. Permanent faults are caused by
permanent damage to one or more components and transient faults are caused by changes
in environmental conditions. Permanent faults can be rectified by repair or replacement
of components. Transient faults remain for a short duration of time and are difficult to
detect and deal with. Hence it is necessary to provide fault tolerance particularly for
transient failures in parallel computers. Fault-tolerant techniques enable a system to
perform tasks in the presence of faults. It is easier and more cost effective to provide
software fault tolerance solutions than hardware solutions to cope with transient failures
[1, 2].
          Local checkpoint is the saved state of a process at a processor at a given instance.
Global checkpoint is a collection of local checkpoints, one from each process. A global
state is said to be “consistent” if it contains no orphan message; i.e., a message whose
receive event is recorded, but its send event is lost. A transit message is a message whose
send event has been recorded by the sending process but whose receive event has not
been recorded by the receiving process [1, 7].
          The problem of taking a checkpoint in a message passing distributed system is
quite complex because any arbitrary set of checkpoints cannot be used for              recovery [9].
This is due to the fact that the set of checkpoints used for recovery must form a consistent
global state.
          Checkpointing is classified into following categories:
    •     Asynchronous/Uncoordinated Checkpointing
    •     Synchronous/Coordinated Checkpointing
    •     Quasi-Synchronous or Communication-induced Checkpointing



                                                 47
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


    •   Message Logging based Checkpointing
        The problem of taking a checkpoint in a message passing distributed system is
quite complex because any arbitrary set of checkpoints cannot be used for                  recovery.
This is due to the fact that the set of checkpoints used for recovery must form a consistent
global state.
1.1 Asynchronous/ Checkpointing
        Under the asynchronous approach, checkpoints at each process are taken
independently without any synchronization among the processes. Because of absence of
synchronization, there is no guarantee that a set of local checkpoints taken will be a
consistent set of checkpoints. Thus, a recovery algorithm has to search for the most recent
consistent set of checkpoints before it can initiate recovery [8], [9].
1.2 Synchronous / Coordinated Checkpointing
        In coordinated or synchronous checkpointing, processes coordinate their local
checkpointing actions such that the set of all recent checkpoints in the system is
guaranteed to be consistent [add reference list……]. In case of a fault, every process
restarts from its most recent permanent/committed checkpoint. Hence, this approach
simplifies recovery and it does not suffer from domino-effect. Furthermore, coordinated
checkpointing requires each process to maintain only one permanent checkpoint on stable
storage, reducing storage overhead and eliminating the need for garbage collection. Its
main disadvantage is the large latency involved in output commits [15].
        A    straightforward     approach      to    coordinate   checkpointing      is   to     block
communications while the checkpointing process executes. A coordinator takes a
checkpoint and broadcasts a request message to all processes, asking them to take a
checkpoint. When a process receives a message, it stops its execution, flushes all the
communication channels, takes a tentative checkpoint, and sends an acknowledgement
message back to the coordinator. After the coordinator receives acknowledgement from
all processes, it broadcasts a commit message that completes the two phase checkpointing
protocol. After receiving the commit message, each process receives the old permanent
checkpoint and makes the tentative checkpoint permanent. The process is then free to
resume execution and exchange messages with other processes.                      The coordinated



                                                    48
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


checkpointing algorithms can also be classified into following two categories: minimum-
process and all process algorithms. In all-process coordinated checkpointing algorithms,
every process is required to take its checkpoint in an initiation [7], [8]. In minimum-
process algorithms, minimum interacting processes are required to take their checkpoints
in an initiation.
        The coordinated checkpointing protocols can be classified into two types:
blocking and non-blocking. In blocking algorithms, as mentioned above, some blocking
of processes takes place during checkpointing [4].              In non-blocking algorithms, no
blocking of processes is required for checkpointing [5].
        In a centralized algorithm like Chandy-lamport [7], there is one node which
always initiates the checkpoints and coordinates the participating nodes. The
disadvantage of a centralized algorithm is that all nodes have to initiate checkpoints
whenever the centralized node decides to checkpoint. Nodes can be given autonomy in
initiating checkpoints by allowing any node in the system to initiate checkpoints.
1. 3 Mobile Distributed Computing System
        A mobile computing system consists of a large number of MHs and relatively
fewer MSSs. The distributed computation we consider consists of n spatially separated
sequential processes denoted by P0, P1, ..., Pn-1, running on fail-stop MHs or on MSSs.
Each MH or MSS has one process running on it. The static network provides reliable,
sequenced delivery of messages between any two MSSs, with arbitrary message latency.
The wireless network within a cell ensures FIFO delivery of messages between an MSS
and a local MH, i.e., there exists a FIFO channel from an MH to its local MSS, and
another FIFO channel from the MSS to the MH. If an MH did not leave the cell, then
every message sent to it from the local MSS would be received in the sequence in which
they are sent [2].
        To send a message from an MH h1 to another MH h2, h1 first sends the message to
its local MSS over the wireless network. This MSS then forwards the message to the
local MSS of h2 which forwards it to h2 over its local wireless network. Since the location
of an MH within the network is neither fixed nor universally known to the network,
because, an MH switches cells frequently. The local MSS of h1 needs to first determine



                                                 49
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


the MSS that currently serves h2. This is essentially the problem that has been tackled
through a variety of routing protocols at the network layer. Hence, the cost incurred for
routing and delivering a message to an MH, varies with the specific routing protocol
being used. Here, we have assumed that any message destined for an MH incurs a fixed
search cost [2].
1.4 Checkpointing Issues in Mobile Distributed Computing Systems
        The existence of mobile nodes in a distributed system introduces new issues that
need proper handling while designing a checkpointing algorithm for such systems. These
issues are mobility, disconnections, finite power source, vulnerable to physical damage,
lack of stable storage etc. [2]. The location of an MH within the network, as represented
by its current local MSS, changes with time. Checkpointing schemes that send control
messages to MH’s, will need to first locate the MH within the network, and thereby incur
a search overhead [2]. Due to vulnerability of mobile computers to catastrophic failures,
disk storage of an MH is not acceptably stable for storing message logs or local
checkpoints. Checkpointing schemes must therefore, rely on an alternative stable
repository for an MH’s local checkpoint [2]. Disconnections of one or more MHs should
not prevent recording the global state of an application executing on MHs. It should be
noted that disconnection of an MH is a voluntary operation, and frequent disconnections
of MHs is an expected feature of the mobile computing environments [2]. The battery at
the MH has limited life. To save energy, the MH can power down individual components
during periods of low activity [2]. This strategy is referred to as the doze mode operation.
An MH in doze mode is awakened on receiving a message. Therefore, energy
conservation and low bandwidth constraints require the checkpointing algorithms to
minimize the number of synchronization messages and the number of checkpoints.
2 SOME MINIMUM-PROCESS COORDINATED CHECKPOINTING
PROTOCOLS FOR MOBILE DISTRIBUTED SYSTEMS
2.1 GUOHONG CAO AND MUKESH SINGHAL ALGORITHM [5]
        Cao and Singhal achieved non-intrusiveness in the minimum-process algorithm
by introducing the concept of mutable checkpoints. In their algorithm, initiator, say Pin,
sends the checkpoint request to any process, say Pj, only if Pin receives m from Pj in the



                                                 50
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


current CI. Pj takes its tentative checkpoint if Pj has sent m to Pin in the current CI;
otherwise, Pj concludes that the checkpoint request is a useless one. Similarly, when Pj
takes its tentative checkpoint, it propagates the checkpoint request to other processes.
This process is continued till the checkpoint request reaches all the processes on which
the initiator transitively depends and a checkpointing tree is formed. During
checkpointing, if Pi receives m from Pj such that Pj has taken some checkpoint in the
current initiation before sending m, Pi may be forced to take a checkpoint, called mutable
checkpoint. If Pi is not in the minimum set, its mutable checkpoint is useless and is
discarded on commit. The huge data structure MR[] is also attached with the checkpoint
requests to reduce the number of useless checkpoint requests. The response from each
process is sent directly to initiator.
2.2 KUMAR AND KUMAR ALGORITHM [14]
        They proposed an algorithm which is based on keeping track of direct
dependencies of processes. Initiator MSS collects the direct dependency vectors of all
processes, computes the tentative minimum set (minimum set or its subset), and sends the
checkpoint request along with the tentative minimum set to all MSSs. This step is taken
to reduce the time to collect the coordinated checkpoint. It will also reduce the number of
useless checkpoints and the blocking of the processes. Suppose, during the execution of
the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj. Pj receives m
such that it has not taken its checkpoint for the current initiation and it does not know
whether it will get the checkpoint request. If Pj takes its checkpoint after processing m, m
will become orphan. In order to avoid such orphan messages, they propose the following
technique. If Pj has sent at least one message to a process, say Pk and Pk is in the
tentative minimum set, there is a good probability that Pj will get the checkpoint request.
Therefore, Pj takes its induced checkpoint before processing m. An induced checkpoint is
similar to the mutable checkpoint [18]. In this case, most probably, Pj will get the
checkpoint request and its induced checkpoint will be converted into permanent one.
There is a less probability that Pj will not get the checkpoint request and its induced
checkpoint will be discarded. Alternatively, if there is not a good probability that Pj will
get the checkpoint request, Pj buffers m till it takes its checkpoint or receives the commit
message. They have tried to minimise the number of useless checkpoints and blocking of


                                                 51
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


the process by using the probabilistic approach and buffering selective messages at the
receiver end. Exact dependencies among processes are maintained. It abolishes the
useless checkpoint requests and reduces the number of duplicate checkpoint requests.
2.3 SILVA AND SILVA ALGORITHM [15]
        The authors proposed all process coordinated checkpointing protocol for
distributed systems. The non-intrusiveness during checkpointing is achieved by
piggybacking monotonically increasing checkpoint number along with computational
message. When a process receives a computational message with the high checkpoint
number, it takes its checkpoint before processing the message. When it actually gets the
checkpoint request from the initiator, it ignores the same. If each process of the
distributed program is allowed to initiate the checkpoint operation, the network may be
flooded with control messages and process might waste their time making unnecessary
checkpoints. In order to avoid this, Silva and Silva give the key to initiate checkpoint
algorithm to one process. The checkpoint event is triggered periodically by a local timer
mechanism. When this timer expires, the initiator process the checkpoint state of process
running in the machine and force all the others to take checkpoint by sending a broadcast
message. The interval between adjacent checkpoints is called checkpoint interval.
2.4 KOO-TOUEG’S ALGORITHM [10]
        They proposed a minimum process blocking checkpointing algorithm for
distributed systems. The algorithm consists of two phases. During the first phase, the
checkpoint initiator identifies all processes with which it has communicated since the last
checkpoint and sends them a request. Upon receiving the request, each process in turn
identifies all process it has communicated with since the last checkpoint and sends them a
request, and so on, until no more processes can be identified. During the second phase, all
process identified in the first phase take a checkpoint. The result is a consistent
checkpoint that involves only the participating processes. In this protocol, after a process
takes a checkpoint, it cannot send any message until the second phase terminates
successfully, although receiving messages after the checkpoint is permissible.
2.5 COA AND SINGHAL BLOCKING ALGORITHM [4]
        They presented a minimum process checkpointing algorithm in which, the
dependency information is recorded by a Boolean vector. This algorithm is a two-phase


                                                 52
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


protocol and saves two kinds of checkpoint on the stable storage. In the first phase, the
initiator sends a request to all processes to send their dependency vector. On receiving the
request, each process sends its dependency vector. Having received all the dependency
vectors, the initiator constructs an N*N dependency matrix with one row per process,
represented by the dependency vector of the process, Based on the dependency matrix,
the initiator can locally calculate all the process on which the initiator transitively depend.
After the initiator finds all the process that need to take their checkpoints, it adds them to
the set Sforced and asks them to take checkpoints. Any process receiving a checkpoint
request takes the checkpoint and sends a reply. The process has to be blocked after
receiving the dependency vectors request and resumes its computation after receiving a
checkpoint request.
2.6 PRAKASH-SINGHAL ALGORITHM [16]
        It forces only a minimum number of processes to take checkpoints and does not
block the underlying computation during checkpointing. Prakash Singhal algorithm
forces only part of processes to take checkpoints, the csn of some processes may be out-
of-date, and may not be able to avoid inconsistencies [8]. Prakash-Singhal algorithm
attempts to solve this problem by having each process maintains an array to save the csn,
where csni[i] has been the expected csn of Pi. Note that Pi’s csnj[i] may be different from
Pj's csnj [i] if there is no communication between Pi and Pj for several checkpoint
intervals. By using csn and the initiator identification number, they claim that their non-
blocking algorithm can avoid inconsistencies and minimize the number of checkpoints
during checkpointing.
2.7. P. KUMAR’S HYBRID ALGORITHM [13]
        In minimum-process coordinated checkpointing, some processes may not
checkpoint for several checkpoint initiations. In the case of a recovery after a fault, such
processes may rollback to far earlier checkpointed state and thus may cause greater loss
of computation. In all-process coordinated checkpointing, the recovery line is advanced
for all processes but the checkpointing overhead may be exceedingly high. To optimize
both matrices, the checkpointing overhead and the loss of computation on recovery,
P.Kumar proposed a hybrid checkpointing algorithm, wherein an all-process coordinated
checkpoint is taken after the execution of minimum-process coordinated checkpointing


                                                 53
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


algorithm for a fixed number of times. Thus, the Mobile nodes with low activity or in
doze mode operation may not be disturbed in the case of minimum-process checkpointing
and the recovery line is advanced for each process after an all-process checkpoint.
Additionally, he tried to minimize the information piggybacked onto each computation
message. For minimum-process checkpointing, he designed a blocking algorithm, where
no useless checkpoints are taken and an effort has been made to optimize the blocking of
processes. He proposed to delay selective messages at the receiver end. By doing so,
processes are allowed to perform their normal computation, send messages and partially
receive them during their blocking period. The proposed minimum-process blocking
algorithm forces zero useless checkpoints at the cost of very small blocking.
3. CONCLUSION
        The existence of mobile nodes in a distributed system introduces new issues that
need proper handling while designing a checkpointing algorithm for such systems. These
issues are mobility, disconnections, finite power source, vulnerable to physical damage,
lack of stable storage, etc. The new issues make traditional checkpointing techniques
unsuitable to checkpoint mobile distributed systems. A good checkpointing protocol for
mobile distributed systems should have low memory overheads on MHs, low overheads
on wireless channels and should avoid awakening of an MH in doze mode operation. The
disconnection of an MH should not lead to infinite wait state. The algorithm should be
non-intrusive and should force minimum number of processes to take their local
checkpoints.
        Minimum-process coordinated checkpointing is a suitable approach to introduce
fault tolerance in mobile distributed systems transparently. This approach is domino-free,
requires at most two checkpoints of a process on stable storage, and forces only a
minimum number of processes to checkpoint. It may require blocking of processes, extra
synchronization messages, piggybacking of some information along with computation
messages, or taking some useless checkpoints.




                                                 54
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


        In this paper we have given introductory concepts related to checkpointing in
mobile distributed systems. We also gave an analysis of various minimum-process
checkpointing schemes especially designed for mobile distributed systems.
REFERENCES
[1] Singhal, M, N G Shivratri, “Advanced concepts in Operating Systems, Tata Mc Graw
    Hill, 1994.
[2] Acharya A., “Structuring Distributed Algorithms and Services for networks with
    Mobile Hosts”, Ph.D. Thesis, Rutgers University, 1995.
[3] Cao G. and Singhal M., “On coordinated checkpointing in Distributed Systems”,
    IEEE Transactions on Parallel and Distributed Systems, vol. 9, no.12, pp. 1213-1225,
    Dec 1998.
[4] Cao G. and Singhal M., “On the Impossibility of Min-process Non-blocking
    Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing
    Systems,” Proceedings of International Conference on Parallel Processing, pp. 37-
    44, August 1998.
[5] Cao G. and Singhal M., “Mutable Checkpoints: A New Checkpointing Approach for
    Mobile Computing systems,” IEEE Transaction On Parallel and Distributed Systems,
    vol. 12, no. 2, pp. 157-172, February 2001.
[6] Cao G. and Singhal M., “Checkpointing with Mutable Checkpoints”, Theoretical
    Computer Science, 290(2003), pp. 1127-1148.
[7] Chandy K. M. and Lamport L., “Distributed Snapshots: Determining Global State of
    Distributed Systems,” ACM Transaction on Computing Systems, vol. 3, No. 1, pp. 63-
    75, February 1985.
[8] Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., “A Survey of Rollback-
    Recovery Protocols in Message-Passing Systems,” ACM Computing Surveys, vol. 34,
    no. 3, pp. 375-408, 2002.
[9] Kalaiselvi, S., Rajaraman, V., “A Survey of Checkpointing Algorithms for Parallel
    and Distributed Systems”, Sadhna, Vol. 25, Part 5, October 2000, pp. 489-510.




                                                 55
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME


[10] Koo R. and Toueg S., “Checkpointing and Roll-Back Recovery for Distributed
    Systems,” IEEE Trans. on Software Engineering, vol. 13, no. 1, pp. 23-31, January
    1987.
[11] Parveen Kumar, Lalit Kumar, R K Chauhan, V K Gupta “A Non-Intrusive Minimum
    Process Synchronous Checkpointing Protocol for Mobile Distributed Systems”
    Proceedings of IEEE ICPWC-2005, January 2005.
[12] Pushpendra Singh, Gilbert Cabillic, “A Checkpointing Algorithm for Mobile
    Computing Environment”, LNCS, No. 2775, pp 65-74, 2003.
[13] Parveen Kumar, “A Low-Cost              Hybrid Coordinated Checkpointing Protocol for
    Mobile Distributed Systems”, Mobile Information Systems [An International Journal
    from IOS Press, Netherlands] pp 13-32, Vol. 4, No. 1, 2007.
[14] Lalit Kumar, Parveen Kumar “A Synchronous Checkpointing Protocol for Mobile
    Distributed Systems:          A Probabilistic Approach”, International Journal of
    Information and Computer Security [An International Journal from Inderscience
    Publishers, USA], pp 298-314, Vol. 3 No. 1, 2007.
[15] Silva L, Silva J 1992 Global checkpointing for distributed programs. Proc. IEEE
    11th Symp. On Reliable Distributed Syst. pp 155-162.
[16] R. Prakash and M. Singhal. “Low-Cost Checkpointing and Failure Recovery in
    Mobile      Computing Systems”. IEEE Trans. on Parallel and Distributed System,
    pages 1035-1048,Oct. 1996.




                                                 56

More Related Content

What's hot

PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...
PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...
PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...cscpconf
 
resource management
  resource management  resource management
resource managementAshish Kumar
 
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...ijics
 
Crypto Mark Scheme for Fast Pollution Detection and Resistance over Networking
Crypto Mark Scheme for Fast Pollution Detection and Resistance over NetworkingCrypto Mark Scheme for Fast Pollution Detection and Resistance over Networking
Crypto Mark Scheme for Fast Pollution Detection and Resistance over NetworkingIRJET Journal
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET Journal
 
Design of Predicate Filter for Predicated Branch Instructions
Design of Predicate Filter for Predicated Branch InstructionsDesign of Predicate Filter for Predicated Branch Instructions
Design of Predicate Filter for Predicated Branch InstructionsIOSR Journals
 
DETECTING NETWORK ANOMALIES USING CUSUM and FCM
DETECTING NETWORK ANOMALIES USING CUSUM and FCMDETECTING NETWORK ANOMALIES USING CUSUM and FCM
DETECTING NETWORK ANOMALIES USING CUSUM and FCMEditor IJMTER
 
IRJET- Wavelet Decomposition along with ANN used for Fault Detection
IRJET- Wavelet Decomposition along with ANN used for Fault DetectionIRJET- Wavelet Decomposition along with ANN used for Fault Detection
IRJET- Wavelet Decomposition along with ANN used for Fault DetectionIRJET Journal
 
Developing fault tolerance integrity protocol for distributed real time systems
Developing fault tolerance integrity protocol for distributed real time systemsDeveloping fault tolerance integrity protocol for distributed real time systems
Developing fault tolerance integrity protocol for distributed real time systemsDr Amira Bibo
 
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...IJECEIAES
 

What's hot (13)

PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...
PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...
PREDICTIVE DETECTION OF KNOWN SECURITY CRITICALITIES IN CYBER PHYSICAL SYSTEM...
 
resource management
  resource management  resource management
resource management
 
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
 
Crypto Mark Scheme for Fast Pollution Detection and Resistance over Networking
Crypto Mark Scheme for Fast Pollution Detection and Resistance over NetworkingCrypto Mark Scheme for Fast Pollution Detection and Resistance over Networking
Crypto Mark Scheme for Fast Pollution Detection and Resistance over Networking
 
50120140506008
5012014050600850120140506008
50120140506008
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
 
Design of Predicate Filter for Predicated Branch Instructions
Design of Predicate Filter for Predicated Branch InstructionsDesign of Predicate Filter for Predicated Branch Instructions
Design of Predicate Filter for Predicated Branch Instructions
 
DETECTING NETWORK ANOMALIES USING CUSUM and FCM
DETECTING NETWORK ANOMALIES USING CUSUM and FCMDETECTING NETWORK ANOMALIES USING CUSUM and FCM
DETECTING NETWORK ANOMALIES USING CUSUM and FCM
 
IRJET- Wavelet Decomposition along with ANN used for Fault Detection
IRJET- Wavelet Decomposition along with ANN used for Fault DetectionIRJET- Wavelet Decomposition along with ANN used for Fault Detection
IRJET- Wavelet Decomposition along with ANN used for Fault Detection
 
1762 1765
1762 17651762 1765
1762 1765
 
419 426
419 426419 426
419 426
 
Developing fault tolerance integrity protocol for distributed real time systems
Developing fault tolerance integrity protocol for distributed real time systemsDeveloping fault tolerance integrity protocol for distributed real time systems
Developing fault tolerance integrity protocol for distributed real time systems
 
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...
Black Box Model based Self Healing Solution for Stuck at Faults in Digital Ci...
 

Viewers also liked

A role of innovative idea management in hrm
A role of innovative idea management in hrmA role of innovative idea management in hrm
A role of innovative idea management in hrmiaemedu
 
Internationalization from sme perspective with reference
Internationalization from sme perspective with referenceInternationalization from sme perspective with reference
Internationalization from sme perspective with referenceiaemedu
 
Healthcare management status of indian states
Healthcare management status of indian statesHealthcare management status of indian states
Healthcare management status of indian statesiaemedu
 
Defect effort prediction models in software maintenance projects
Defect  effort prediction models in software maintenance projectsDefect  effort prediction models in software maintenance projects
Defect effort prediction models in software maintenance projectsiaemedu
 
Voltage recovery of induction generator using indirect torque control method
Voltage recovery of induction generator using indirect torque control methodVoltage recovery of induction generator using indirect torque control method
Voltage recovery of induction generator using indirect torque control methodiaemedu
 
Knowledge management strategies in higher
Knowledge management strategies in higherKnowledge management strategies in higher
Knowledge management strategies in higheriaemedu
 
Strategies for service characteristics of star hotel in chennai
Strategies for service characteristics of star hotel in chennaiStrategies for service characteristics of star hotel in chennai
Strategies for service characteristics of star hotel in chennaiiaemedu
 
Consumer behavior and preference towards hathway broadband internet an empir...
Consumer behavior and preference towards hathway broadband internet  an empir...Consumer behavior and preference towards hathway broadband internet  an empir...
Consumer behavior and preference towards hathway broadband internet an empir...iaemedu
 

Viewers also liked (8)

A role of innovative idea management in hrm
A role of innovative idea management in hrmA role of innovative idea management in hrm
A role of innovative idea management in hrm
 
Internationalization from sme perspective with reference
Internationalization from sme perspective with referenceInternationalization from sme perspective with reference
Internationalization from sme perspective with reference
 
Healthcare management status of indian states
Healthcare management status of indian statesHealthcare management status of indian states
Healthcare management status of indian states
 
Defect effort prediction models in software maintenance projects
Defect  effort prediction models in software maintenance projectsDefect  effort prediction models in software maintenance projects
Defect effort prediction models in software maintenance projects
 
Voltage recovery of induction generator using indirect torque control method
Voltage recovery of induction generator using indirect torque control methodVoltage recovery of induction generator using indirect torque control method
Voltage recovery of induction generator using indirect torque control method
 
Knowledge management strategies in higher
Knowledge management strategies in higherKnowledge management strategies in higher
Knowledge management strategies in higher
 
Strategies for service characteristics of star hotel in chennai
Strategies for service characteristics of star hotel in chennaiStrategies for service characteristics of star hotel in chennai
Strategies for service characteristics of star hotel in chennai
 
Consumer behavior and preference towards hathway broadband internet an empir...
Consumer behavior and preference towards hathway broadband internet  an empir...Consumer behavior and preference towards hathway broadband internet  an empir...
Consumer behavior and preference towards hathway broadband internet an empir...
 

Similar to Comparative Analysis of Minimum-Process Coordinated Checkpointing Algorithms

An efficient recovery mechanism
An efficient recovery mechanismAn efficient recovery mechanism
An efficient recovery mechanismijcsa
 
A minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmA minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmiaemedu
 
A minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmA minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmiaemedu
 
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks   Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks pijans
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...CSCJournals
 
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...Neelamani Samal
 
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...Eswar Publications
 
Efficient failure detection and consensus at extreme-scale systems
Efficient failure detection and consensus at extreme-scale  systemsEfficient failure detection and consensus at extreme-scale  systems
Efficient failure detection and consensus at extreme-scale systemsIJECEIAES
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The CloudMonica Carter
 
An Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors RisksAn Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors Riskssipij
 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...ijcseit
 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...ijcseit
 
Survey on replication techniques for distributed system
Survey on replication techniques for distributed systemSurvey on replication techniques for distributed system
Survey on replication techniques for distributed systemIJECEIAES
 
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRCHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRijujournal
 
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRCHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRijujournal
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed systemsatish raj
 

Similar to Comparative Analysis of Minimum-Process Coordinated Checkpointing Algorithms (20)

An efficient recovery mechanism
An efficient recovery mechanismAn efficient recovery mechanism
An efficient recovery mechanism
 
A minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmA minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithm
 
A minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithmA minimum process synchronous checkpointing algorithm
A minimum process synchronous checkpointing algorithm
 
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks   Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks
Minimum Process Coordinated Checkpointing Scheme For Ad Hoc Networks
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
 
111 118
111 118111 118
111 118
 
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
 
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed S...
 
Efficient failure detection and consensus at extreme-scale systems
Efficient failure detection and consensus at extreme-scale  systemsEfficient failure detection and consensus at extreme-scale  systems
Efficient failure detection and consensus at extreme-scale systems
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The Cloud
 
347 352
347 352347 352
347 352
 
50120130406041 2
50120130406041 250120130406041 2
50120130406041 2
 
An Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors RisksAn Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors Risks
 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
 
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
EXPOSURE AND AVOIDANCE MECHANISM OF BLACK HOLE AND JAMMING ATTACK IN MOBILE A...
 
Survey on replication techniques for distributed system
Survey on replication techniques for distributed systemSurvey on replication techniques for distributed system
Survey on replication techniques for distributed system
 
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRCHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
 
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMRCHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR
 
50120140502006
5012014050200650120140502006
50120140502006
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed system
 

More from iaemedu

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridiaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingiaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationiaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challengesiaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanismiaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationiaemedu
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cmaiaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presenceiaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineeringiaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobilesiaemedu
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacementiaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approachiaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentiaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationiaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyiaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryiaemedu
 

More from iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 

Comparative Analysis of Minimum-Process Coordinated Checkpointing Algorithms

  • 1. International Journal of Computer Engineering (IJCET), ISSN 0976 – 6367(Print), International Journal of Computer Engineering and Technology ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME and Technology (IJCET), ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 1 IJCET Number 1, May - June (2010), pp. 46-56 ©IAEME © IAEME, http://www.iaeme.com/ijcet.html A COMPARATIVE ANALYSIS OF MINIMUM-PROCESS COORDINATED CHECKPOINTING ALGORITHMS FOR MOBILE DISTRIBUTED SYSTEMS Preeti gupta Singhania University Pacheri Bari, Rajasthan Parveen kumar Meerut Institute of Engineering & Technology Meerut, Email:pk223475@yahoo.com Anil kumar solanki Meerut Institute of Engineering & Technology Meerut, India ABSTRACT Fault Tolerance Techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. While dealing with Mobile Distributed systems, we come across some issues like: mobility, low bandwidth of wireless channels and lack of stable storage on mobile nodes, disconnections, limited battery power and high failure rate of mobile nodes. These issues make traditional check pointing techniques designed for Distributed systems unsuitable for Mobile environments. To take a checkpoint, an MH has to transfer a large amount of checkpoint data to its local MSS over the wireless network. Since the wireless network has low bandwidth and MHs have low computation power, all-process check pointing will waste the scarce resources of the mobile system on every checkpoint. Minimum-process coordinated check pointing is a preferred approach for mobile distributed systems. In this paper, we discuss various existing minimum-process check pointing protocols for mobile distributed systems. Keywords: Check pointing algorithms; parallel & distributed computing; rollback recovery; fault-tolerant systems 46
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME 1. INTRODUCTION Parallel computing with clusters of workstations is being used extensively as they are cost-effective and scalable, and are able to meet the demands of high performance computing. Increase in the number of components in such systems increases the failure probability. It is, thus, necessary to examine both hardware and software solutions to ensure fault tolerance of such parallel computers. To provide fault tolerance, it is essential to understand the nature of the faults that occur in these systems. There are mainly two kinds of faults: permanent and transient. Permanent faults are caused by permanent damage to one or more components and transient faults are caused by changes in environmental conditions. Permanent faults can be rectified by repair or replacement of components. Transient faults remain for a short duration of time and are difficult to detect and deal with. Hence it is necessary to provide fault tolerance particularly for transient failures in parallel computers. Fault-tolerant techniques enable a system to perform tasks in the presence of faults. It is easier and more cost effective to provide software fault tolerance solutions than hardware solutions to cope with transient failures [1, 2]. Local checkpoint is the saved state of a process at a processor at a given instance. Global checkpoint is a collection of local checkpoints, one from each process. A global state is said to be “consistent” if it contains no orphan message; i.e., a message whose receive event is recorded, but its send event is lost. A transit message is a message whose send event has been recorded by the sending process but whose receive event has not been recorded by the receiving process [1, 7]. The problem of taking a checkpoint in a message passing distributed system is quite complex because any arbitrary set of checkpoints cannot be used for recovery [9]. This is due to the fact that the set of checkpoints used for recovery must form a consistent global state. Checkpointing is classified into following categories: • Asynchronous/Uncoordinated Checkpointing • Synchronous/Coordinated Checkpointing • Quasi-Synchronous or Communication-induced Checkpointing 47
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME • Message Logging based Checkpointing The problem of taking a checkpoint in a message passing distributed system is quite complex because any arbitrary set of checkpoints cannot be used for recovery. This is due to the fact that the set of checkpoints used for recovery must form a consistent global state. 1.1 Asynchronous/ Checkpointing Under the asynchronous approach, checkpoints at each process are taken independently without any synchronization among the processes. Because of absence of synchronization, there is no guarantee that a set of local checkpoints taken will be a consistent set of checkpoints. Thus, a recovery algorithm has to search for the most recent consistent set of checkpoints before it can initiate recovery [8], [9]. 1.2 Synchronous / Coordinated Checkpointing In coordinated or synchronous checkpointing, processes coordinate their local checkpointing actions such that the set of all recent checkpoints in the system is guaranteed to be consistent [add reference list……]. In case of a fault, every process restarts from its most recent permanent/committed checkpoint. Hence, this approach simplifies recovery and it does not suffer from domino-effect. Furthermore, coordinated checkpointing requires each process to maintain only one permanent checkpoint on stable storage, reducing storage overhead and eliminating the need for garbage collection. Its main disadvantage is the large latency involved in output commits [15]. A straightforward approach to coordinate checkpointing is to block communications while the checkpointing process executes. A coordinator takes a checkpoint and broadcasts a request message to all processes, asking them to take a checkpoint. When a process receives a message, it stops its execution, flushes all the communication channels, takes a tentative checkpoint, and sends an acknowledgement message back to the coordinator. After the coordinator receives acknowledgement from all processes, it broadcasts a commit message that completes the two phase checkpointing protocol. After receiving the commit message, each process receives the old permanent checkpoint and makes the tentative checkpoint permanent. The process is then free to resume execution and exchange messages with other processes. The coordinated 48
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME checkpointing algorithms can also be classified into following two categories: minimum- process and all process algorithms. In all-process coordinated checkpointing algorithms, every process is required to take its checkpoint in an initiation [7], [8]. In minimum- process algorithms, minimum interacting processes are required to take their checkpoints in an initiation. The coordinated checkpointing protocols can be classified into two types: blocking and non-blocking. In blocking algorithms, as mentioned above, some blocking of processes takes place during checkpointing [4]. In non-blocking algorithms, no blocking of processes is required for checkpointing [5]. In a centralized algorithm like Chandy-lamport [7], there is one node which always initiates the checkpoints and coordinates the participating nodes. The disadvantage of a centralized algorithm is that all nodes have to initiate checkpoints whenever the centralized node decides to checkpoint. Nodes can be given autonomy in initiating checkpoints by allowing any node in the system to initiate checkpoints. 1. 3 Mobile Distributed Computing System A mobile computing system consists of a large number of MHs and relatively fewer MSSs. The distributed computation we consider consists of n spatially separated sequential processes denoted by P0, P1, ..., Pn-1, running on fail-stop MHs or on MSSs. Each MH or MSS has one process running on it. The static network provides reliable, sequenced delivery of messages between any two MSSs, with arbitrary message latency. The wireless network within a cell ensures FIFO delivery of messages between an MSS and a local MH, i.e., there exists a FIFO channel from an MH to its local MSS, and another FIFO channel from the MSS to the MH. If an MH did not leave the cell, then every message sent to it from the local MSS would be received in the sequence in which they are sent [2]. To send a message from an MH h1 to another MH h2, h1 first sends the message to its local MSS over the wireless network. This MSS then forwards the message to the local MSS of h2 which forwards it to h2 over its local wireless network. Since the location of an MH within the network is neither fixed nor universally known to the network, because, an MH switches cells frequently. The local MSS of h1 needs to first determine 49
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME the MSS that currently serves h2. This is essentially the problem that has been tackled through a variety of routing protocols at the network layer. Hence, the cost incurred for routing and delivering a message to an MH, varies with the specific routing protocol being used. Here, we have assumed that any message destined for an MH incurs a fixed search cost [2]. 1.4 Checkpointing Issues in Mobile Distributed Computing Systems The existence of mobile nodes in a distributed system introduces new issues that need proper handling while designing a checkpointing algorithm for such systems. These issues are mobility, disconnections, finite power source, vulnerable to physical damage, lack of stable storage etc. [2]. The location of an MH within the network, as represented by its current local MSS, changes with time. Checkpointing schemes that send control messages to MH’s, will need to first locate the MH within the network, and thereby incur a search overhead [2]. Due to vulnerability of mobile computers to catastrophic failures, disk storage of an MH is not acceptably stable for storing message logs or local checkpoints. Checkpointing schemes must therefore, rely on an alternative stable repository for an MH’s local checkpoint [2]. Disconnections of one or more MHs should not prevent recording the global state of an application executing on MHs. It should be noted that disconnection of an MH is a voluntary operation, and frequent disconnections of MHs is an expected feature of the mobile computing environments [2]. The battery at the MH has limited life. To save energy, the MH can power down individual components during periods of low activity [2]. This strategy is referred to as the doze mode operation. An MH in doze mode is awakened on receiving a message. Therefore, energy conservation and low bandwidth constraints require the checkpointing algorithms to minimize the number of synchronization messages and the number of checkpoints. 2 SOME MINIMUM-PROCESS COORDINATED CHECKPOINTING PROTOCOLS FOR MOBILE DISTRIBUTED SYSTEMS 2.1 GUOHONG CAO AND MUKESH SINGHAL ALGORITHM [5] Cao and Singhal achieved non-intrusiveness in the minimum-process algorithm by introducing the concept of mutable checkpoints. In their algorithm, initiator, say Pin, sends the checkpoint request to any process, say Pj, only if Pin receives m from Pj in the 50
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME current CI. Pj takes its tentative checkpoint if Pj has sent m to Pin in the current CI; otherwise, Pj concludes that the checkpoint request is a useless one. Similarly, when Pj takes its tentative checkpoint, it propagates the checkpoint request to other processes. This process is continued till the checkpoint request reaches all the processes on which the initiator transitively depends and a checkpointing tree is formed. During checkpointing, if Pi receives m from Pj such that Pj has taken some checkpoint in the current initiation before sending m, Pi may be forced to take a checkpoint, called mutable checkpoint. If Pi is not in the minimum set, its mutable checkpoint is useless and is discarded on commit. The huge data structure MR[] is also attached with the checkpoint requests to reduce the number of useless checkpoint requests. The response from each process is sent directly to initiator. 2.2 KUMAR AND KUMAR ALGORITHM [14] They proposed an algorithm which is based on keeping track of direct dependencies of processes. Initiator MSS collects the direct dependency vectors of all processes, computes the tentative minimum set (minimum set or its subset), and sends the checkpoint request along with the tentative minimum set to all MSSs. This step is taken to reduce the time to collect the coordinated checkpoint. It will also reduce the number of useless checkpoints and the blocking of the processes. Suppose, during the execution of the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj. Pj receives m such that it has not taken its checkpoint for the current initiation and it does not know whether it will get the checkpoint request. If Pj takes its checkpoint after processing m, m will become orphan. In order to avoid such orphan messages, they propose the following technique. If Pj has sent at least one message to a process, say Pk and Pk is in the tentative minimum set, there is a good probability that Pj will get the checkpoint request. Therefore, Pj takes its induced checkpoint before processing m. An induced checkpoint is similar to the mutable checkpoint [18]. In this case, most probably, Pj will get the checkpoint request and its induced checkpoint will be converted into permanent one. There is a less probability that Pj will not get the checkpoint request and its induced checkpoint will be discarded. Alternatively, if there is not a good probability that Pj will get the checkpoint request, Pj buffers m till it takes its checkpoint or receives the commit message. They have tried to minimise the number of useless checkpoints and blocking of 51
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME the process by using the probabilistic approach and buffering selective messages at the receiver end. Exact dependencies among processes are maintained. It abolishes the useless checkpoint requests and reduces the number of duplicate checkpoint requests. 2.3 SILVA AND SILVA ALGORITHM [15] The authors proposed all process coordinated checkpointing protocol for distributed systems. The non-intrusiveness during checkpointing is achieved by piggybacking monotonically increasing checkpoint number along with computational message. When a process receives a computational message with the high checkpoint number, it takes its checkpoint before processing the message. When it actually gets the checkpoint request from the initiator, it ignores the same. If each process of the distributed program is allowed to initiate the checkpoint operation, the network may be flooded with control messages and process might waste their time making unnecessary checkpoints. In order to avoid this, Silva and Silva give the key to initiate checkpoint algorithm to one process. The checkpoint event is triggered periodically by a local timer mechanism. When this timer expires, the initiator process the checkpoint state of process running in the machine and force all the others to take checkpoint by sending a broadcast message. The interval between adjacent checkpoints is called checkpoint interval. 2.4 KOO-TOUEG’S ALGORITHM [10] They proposed a minimum process blocking checkpointing algorithm for distributed systems. The algorithm consists of two phases. During the first phase, the checkpoint initiator identifies all processes with which it has communicated since the last checkpoint and sends them a request. Upon receiving the request, each process in turn identifies all process it has communicated with since the last checkpoint and sends them a request, and so on, until no more processes can be identified. During the second phase, all process identified in the first phase take a checkpoint. The result is a consistent checkpoint that involves only the participating processes. In this protocol, after a process takes a checkpoint, it cannot send any message until the second phase terminates successfully, although receiving messages after the checkpoint is permissible. 2.5 COA AND SINGHAL BLOCKING ALGORITHM [4] They presented a minimum process checkpointing algorithm in which, the dependency information is recorded by a Boolean vector. This algorithm is a two-phase 52
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME protocol and saves two kinds of checkpoint on the stable storage. In the first phase, the initiator sends a request to all processes to send their dependency vector. On receiving the request, each process sends its dependency vector. Having received all the dependency vectors, the initiator constructs an N*N dependency matrix with one row per process, represented by the dependency vector of the process, Based on the dependency matrix, the initiator can locally calculate all the process on which the initiator transitively depend. After the initiator finds all the process that need to take their checkpoints, it adds them to the set Sforced and asks them to take checkpoints. Any process receiving a checkpoint request takes the checkpoint and sends a reply. The process has to be blocked after receiving the dependency vectors request and resumes its computation after receiving a checkpoint request. 2.6 PRAKASH-SINGHAL ALGORITHM [16] It forces only a minimum number of processes to take checkpoints and does not block the underlying computation during checkpointing. Prakash Singhal algorithm forces only part of processes to take checkpoints, the csn of some processes may be out- of-date, and may not be able to avoid inconsistencies [8]. Prakash-Singhal algorithm attempts to solve this problem by having each process maintains an array to save the csn, where csni[i] has been the expected csn of Pi. Note that Pi’s csnj[i] may be different from Pj's csnj [i] if there is no communication between Pi and Pj for several checkpoint intervals. By using csn and the initiator identification number, they claim that their non- blocking algorithm can avoid inconsistencies and minimize the number of checkpoints during checkpointing. 2.7. P. KUMAR’S HYBRID ALGORITHM [13] In minimum-process coordinated checkpointing, some processes may not checkpoint for several checkpoint initiations. In the case of a recovery after a fault, such processes may rollback to far earlier checkpointed state and thus may cause greater loss of computation. In all-process coordinated checkpointing, the recovery line is advanced for all processes but the checkpointing overhead may be exceedingly high. To optimize both matrices, the checkpointing overhead and the loss of computation on recovery, P.Kumar proposed a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the execution of minimum-process coordinated checkpointing 53
  • 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME algorithm for a fixed number of times. Thus, the Mobile nodes with low activity or in doze mode operation may not be disturbed in the case of minimum-process checkpointing and the recovery line is advanced for each process after an all-process checkpoint. Additionally, he tried to minimize the information piggybacked onto each computation message. For minimum-process checkpointing, he designed a blocking algorithm, where no useless checkpoints are taken and an effort has been made to optimize the blocking of processes. He proposed to delay selective messages at the receiver end. By doing so, processes are allowed to perform their normal computation, send messages and partially receive them during their blocking period. The proposed minimum-process blocking algorithm forces zero useless checkpoints at the cost of very small blocking. 3. CONCLUSION The existence of mobile nodes in a distributed system introduces new issues that need proper handling while designing a checkpointing algorithm for such systems. These issues are mobility, disconnections, finite power source, vulnerable to physical damage, lack of stable storage, etc. The new issues make traditional checkpointing techniques unsuitable to checkpoint mobile distributed systems. A good checkpointing protocol for mobile distributed systems should have low memory overheads on MHs, low overheads on wireless channels and should avoid awakening of an MH in doze mode operation. The disconnection of an MH should not lead to infinite wait state. The algorithm should be non-intrusive and should force minimum number of processes to take their local checkpoints. Minimum-process coordinated checkpointing is a suitable approach to introduce fault tolerance in mobile distributed systems transparently. This approach is domino-free, requires at most two checkpoints of a process on stable storage, and forces only a minimum number of processes to checkpoint. It may require blocking of processes, extra synchronization messages, piggybacking of some information along with computation messages, or taking some useless checkpoints. 54
  • 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME In this paper we have given introductory concepts related to checkpointing in mobile distributed systems. We also gave an analysis of various minimum-process checkpointing schemes especially designed for mobile distributed systems. REFERENCES [1] Singhal, M, N G Shivratri, “Advanced concepts in Operating Systems, Tata Mc Graw Hill, 1994. [2] Acharya A., “Structuring Distributed Algorithms and Services for networks with Mobile Hosts”, Ph.D. Thesis, Rutgers University, 1995. [3] Cao G. and Singhal M., “On coordinated checkpointing in Distributed Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 9, no.12, pp. 1213-1225, Dec 1998. [4] Cao G. and Singhal M., “On the Impossibility of Min-process Non-blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems,” Proceedings of International Conference on Parallel Processing, pp. 37- 44, August 1998. [5] Cao G. and Singhal M., “Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing systems,” IEEE Transaction On Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001. [6] Cao G. and Singhal M., “Checkpointing with Mutable Checkpoints”, Theoretical Computer Science, 290(2003), pp. 1127-1148. [7] Chandy K. M. and Lamport L., “Distributed Snapshots: Determining Global State of Distributed Systems,” ACM Transaction on Computing Systems, vol. 3, No. 1, pp. 63- 75, February 1985. [8] Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., “A Survey of Rollback- Recovery Protocols in Message-Passing Systems,” ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002. [9] Kalaiselvi, S., Rajaraman, V., “A Survey of Checkpointing Algorithms for Parallel and Distributed Systems”, Sadhna, Vol. 25, Part 5, October 2000, pp. 489-510. 55
  • 11. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), © IAEME [10] Koo R. and Toueg S., “Checkpointing and Roll-Back Recovery for Distributed Systems,” IEEE Trans. on Software Engineering, vol. 13, no. 1, pp. 23-31, January 1987. [11] Parveen Kumar, Lalit Kumar, R K Chauhan, V K Gupta “A Non-Intrusive Minimum Process Synchronous Checkpointing Protocol for Mobile Distributed Systems” Proceedings of IEEE ICPWC-2005, January 2005. [12] Pushpendra Singh, Gilbert Cabillic, “A Checkpointing Algorithm for Mobile Computing Environment”, LNCS, No. 2775, pp 65-74, 2003. [13] Parveen Kumar, “A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems”, Mobile Information Systems [An International Journal from IOS Press, Netherlands] pp 13-32, Vol. 4, No. 1, 2007. [14] Lalit Kumar, Parveen Kumar “A Synchronous Checkpointing Protocol for Mobile Distributed Systems: A Probabilistic Approach”, International Journal of Information and Computer Security [An International Journal from Inderscience Publishers, USA], pp 298-314, Vol. 3 No. 1, 2007. [15] Silva L, Silva J 1992 Global checkpointing for distributed programs. Proc. IEEE 11th Symp. On Reliable Distributed Syst. pp 155-162. [16] R. Prakash and M. Singhal. “Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems”. IEEE Trans. on Parallel and Distributed System, pages 1035-1048,Oct. 1996. 56