CprE 458/558: Real-Time Systems


                              Lecture 17
                  Fault-tolerant design techniques




CprE 458/558               G. Manimaran (ISU)
Fault Tolerant Strategies
    Fault tolerance in a computer system is achieved through
     redundancy in hardware, software, information, and/or
     computations. Such redundancy can be implemented in
     static, dynamic, or hybrid configurations.
    Fault tolerance can be achieved by many techniques:
        Fault masking is any process that prevents faults in a system
         from introducing errors. Examples: error-correcting memories and
         majority voting.
        Reconfiguration is the process of eliminating a faulty component
         from a system and restoring the system to some operational state.




Reconfiguration Approach
    Fault detection is the process of recognizing that a fault
     has occurred. Fault detection is often required before any
     recovery procedure can be initiated.
    Fault location is the process of determining where a fault
     has occurred so that an appropriate recovery can be
     initiated.
    Fault containment is the process of isolating a fault and
     preventing the effects of that fault from propagating
     throughout the system.
    Fault recovery is the process of remaining operational or
     regaining operational status via reconfiguration even in the
     presence of faults.

The Concept of Redundancy
    Redundancy is simply the addition of information,
     resources, or time beyond what is needed for normal
     system operation.
    Hardware redundancy is the addition of extra
     hardware, usually for the purpose of either detecting or
     tolerating faults.
    Software redundancy is the addition of extra software,
     beyond what is needed to perform a given function, to
     detect and possibly tolerate faults.
    Information redundancy is the addition of extra
     information beyond that required to implement a given
     function; for example, error detection codes.
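
A minimal sketch of information redundancy, assuming a single even-parity bit as the error detection code. The function names `add_parity` and `check_parity` are illustrative and do not come from the lecture. One parity bit detects any odd number of bit flips but cannot correct them or detect an even number of flips.

```python
def add_parity(word: int, width: int = 8) -> int:
    """Return the word with one extra even-parity bit appended as the LSB."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    parity = bin(word).count("1") % 2  # 1 if the data has an odd number of 1 bits
    return (word << 1) | parity


def check_parity(codeword: int) -> bool:
    """Return True if the codeword still has an even number of 1 bits."""
    return bin(codeword).count("1") % 2 == 0


codeword = add_parity(0b10110010)
corrupted = codeword ^ (1 << 4)  # flip one bit to simulate a fault
print(check_parity(codeword), check_parity(corrupted))
```

The extra bit is the redundancy: it carries no new data, only enough information to notice that the stored word has changed.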


The Concept of Redundancy (Cont’d)
    Time redundancy uses additional time to perform the
     functions of a system such that fault detection and often
     fault tolerance can be achieved. Transient faults can be
     tolerated this way.
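
One way to picture time redundancy is to run the same computation more than once and accept a result only when two consecutive runs agree. The sketch below assumes a deterministic task and a simulated transient fault on the first run; `run_with_time_redundancy` and `flaky_sum` are hypothetical names used only for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-execute the task until two consecutive runs return the same result."""
    previous = task()
    for _ in range(max_attempts - 1):
        current = task()
        if current == previous:
            return current   # two matching runs: accept the result
        previous = current   # mismatch: a fault was detected, so try again
    raise RuntimeError("results never agreed; the fault may be permanent")


calls = {"n": 0}


def flaky_sum() -> int:
    """Return 42, except on the first call (simulated transient fault)."""
    calls["n"] += 1
    return 41 if calls["n"] == 1 else 42


result = run_with_time_redundancy(flaky_sum)
print(result, "after", calls["n"], "executions")
```

The cost is execution time rather than extra hardware, which is why this form of redundancy suits transient faults and not permanent ones.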

    The use of redundancy can provide additional capabilities
     within a system. However, redundancy can have a significant
     impact on a system's performance, size, weight, power
     consumption, and reliability.




Hardware Redundancy
    Passive techniques use the concept of fault masking.
     These techniques are designed to achieve fault tolerance
     without requiring any action on the part of the system.
     They rely on voting mechanisms.
    Active techniques achieve fault tolerance by detecting
     the existence of faults and performing some action to
     remove the faulty hardware from the system. That is,
     active techniques use fault detection, fault location, and
     fault recovery in an attempt to achieve fault tolerance.




Hardware Redundancy (Cont’d)


    Hybrid techniques combine the attractive features of
     both the passive and active approaches.
        Fault masking is used in hybrid systems to prevent erroneous
         results from being generated.
        Fault detection, location, and recovery are also used to improve
         fault tolerance by removing faulty hardware and replacing it with
         spares.
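
The hybrid idea can be sketched as a voter that masks a faulty module and then swaps it for a spare. This is an illustrative model only: the class `HybridRedundantUnit`, the module functions, and the rule "any module that disagrees with the majority is faulty" are assumptions made for the example, not details given in the lecture.

```python
from collections import Counter
from typing import Callable, List

Module = Callable[[int], int]


class HybridRedundantUnit:
    """Majority voting (masking) plus replacement of disagreeing modules."""

    def __init__(self, active: List[Module], spares: List[Module]) -> None:
        self.active = list(active)
        self.spares = list(spares)

    def compute(self, x: int) -> int:
        outputs = [module(x) for module in self.active]
        voted, votes = Counter(outputs).most_common(1)[0]
        if votes * 2 <= len(outputs):
            raise RuntimeError("no majority among active modules")
        # Detection and location: a module that disagrees with the vote is
        # assumed faulty. Recovery: replace it with a spare if one is left.
        for index, output in enumerate(outputs):
            if output != voted and self.spares:
                self.active[index] = self.spares.pop(0)
        return voted


def good_module(x: int) -> int:
    return 2 * x


def stuck_module(x: int) -> int:
    return 0  # simulated hardware fault: output stuck at zero


unit = HybridRedundantUnit([good_module, stuck_module, good_module], [good_module])
first = unit.compute(21)
print(first, len(unit.spares))
```

The vote hides the fault from the caller on the very call where it occurs, and the replacement restores full redundancy for later calls.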




Hardware Redundancy - A Taxonomy

     [Figure hard-fault.fig: image not recovered]




Triple Modular Redundancy (TMR)


          [Figure tmr1.fig: image not recovered]
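
In TMR, three identical modules compute the same result and a voter passes on the majority value, which masks a fault in any single module. A minimal sketch of a bitwise majority voter follows; the name `tmr_vote` and the sample values are illustrative assumptions.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three module outputs."""
    return (a & b) | (b & c) | (a & c)


good = 0b11001010
faulty = good ^ 0b00010000  # one module produces a single wrong bit

print(tmr_vote(good, faulty, good) == good)      # single fault is masked
print(tmr_vote(good, faulty, faulty) == faulty)  # two matching faults win the vote
```

The second line of output shows the limit of the scheme: the voter cannot distinguish a correct majority from two modules that fail in the same way.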




Software Redundancy - to Detect Software Faults
    There are two popular approaches: N-Version
     Programming (NVP) and Recovery Blocks (RB).

    NVP is a forward recovery scheme - it masks faults.
    NVP: multiple versions of the same task are executed
     concurrently.
    NVP relies on voting.

    RB is a backward error recovery scheme.
    RB: the versions of a task are executed serially.
    RB relies on an acceptance test.

N-Version Programming (NVP)
    NVP is based on the principle of design diversity, that is, having
     different teams of programmers code the same software module to
     produce multiple versions.

    Diversity can also be introduced by employing different algorithms for
     obtaining the same solution or by choosing different programming
     languages.

    NVP can tolerate both hardware and software faults.

    NVP does not tolerate correlated faults.

    In NVP, deciding the number of versions required to ensure acceptable
     levels of software reliability is an important design consideration.
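
A small sketch of NVP voting, under stated assumptions: three independently written versions of the same function (sum of 1..n), one of which contains a deliberate off-by-one bug. All function names are hypothetical. The versions run one after another here for simplicity, whereas the lecture describes them as executing concurrently.

```python
from collections import Counter
from typing import Callable, Sequence


def version_formula(n: int) -> int:
    return n * (n + 1) // 2  # closed-form algorithm


def version_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):  # iterative algorithm
        total += i
    return total


def version_buggy(n: int) -> int:
    return sum(range(1, n))  # deliberate off-by-one software fault


def nvp_execute(versions: Sequence[Callable[[int], int]], n: int) -> int:
    """Run every version on the same input and return the majority output."""
    outputs = [version(n) for version in versions]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority: too many versions disagree")
    return value


print(nvp_execute([version_formula, version_loop, version_buggy], 10))
# Correlated faults: two versions sharing the same bug outvote the correct one.
print(nvp_execute([version_formula, version_buggy, version_buggy], 10))
```

The second call illustrates why correlated faults defeat the scheme: the voter only counts agreement and has no notion of correctness.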


N-Version Programming (Cont’d)

          [Figure nvp.fig: image not recovered]




Recovery Blocks (RB)
    RB uses multiple alternates (backups) to perform the same
     function; one module (task) is primary and the others are
     secondary.

    The primary task executes first. When the primary task
     completes execution, its outcome is checked by an
     acceptance test.

    If the output is not acceptable, the effects of the primary
     are undone (i.e., the system rolls back to the state at which
     the primary was invoked) and a secondary task executes. This
     repeats until either an acceptable output is obtained or the
     alternates are exhausted.
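
The recovery block control flow described above can be sketched as follows. This is a simplified model, assuming the task state fits in a dictionary that can be checkpointed with a deep copy; `recovery_block`, `primary`, and `secondary` are hypothetical names, and an exception in an alternate is treated the same as a failed acceptance test.

```python
import copy
from typing import Callable, Dict, Sequence

State = Dict[str, object]


def recovery_block(
    state: State,
    alternates: Sequence[Callable[[State], float]],
    acceptance_test: Callable[[float], bool],
) -> float:
    """Try the alternates serially, rolling back the state after each failure."""
    checkpoint = copy.deepcopy(state)  # state at which the primary is invoked
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash counts as an unacceptable outcome
        state.clear()
        state.update(copy.deepcopy(checkpoint))  # undo the alternate's effects
    raise RuntimeError("all alternates exhausted without an acceptable output")


def primary(state: State) -> float:
    state["scratch"] = "garbage"  # side effect that must be rolled back
    return -1.0                   # simulated faulty output


def secondary(state: State) -> float:
    samples = state["samples"]
    return sum(samples) / len(samples)


state: State = {"samples": [2.0, 4.0, 6.0]}
result = recovery_block(state, [primary, secondary], lambda r: 0.0 <= r <= 10.0)
print(result, "scratch" in state)
```

Because the alternates run serially, the worst-case execution time is the sum of all alternates plus the rollbacks, which matters when the task has a deadline.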

Recovery Blocks (Cont’d)
         [Figure rblocks.fig: image not recovered]




Recovery Blocks (Cont’d)
    The acceptance tests are usually sanity checks; these
     consist of making sure that the output is within a certain
     acceptable range or that the output does not change
     faster than the allowed maximum rate.

    Selecting the range for the acceptance test is crucial. If the
     allowed ranges are too small, the acceptance tests may
     label correct outputs as bad. If they are too large, the
     probability that incorrect outputs will be accepted increases.
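
A sanity-check acceptance test of the kind described above might combine a range check with a rate-of-change check. The sketch below is illustrative; `make_acceptance_test` and the numeric limits are assumptions chosen for the example, not values from the lecture.

```python
from typing import Callable


def make_acceptance_test(
    low: float, high: float, max_step: float
) -> Callable[[float, float], bool]:
    """Build a test that checks the output range and its change since last time."""

    def accept(output: float, previous: float) -> bool:
        in_range = low <= output <= high
        rate_ok = abs(output - previous) <= max_step
        return in_range and rate_ok

    return accept


accept = make_acceptance_test(low=0.0, high=100.0, max_step=5.0)
print(accept(52.0, 50.0))    # within range and within the allowed step
print(accept(120.0, 118.0))  # out of range
print(accept(70.0, 50.0))    # changes too fast

too_tight = make_acceptance_test(low=49.0, high=51.0, max_step=5.0)
print(too_tight(52.0, 50.0))  # a plausible output rejected by a narrow range
```

The last call shows the trade-off from the slide: narrowing the range rejects an output that the wider test accepted.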

    RB can tolerate software faults because the alternates are
     usually implemented with different approaches; RB is also
     known as the primary-backup approach.

Fault tolerance

  • 1. CprE 458/558: Real-Time Systems Lecture 17 Fault-tolerant design techniques CprE 458/558 G. Manimaran (ISU)
  • 2. Fault Tolerant Strategies
      • Fault tolerance in a computer system is achieved through redundancy in hardware, software, information, and/or computations. Such redundancy can be implemented in static, dynamic, or hybrid configurations.
      • Fault tolerance can be achieved by many techniques:
          • Fault masking is any process that prevents faults in a system from introducing errors. Examples: error-correcting memories and majority voting.
          • Reconfiguration is the process of eliminating a faulty component from a system and restoring the system to some operational state.
  • 3. Reconfiguration Approach
      • Fault detection is the process of recognizing that a fault has occurred. Fault detection is often required before any recovery procedure can be initiated.
      • Fault location is the process of determining where a fault has occurred so that an appropriate recovery can be initiated.
      • Fault containment is the process of isolating a fault and preventing the effects of that fault from propagating throughout the system.
      • Fault recovery is the process of remaining operational or regaining operational status via reconfiguration even in the presence of faults.
  • 4. The Concept of Redundancy
      • Redundancy is simply the addition of information, resources, or time beyond what is needed for normal system operation.
      • Hardware redundancy is the addition of extra hardware, usually for the purpose of either detecting or tolerating faults.
      • Software redundancy is the addition of extra software, beyond what is needed to perform a given function, to detect and possibly tolerate faults.
      • Information redundancy is the addition of extra information beyond that required to implement a given function; for example, error detection codes.
  • 5. The Concept of Redundancy (Cont'd)
      • Time redundancy uses additional time to perform the functions of a system such that fault detection and often fault tolerance can be achieved. Time redundancy tolerates transient faults.
      • The use of redundancy can provide additional capabilities within a system. However, redundancy can have a significant impact on a system's performance, size, weight, power consumption, and reliability.
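As a rough illustration of time redundancy, the sketch below repeats a computation until two runs agree. It is a minimal sketch, not part of the lecture: the names `run_with_time_redundancy` and `make_flaky_square` are hypothetical, results are assumed to be hashable, and the fault is simulated by a call counter.

```python
from collections import Counter


def run_with_time_redundancy(compute, x, max_runs=3):
    """Repeat compute(x) until some result has been seen twice.

    Two agreeing runs are taken as evidence that no transient fault
    corrupted the result. Raises RuntimeError if no two of the
    max_runs runs agree.
    """
    seen = Counter()
    for _ in range(max_runs):
        result = compute(x)
        seen[result] += 1
        if seen[result] == 2:
            return result
    raise RuntimeError("no two runs agreed; the fault may not be transient")


def make_flaky_square(faulty_calls):
    """Hypothetical computation: squares x, but returns a wrong value
    on the call numbers listed in faulty_calls (simulated transient fault)."""
    state = {"calls": 0}

    def square(x):
        state["calls"] += 1
        if state["calls"] in faulty_calls:
            return x * x + 1  # corrupted result
        return x * x

    return square


# The first call is corrupted; the second and third agree on 16.
print(run_with_time_redundancy(make_flaky_square({1}), 4))
```

The extra runs are the cost the slide warns about: the fault-free case takes twice as long, and a fault that produces the same wrong value on every run would still go undetected.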
  • 6. Hardware Redundancy
      • Passive techniques use the concept of fault masking. These techniques are designed to achieve fault tolerance without requiring any action on the part of the system. They rely on voting mechanisms.
      • Active techniques achieve fault tolerance by detecting the existence of faults and performing some action to remove the faulty hardware from the system. That is, active techniques use fault detection, fault location, and fault recovery in an attempt to achieve fault tolerance.
  • 7. Hardware Redundancy (Cont'd)
      • Hybrid techniques combine the attractive features of both the passive and active approaches.
      • Fault masking is used in hybrid systems to prevent erroneous results from being generated.
      • Fault detection, location, and recovery are also used to improve fault tolerance by removing faulty hardware and replacing it with spares.
  • 8. Hardware Redundancy - A Taxonomy [figure hard-fault.fig not included in the transcript]
  • 9. Triple Modular Redundancy (TMR) [figure tmr1.fig not included in the transcript]
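Since the TMR figure is missing from the transcript, a small sketch of the voter may help. It assumes the usual TMR arrangement of three identical modules feeding a majority voter, and it assumes the module outputs are non-negative integers voted bit by bit; the function name `tmr_vote` is hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three module outputs.

    Each output bit takes the value that at least two of the three
    modules agree on, so an error in any single module is masked.
    """
    return (a & b) | (b & c) | (a & c)


# Module c has one flipped bit; the voter still outputs the correct word.
print(bin(tmr_vote(0b1011, 0b1011, 0b0011)))  # 0b1011
```

This is passive redundancy in the sense of the earlier slides: the fault is masked without being detected or located, and the voter itself remains a single point of failure.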
  • 10. Software Redundancy - to Detect Software Faults
      • There are two popular approaches: N-Version Programming (NVP) and Recovery Blocks (RB).
      • NVP is a forward recovery scheme: it masks faults.
      • NVP: multiple versions of the same task are executed concurrently.
      • NVP relies on voting.
      • RB is a backward error recovery scheme.
      • RB: the versions of a task are executed serially.
      • RB relies on an acceptance test.
  • 11. N-Version Programming (NVP)
      • NVP is based on the principle of design diversity, that is, having different teams of programmers code the same software module to produce multiple versions.
      • Diversity can also be introduced by employing different algorithms for obtaining the same solution or by choosing different programming languages.
      • NVP can tolerate both hardware and software faults.
      • NVP does not tolerate correlated faults.
      • In NVP, deciding the number of versions required to ensure acceptable levels of software reliability is an important design consideration.
  • 12. N-Version Programming (Cont'd) [figure nvp.fig not included in the transcript]
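The NVP scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the three `sum_*` versions and the `n_version_execute` helper are hypothetical, threads stand in for whatever concurrent execution a real system would use, and a version that raises an exception is treated as casting no vote.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def n_version_execute(versions, x):
    """Run every version on x concurrently and return the majority output."""
    if not versions:
        raise ValueError("at least one version is required")
    outputs = []
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        futures = [pool.submit(version, x) for version in versions]
        for future in futures:
            try:
                outputs.append(future.result())
            except Exception:
                pass  # a crashed version casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        if votes * 2 > len(versions):  # strict majority of all versions
            return value
    raise RuntimeError("no majority among the versions")


# Three independently written versions of "sum of 1..n" (hypothetical).
def sum_formula(n):
    return n * (n + 1) // 2


def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_buggy(n):
    return sum(range(n))  # off-by-one design fault: omits n


print(n_version_execute([sum_formula, sum_loop, sum_buggy], 10))  # 55
```

If two versions shared the same off-by-one fault, the voter would return the wrong answer, which is the correlated-fault limitation noted on the slide.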
  • 13. Recovery Blocks (RB)
      • RB uses multiple alternates (backups) to perform the same function; one module (task) is primary and the others are secondary.
      • The primary task executes first. When the primary task completes execution, its outcome is checked by an acceptance test.
      • If the output is not acceptable, a secondary task executes after undoing the effects of the primary (i.e., rolling back to the state at which the primary was invoked). This repeats until either an acceptable output is obtained or the alternates are exhausted.
  • 14. Recovery Blocks (Cont'd) [figure rblocks.fig not included in the transcript]
  • 15. Recovery Blocks (Cont'd)
      • The acceptance tests are usually sanity checks; these consist of making sure that the output is within a certain acceptable range or that the output does not change at more than the allowed maximum rate.
      • Selecting the range for the acceptance test is crucial. If the allowed ranges are too small, the acceptance tests may label correct outputs as bad. If they are too large, the probability that incorrect outputs will be accepted increases.
      • RB can tolerate software faults because the alternates are usually implemented with different approaches; RB is also known as the primary-backup approach.
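The recovery block procedure and its acceptance test can be sketched as below. This is a minimal sketch under assumptions not in the slides: the task state is a plain dictionary that is checkpointed by deep copy, the acceptance test combines a range check with a rate-of-change check, and all names and numeric limits are hypothetical.

```python
import copy


def make_acceptance_test(low, high, previous, max_step):
    """Sanity check: output must lie in [low, high] and must not differ
    from the previous accepted output by more than max_step."""
    def accept(output):
        return low <= output <= high and abs(output - previous) <= max_step
    return accept


def recovery_block(state, alternates, acceptance_test):
    """Try the primary, then each secondary, rolling back between attempts."""
    checkpoint = copy.deepcopy(state)  # state at which the primary is invoked
    for alternate in alternates:
        try:
            output = alternate(state)
            if acceptance_test(output):
                return output
        except Exception:
            pass  # a crash counts as a failed acceptance test
        # Undo the effects of the failed alternate before trying the next one.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates exhausted")


def primary(state):
    state["log"].append("primary")
    return 500.0  # implausible reading, rejected by the range check


def backup(state):
    state["log"].append("backup")
    return 21.5


state = {"log": []}
accept = make_acceptance_test(low=-40.0, high=125.0, previous=21.0, max_step=2.0)
print(recovery_block(state, [primary, backup], accept))  # 21.5
print(state["log"])  # ['backup']
```

Widening `high` to 1000.0 and `max_step` to 500.0 would let the primary's bad output through, which illustrates why the choice of range matters.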