Taming Data Corruptions in Distributed Systems
Marco Serafini (Yahoo! Research BCN)
Infrastructure dependability
o Service availability, data durability
o In presence of hardware faults
o Current approaches tolerate crashes
Crashes
o Assumptions
   o A server (process) suddenly stops
   o Until then, it takes only correct steps
[Figure: process timeline; labels: Crash, Time]
Data corruptions
o What if there are data corruptions?
  o The state of a process may be corrupted
  o The process may make incorrect steps before stopping
                  NOT COVERED!
[Figure: process timeline; labels: Data corruptions, Time]
Sources of data corruptions
o Commodity disks are known to be unreliable
   o Faulty firmware, bad sectors etc.
o RAM: ECC errors are frequent
   o Production machines only see detected errors
      → Coverage not known
o Interconnects and CPUs also fail
   o Faulty drivers or bit flips
A horror story
An 8-hour system-wide outage due to a single hardware fault
What happened?
o Quoted from the Amazon service health dashboard
   o “A handful of messages had a single bit corrupted”
   o “The message was still intelligible, but the system state
     information was incorrect”
   o “We used MD5 checksums throughout the system (but
     not) for this particular internal state information”
   o “(The corruption) spread throughout the system causing
     the symptoms described above”
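
The incident above turns on one piece of state that travelled without a checksum. As a minimal sketch of the kind of check the quote refers to, the Python below prefixes a payload with its MD5 digest and rejects the message if a single bit changes. The wire format (a 16-byte digest followed by the payload) and the names seal, open_sealed, and flip_bit are assumptions made for illustration, not details of Amazon's system.

```python
import hashlib

DIGEST_SIZE = 16  # an MD5 digest is 16 bytes long


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its MD5 digest (hypothetical wire format)."""
    return hashlib.md5(payload).digest() + payload


def open_sealed(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the digest does not match."""
    digest, payload = message[:DIGEST_SIZE], message[DIGEST_SIZE:]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("checksum mismatch: message corrupted")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


message = seal(b"state: server-42 is healthy")
assert open_sealed(message) == b"state: server-42 is healthy"

# Flip one bit inside the payload, after the message was sealed.
corrupted_message = flip_bit(message, DIGEST_SIZE * 8 + 3)
try:
    open_sealed(corrupted_message)
    detected = False
except ValueError:
    detected = True
print(detected)  # True
```

A check like this only covers corruption that happens after the message is sealed. If the state is already corrupted in the sender's memory when the digest is computed, the message verifies correctly and the error still propagates.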
Error propagation
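[Figure: process i handles an input message m_in, updates its state (u, v), and sends m_out; process j receives that message as its own m_in and updates its state (x, y)]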
Common practice
o Manual placement of ad-hoc error detection checks
  o Requires application knowledge
  o Time consuming
o Hard to structure without a fault model
o No error isolation guarantee
Research: Byzantine faults
o Byzantine model
   o Faulty nodes controlled by an adversary
   o Worst-case model
[Figure: process timeline; labels: Byzantine fault, Time]
Byzantine fault model
o Black-box model of faulty processes: adversarial
o Hardening for error isolation [Nysiad NSDI 2008]
  o Based on state machine replication
  o Replication and performance costs

[Figure: a client sends requests to a group of servers; label: Agreement on requests]
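
To make the replication cost concrete: classic Byzantine fault-tolerant state machine replication such as PBFT needs 3f + 1 replicas to tolerate f faulty ones, while crash-tolerant consensus such as Paxos needs 2f + 1. The sketch below only tabulates these standard bounds; the function names are hypothetical and the numbers are not taken from the slides.

```python
def crash_tolerant_replicas(f: int) -> int:
    """Replicas needed by a crash-tolerant protocol such as Paxos."""
    return 2 * f + 1


def byzantine_replicas(f: int) -> int:
    """Replicas needed by a classic BFT protocol such as PBFT."""
    return 3 * f + 1


for f in (1, 2, 3):
    extra = byzantine_replicas(f) - crash_tolerant_replicas(f)
    print(f"f={f}: crash={crash_tolerant_replicas(f)} "
          f"byzantine={byzantine_replicas(f)} extra={extra}")
```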
Byzantine faults
o Byzantine hardening covers attacks and bugs…
o … assuming, e.g., design diversity of replicas
  o Impractical in most systems → no real adoption

Fault type and the technique that addresses it:
  o Attacks → Security
  o Bugs → V&V
  o Data corruptions → ASC hardening
A new approach to error isolation
[Figure: error propagation between process i (state u, v) and process j (state x, y) through messages m_in and m_out]

1. General model of process behavior
2. Arbitrary State Corruption (ASC) fault model
3. Guarantee error isolation through hardening
with M. Correia, D. Ferro and F. Junqueira
2012 USENIX Annual Technical Conference
Process and fault models
    Defining Arbitrary State Corruptions
Process model
1) Event dispatching (input message m_in)
2) Event handling (reads and updates the state)

      Upon receive message <REQ, r> do
            if v > 5 then
                  u = r + v + 5;
            else
                  u = r + v;
            v = u;
            send <WRITE, v> to process p

3) Message sending (output message m_out)
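
The three steps of the process model can be sketched in Python as follows. This is a direct transcription of the slide's <REQ, r> handler; the State class, the function name handle_req, and the tuple used to represent the <WRITE, v> message are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class State:
    """The process state touched by the handler on the slide."""
    u: int = 0
    v: int = 0


def handle_req(state: State, r: int) -> Tuple[str, int]:
    """Handle <REQ, r>: update the state and return the <WRITE, v> message."""
    if state.v > 5:
        state.u = r + state.v + 5
    else:
        state.u = r + state.v
    state.v = state.u
    return ("WRITE", state.v)


state = State(u=0, v=5)
first = handle_req(state, r=3)   # v == 5 is not > 5, so u = 3 + 5
print(first)                     # ('WRITE', 8)
second = handle_req(state, r=1)  # v == 8 is > 5, so u = 1 + 8 + 5
print(second)                    # ('WRITE', 14)
```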
ASC fault model
o An Arbitrary State Corruption can make a process
   o Crash
   o Assign an arbitrary value to any variable
   o Start the execution from an arbitrary instruction

   Example (state before → after a fault):
        v      5   →   12
        z      10  →   7
        PC     20  →   320
Fault frequency
o One fault for every processed input message

[Figure: repeats the process model diagram and the <REQ, r> handler shown under "Process model"]
Fault diversity
o A corrupted variable is different from its replica

          Before the fault         After the fault
          original   replica       original   replica
   v          5         5             12         5
   z         10        10              7        41
   PC        20                      320

o Only holds immediately after the fault
   o Can be invalidated if instructions modify the variable
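
A minimal sketch of why diversity only holds immediately after the fault, using the values of v from the slide. The handler step shown here (emit v, then overwrite it in both copies) is a hypothetical example chosen to show the effect; it is not taken from the talk.

```python
def diverged(original: dict, replica: dict) -> list:
    """Return the names of variables whose two copies differ."""
    return sorted(name for name in original if original[name] != replica[name])


original = {"v": 5, "z": 10}
replica = dict(original)

# Fault: an arbitrary value lands in one copy of v.
original["v"] = 12
after_fault = diverged(original, replica)
print(after_fault)  # ['v']

# Hypothetical handler step: emit v, then overwrite it in both copies.
sent = original["v"]      # the corrupted value 12 leaves the process
for state in (original, replica):
    state["v"] = 0        # the overwrite erases the difference
after_overwrite = diverged(original, replica)
print(after_overwrite)    # []
```

A comparison made right after the fault catches it, but a comparison made only after the overwrite sees two identical copies even though a corrupted value has already been sent.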
Error propagation
o Fault diversity does not hold
o Hardening preserves diversity

[Figure: original and replica copies of variables u and v; one pair is labeled "Fault diversity", the other "?"]
ASC hardening
From ASC faults to crashes and message omissions
From ASC to crashes
o Transparent: to the hardened process
o Local: no process replication on multiple machines
o Untrusted: can have faults while executing hardening

[Figure: the event handler, its state (u, v), and messages m_in and m_out, all enclosed in the HARDENING RUNTIME]
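
The idea of turning a state corruption into a crash can be sketched as a local wrapper that runs each event handler on two copies of the state and stops the process when they disagree. This is a deliberately simplified illustration, not the PASC algorithm: the names harden and CorruptionDetected are hypothetical, and the sketch assumes the comparison itself executes correctly, whereas the slide requires the hardening code to be treated as untrusted too.

```python
import copy


class CorruptionDetected(Exception):
    """Raised to stop the process instead of letting an error propagate."""


def harden(handler):
    """Wrap a handler so that it runs on two state copies that must agree."""
    def run(original, replica, message):
        if original != replica:
            raise CorruptionDetected("state copies differ before handling")
        out_original = handler(original, message)
        out_replica = handler(replica, message)
        if original != replica or out_original != out_replica:
            raise CorruptionDetected("state copies differ after handling")
        return out_original
    return run


def handle_req(state, r):
    """The <REQ, r> handler from the process model slide."""
    if state["v"] > 5:
        state["u"] = r + state["v"] + 5
    else:
        state["u"] = r + state["v"]
    state["v"] = state["u"]
    return ("WRITE", state["v"])


hardened = harden(handle_req)
original = {"u": 0, "v": 5}
replica = copy.deepcopy(original)

first = hardened(original, replica, 3)
print(first)  # ('WRITE', 8)

original["v"] = 12  # injected fault in one copy of the state
try:
    hardened(original, replica, 1)
    crashed = False
except CorruptionDetected:
    crashed = True
print(crashed)  # True
```

Because both copies live in the same process, the check needs no extra machines, which matches the "Local" property above. The price is that the state is kept twice and each handler runs twice.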
PASC library
[Figure: user-defined event handlers EH1, EH2, EH3 run on top of the transparent PASC runtime, which keeps the process state and a replica state and performs the PASC checks]

github.com/yahoo/pasc
Evaluation
Hardening an echo server




o Little computation, network bound, no overhead
o PBFT is a reference (Nysiad not available)
Hardening State Machine Replication
[Figure: latency (ms, 0 to 6) versus throughput (Kops/s, 0 to 140) for PBFT, PASC Paxos, and unprotected Paxos; annotations: +70 % and -15 %]
Zookeeper (core)
Memory overhead
Scalability
[Figure: maximum throughput (kops/sec, 0 to 100) versus number of servers (1, 3, 5, 7) for PASC sKV and unprotected sKV]

o SimpleKV: eventually consistent store, no replication
   o Scales similarly with hardening
   o No server “wasted” for replication
PASC fault coverage
  o Injected random bit flips in Paxos
     o Code corruptions: bytecode and binary code
     o State corruptions: pointers and primitive values

                     Code corruptions       State corruptions
                     Unprot.     PASC       Unprot.     PASC
  Undetected             3          0           93          0
  Detected               -          1            -        330
  Crash               1640       1663         2301       2066
  Not manifested      1213       1193         2843       2841
  Total               2856       2856         5237       5237
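
The table can be read as a rate of undetected corruptions per injection campaign. The sketch below recomputes those rates from the counts above and also sums each column as a consistency check. The dictionary layout and function names are assumptions made for illustration; the counts are copied from the table.

```python
# Counts copied from the fault coverage table; None marks a "-" cell.
results = {
    ("code", "unprot"): {"undetected": 3, "detected": None, "crash": 1640,
                         "not_manifested": 1213, "total": 2856},
    ("code", "pasc"): {"undetected": 0, "detected": 1, "crash": 1663,
                       "not_manifested": 1193, "total": 2856},
    ("state", "unprot"): {"undetected": 93, "detected": None, "crash": 2301,
                          "not_manifested": 2843, "total": 5237},
    ("state", "pasc"): {"undetected": 0, "detected": 330, "crash": 2066,
                        "not_manifested": 2841, "total": 5237},
}


def undetected_rate(kind: str, system: str) -> float:
    """Fraction of injected faults that led to an undetected corruption."""
    row = results[(kind, system)]
    return row["undetected"] / row["total"]


def column_sum(kind: str, system: str) -> int:
    """Sum of all outcome counts in a column, excluding the stated total."""
    row = results[(kind, system)]
    return sum(count for name, count in row.items()
               if name != "total" and count is not None)


for kind, system in results:
    print(kind, system,
          f"undetected={undetected_rate(kind, system):.2%}",
          f"sum={column_sum(kind, system)}")
```

As transcribed, the PASC code-corruption column adds up to 2857, one more than its stated total of 2856; the rates above use the stated totals.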
Wrap up
o Hardware data corruptions are a real danger
o Proposed new systematic approach
   o BFT not realistic
   o Ad-hoc approaches are not systematic
o Hardening algorithm for error isolation
   o Local: does not require replication
   o Efficient: PASC-Paxos has up to 70% more throughput than PBFT
   o High fault coverage
Directions
o Systematic protection of Yahoo! infrastructure against data corruptions
o ASC just scratched the surface – some todos
  o Reduce memory footprint
  o Support for external memory (disks/SSDs)
  o Hardening of legacy code
  o Theoretical foundations
Thank you
serafini@yahoo-inc.com


Editor's notes

  1. User Impact: >10 million users unable to use a given service. Revenue Impact: >$100K. Brand Impact: outage requires a press release. Top Tier Revenue Property Impact (see list below)
  2. search.yahoo.com sponsored text ads are not displaying in the North placement. Sponsored ads are instead being moved to the East placement. There was a limit on the number of different data dictionary match types that QP can handle (720 types). The DD built and pushed the night of included an additional 400 types, slowly incrementing over the course of the months, and finally exceeding the l
  3. SPEND MORE HERE
  4. Meeting notes (6/8/12 16:43): TODO: more detailed figure of what the runtime looks like (event handler, replica state)
  5. Meeting notes (6/8/12 15:57): simple example; no overhead because there is little computation and the server is network bound
  6. Meeting notes (6/8/12 11:51): too many plots, remove the ones for batching. Meeting notes (6/8/12 15:57): more concrete example. Meeting notes (6/8/12 16:47): stress that PASC is not SMR. Paxos is built on top of PASC. Maybe have a bullet
  7. Meeting notes (6/8/12 11:51): use bars with one value (max tput) per setting
  8. Meeting notes (6/8/12 15:57): Does PASC really detect corruptions?