SlideShare une entreprise Scribd logo
1  sur  13
Automated Fault-Tolerance Testing
Ajay Vaddadi, Groupon
April 11th, 2016
About Me and My Team
● Lead Internal Tools Developer @ Groupon.
● Employed with Groupon for 4 years.
● Built various Internal tools ranging from Test Automation Frameworks to
Continuous Performance and Deployment Tools (Story for another time …)
● My Team -
○ Adithya Nagarajan (Team Manager and Co-Author on the Paper)
○ Amrutha Badami
○ Kavin Arasu
Fault Tolerance - What is it and why is it important ?
● Fault Tolerance is an ability of computer software to continue its normal
operation despite the presence of system or hardware faults.
● Fault in Software terms can vary from high latency from dependent or sub-
systems to complete failure of the System under test (SUT).
● Fault injection can easily be simulated and overall environment behaviour
can be observed to make changes in design to be better tolerant to the
failure.
● Many downtimes and economic losses would have been avoided if were
tested for fault tolerance -
○ ATT Complete Network Failure (1990) - SPOF switch failure.
○ Amazon Outage (2011) - Interestingly Netflix was not affected.
About Groupon …
Engineering Scale @ Groupon
● Microservices Based Architecture
● 500+ Production Services
● In-house, Self-Managed Data
Centres spread across NA,
Europe & Asia Pacific
● Complex Interdependencies
between Services
What are the Requirements ?
● Ability to Simulate Failures and Identify resiliency issues within the
Groupon Architecture
● Ability to understand the Failure propagation pattern when a failure occurs.
● Ability to understand service architecture to better identify and group faults
on a unique set of application servers/apps/machines.
● Ability to continuously monitor critical metrics during the faults on the SUT
and dependent Services and terminate when anomaly is detected.
● Ability to recover the System under Test to pre-fault state.
● Companies like Google, Amazon, Twitter and Netflix conduct
fault tolerance testing on their platform.
● NETFLIX was one of the early players of in this Fault testing
domain by developing widely popular CHAOS MONKEY /
SIMIAN ARMY
● Other tools such as Varien, Pantera are mocked versions of
SIMIAN ARMY developed in other ecosystems like python.
● Either very company specific and are not reusable OR highly
geared towards Cloud based Infra such as Amazon Web
Services etc.
What is already done ?
Available Tools : Are we reinventing the wheel ?
ScrewDriver - Our Proposed Solution ..
● Capsule Builder - Generates time based expirable, machine specific
capsule/container to be deployed on the machines specified by Topology service.
● Controller Core - Starts/Stops the fault/capsule with proper authentication. During
fault execution works with Anomaly detector using Publisher/Subscriber model.
● Anomaly Detector - Interacts with Metric Client and identifies Anomalies. If found -
aborts the current fault execution. Sometimes, Static Threshold checks are not
enough to identify anomalies and need to delve into 2nd and 3rd derivatives.
● Notification Module - Communicates the status about the fault execution and post-
execution report to the service owners.
Controller (and it’s Friends)
Controller is a combination of 4 major modules focussing on 3 Key aspects - Building and Deploying Capsule,
Injecting Fault(Core), Monitoring the Fault(Anomaly Detector).
How does it work?
Topology Translation Engine
Service to parse and understand the building blocks (components/layers) of any service. Picks the machines and
the corresponding faults to be executed and identifies the dependencies for the given service to monitor.
● Consumes topology YAML file for a given service and identifies underlying
components/layers, machines, monitors and dependencies for the service.
● Information is pushed to GraphDB(Neo4j) for persistent storage.
● GraphDB allows us to identify dependency chain between systems without complex
joins.
● Identifies list of machines for fault to be executed based on various selection criteria
like - Traffic %, Rack, Colo, Random etc.
How does it work?
Capsule : Self Contained Execution Engine.
Self contained module which acts as an agent and executes faults on the Machine under Test.
● Deployed on demand to the machines under test. Self contained entity on its own,
only dependent on Tower for basic instructions
● Bootstrap checks ensures that the capsule was not tampered with, only used on the
machines it was signed for and also expires after a specific TTL.
● All the actions performed by Capsule are time bound thus ensuring no system will
go into unresponsive state and will automatically trigger fault termination and system
restore to bring back the system to healthy state as soon as possible.
● Executes the faults based on the Ansible styled - Fault Playbook (user defined
steps).
How does it work?
Milestones and Roadmap
● Basic Prototype was completed in January 2016 and it showed promising results.
● Plan to run this tool extensively against Groupon services over next few months (Q2
2016).
● Plan to publish a follow up article with Detailed Implementation, Learnings and Issues
Faced by Q3 2016.
● Long Term goal to Open Source the tool and evangelize for adaptation across industry
and contribution from academia.
Questions?

Contenu connexe

Tendances

Introduction to Unified Functional Testing 12 (UFT)
Introduction to Unified Functional Testing 12 (UFT)Introduction to Unified Functional Testing 12 (UFT)
Introduction to Unified Functional Testing 12 (UFT)Archana Krushnan
 
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений"
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений" Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений"
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений" Fwdays
 
100 tests per second - 40 releases per week
100 tests per second - 40 releases per week100 tests per second - 40 releases per week
100 tests per second - 40 releases per weekLars Thorup
 
Test automation
Test automationTest automation
Test automationXavier Yin
 
Java Exceptions Best Practices
Java Exceptions Best PracticesJava Exceptions Best Practices
Java Exceptions Best PracticesArtur Mkrtchyan
 
Top 5 pitfalls of software test automatiion
Top 5 pitfalls of software test automatiionTop 5 pitfalls of software test automatiion
Top 5 pitfalls of software test automatiionekatechserv
 
Case study: QTP to Selenium migration
Case study: QTP to Selenium migrationCase study: QTP to Selenium migration
Case study: QTP to Selenium migrationTarun Lalwani
 
Qtp training session I
Qtp training session IQtp training session I
Qtp training session IAisha Mazhar
 
Automated Regression Test Pitch Presentation
Automated Regression Test Pitch PresentationAutomated Regression Test Pitch Presentation
Automated Regression Test Pitch PresentationMark Hesketh
 
Session 01 - Introduction to UFT and Features - Slides
Session 01 - Introduction to UFT and Features - SlidesSession 01 - Introduction to UFT and Features - Slides
Session 01 - Introduction to UFT and Features - Slidesrajaselv
 
How to Select the Right Automation Testing Tool
How to Select the Right Automation Testing ToolHow to Select the Right Automation Testing Tool
How to Select the Right Automation Testing ToolExist
 
Introduction to Automation Testing
Introduction to Automation TestingIntroduction to Automation Testing
Introduction to Automation TestingArchana Krushnan
 
Choosing the Best Open Source Test Automation Tool for You
Choosing the Best Open Source Test Automation Tool for YouChoosing the Best Open Source Test Automation Tool for You
Choosing the Best Open Source Test Automation Tool for YouPerfecto by Perforce
 
Selenium Camp 2016 - Kiev, Ukraine
Selenium Camp 2016 -  Kiev, UkraineSelenium Camp 2016 -  Kiev, Ukraine
Selenium Camp 2016 - Kiev, UkraineJustin Ison
 
Automation Testing
Automation TestingAutomation Testing
Automation TestingRajat Tiwari
 
Java Exceptions and Exception Handling
 Java  Exceptions and Exception Handling Java  Exceptions and Exception Handling
Java Exceptions and Exception HandlingMaqdamYasir
 
Merge hells - Feature Toggles to the rescue
Merge hells - Feature Toggles to the rescueMerge hells - Feature Toggles to the rescue
Merge hells - Feature Toggles to the rescueLeena N
 

Tendances (20)

Introduction to Unified Functional Testing 12 (UFT)
Introduction to Unified Functional Testing 12 (UFT)Introduction to Unified Functional Testing 12 (UFT)
Introduction to Unified Functional Testing 12 (UFT)
 
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений"
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений" Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений"
Александр Махомет "Feature Flags. Уменьшаем риски при выпуске изменений"
 
100 tests per second - 40 releases per week
100 tests per second - 40 releases per week100 tests per second - 40 releases per week
100 tests per second - 40 releases per week
 
Test automation
Test automationTest automation
Test automation
 
Java Exceptions Best Practices
Java Exceptions Best PracticesJava Exceptions Best Practices
Java Exceptions Best Practices
 
Top 5 pitfalls of software test automatiion
Top 5 pitfalls of software test automatiionTop 5 pitfalls of software test automatiion
Top 5 pitfalls of software test automatiion
 
Case study: QTP to Selenium migration
Case study: QTP to Selenium migrationCase study: QTP to Selenium migration
Case study: QTP to Selenium migration
 
Ug. marketplace testing
Ug. marketplace testingUg. marketplace testing
Ug. marketplace testing
 
Qtp training session I
Qtp training session IQtp training session I
Qtp training session I
 
Feature toggling
Feature togglingFeature toggling
Feature toggling
 
test
testtest
test
 
Automated Regression Test Pitch Presentation
Automated Regression Test Pitch PresentationAutomated Regression Test Pitch Presentation
Automated Regression Test Pitch Presentation
 
Session 01 - Introduction to UFT and Features - Slides
Session 01 - Introduction to UFT and Features - SlidesSession 01 - Introduction to UFT and Features - Slides
Session 01 - Introduction to UFT and Features - Slides
 
How to Select the Right Automation Testing Tool
How to Select the Right Automation Testing ToolHow to Select the Right Automation Testing Tool
How to Select the Right Automation Testing Tool
 
Introduction to Automation Testing
Introduction to Automation TestingIntroduction to Automation Testing
Introduction to Automation Testing
 
Choosing the Best Open Source Test Automation Tool for You
Choosing the Best Open Source Test Automation Tool for YouChoosing the Best Open Source Test Automation Tool for You
Choosing the Best Open Source Test Automation Tool for You
 
Selenium Camp 2016 - Kiev, Ukraine
Selenium Camp 2016 -  Kiev, UkraineSelenium Camp 2016 -  Kiev, Ukraine
Selenium Camp 2016 - Kiev, Ukraine
 
Automation Testing
Automation TestingAutomation Testing
Automation Testing
 
Java Exceptions and Exception Handling
 Java  Exceptions and Exception Handling Java  Exceptions and Exception Handling
Java Exceptions and Exception Handling
 
Merge hells - Feature Toggles to the rescue
Merge hells - Feature Toggles to the rescueMerge hells - Feature Toggles to the rescue
Merge hells - Feature Toggles to the rescue
 

En vedette

Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...
Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...
Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...Dianne Marsh
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformSudhir Tonse
 
I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...Gareth Bowles
 
Каталог кабельных креплений ПЭОТЭК
Каталог кабельных креплений ПЭОТЭККаталог кабельных креплений ПЭОТЭК
Каталог кабельных креплений ПЭОТЭКAleksandr Yakovlev
 
The Journey of Chaos Engineering Begins with a Single Step
The Journey of Chaos Engineering Begins with a Single StepThe Journey of Chaos Engineering Begins with a Single Step
The Journey of Chaos Engineering Begins with a Single StepBruce Wong
 
Imágenes representando conflicto
Imágenes representando conflicto Imágenes representando conflicto
Imágenes representando conflicto Elver Hernandez
 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixJosh Evans
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at NetflixGareth Bowles
 
TMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsTMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsIosif Itkin
 
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...Iosif Itkin
 
TMPA-2015: Automated process of creating test scenarios for financial protoco...
TMPA-2015: Automated process of creating test scenarios for financial protoco...TMPA-2015: Automated process of creating test scenarios for financial protoco...
TMPA-2015: Automated process of creating test scenarios for financial protoco...Iosif Itkin
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemIosif Itkin
 
TMPA-2015: Multi-Module Application Tracing in z/OS Environment
TMPA-2015: Multi-Module Application Tracing in z/OS EnvironmentTMPA-2015: Multi-Module Application Tracing in z/OS Environment
TMPA-2015: Multi-Module Application Tracing in z/OS EnvironmentIosif Itkin
 
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsTMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsIosif Itkin
 
TMPA-2015: Lexical analysis of dynamically formed string expressions
TMPA-2015: Lexical analysis of dynamically formed string expressionsTMPA-2015: Lexical analysis of dynamically formed string expressions
TMPA-2015: Lexical analysis of dynamically formed string expressionsIosif Itkin
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...Iosif Itkin
 
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...Iosif Itkin
 
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...Iosif Itkin
 

En vedette (20)

Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...
Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...
Open Business Conference: Continuous Delivery At Netflix -- Powered by Open S...
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud Platform
 
I Don't Test Often ...
I Don't Test Often ...I Don't Test Often ...
I Don't Test Often ...
 
D1.03 ppt market research-v5
D1.03 ppt market research-v5D1.03 ppt market research-v5
D1.03 ppt market research-v5
 
Каталог кабельных креплений ПЭОТЭК
Каталог кабельных креплений ПЭОТЭККаталог кабельных креплений ПЭОТЭК
Каталог кабельных креплений ПЭОТЭК
 
The Journey of Chaos Engineering Begins with a Single Step
The Journey of Chaos Engineering Begins with a Single StepThe Journey of Chaos Engineering Begins with a Single Step
The Journey of Chaos Engineering Begins with a Single Step
 
Tema 2 apps
Tema 2 apps Tema 2 apps
Tema 2 apps
 
Imágenes representando conflicto
Imágenes representando conflicto Imágenes representando conflicto
Imágenes representando conflicto
 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at Netflix
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at Netflix
 
TMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsTMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in Robotics
 
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
TMPA-2015: Expanding the Meta-Generation of Correctness Conditions by Means o...
 
TMPA-2015: Automated process of creating test scenarios for financial protoco...
TMPA-2015: Automated process of creating test scenarios for financial protoco...TMPA-2015: Automated process of creating test scenarios for financial protoco...
TMPA-2015: Automated process of creating test scenarios for financial protoco...
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
 
TMPA-2015: Multi-Module Application Tracing in z/OS Environment
TMPA-2015: Multi-Module Application Tracing in z/OS EnvironmentTMPA-2015: Multi-Module Application Tracing in z/OS Environment
TMPA-2015: Multi-Module Application Tracing in z/OS Environment
 
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsTMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
 
TMPA-2015: Lexical analysis of dynamically formed string expressions
TMPA-2015: Lexical analysis of dynamically formed string expressionsTMPA-2015: Lexical analysis of dynamically formed string expressions
TMPA-2015: Lexical analysis of dynamically formed string expressions
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
 
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
TMPA-2015: Information Support System for Autonomous Spacecraft Control Macro...
 
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...
TMPA-2015: Automated Testing of Multi-thread Data Structures Solutions Lineri...
 

Similaire à Automated Fault Tolerance Testing

Monitoring & alerting presentation sabin&mustafa
Monitoring & alerting presentation sabin&mustafaMonitoring & alerting presentation sabin&mustafa
Monitoring & alerting presentation sabin&mustafaLama K Banna
 
Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Aspire Systems
 
WSO2Con Asia 2014 - Effective Test Automation in an Agile Environment
WSO2Con Asia 2014 - Effective Test Automation in an Agile EnvironmentWSO2Con Asia 2014 - Effective Test Automation in an Agile Environment
WSO2Con Asia 2014 - Effective Test Automation in an Agile EnvironmentWSO2
 
The Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security TestingThe Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security TestingMatt Tesauro
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsPeter Mounce
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Martin Spier
 
Testing strategies and best practices using MUnit
Testing strategies and best practices using MUnitTesting strategies and best practices using MUnit
Testing strategies and best practices using MUnitJimmy Attia
 
Progressive Mobile Test Automation
Progressive Mobile Test AutomationProgressive Mobile Test Automation
Progressive Mobile Test AutomationRakhi Jain Rohatgi
 
Automated Exploratory Testing
Automated Exploratory TestingAutomated Exploratory Testing
Automated Exploratory TestingJustin Ison
 
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...Puppet
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...Journal For Research
 
Justin Ison
Justin IsonJustin Ison
Justin IsonCodeFest
 
Project Onion unit test environment
Project Onion unit test environmentProject Onion unit test environment
Project Onion unit test environmentAbhinav Jha
 
Software testing mtech project in jalandhar
Software testing mtech project in jalandharSoftware testing mtech project in jalandhar
Software testing mtech project in jalandhardeepikakaler1
 

Similaire à Automated Fault Tolerance Testing (20)

Monitoring & alerting presentation sabin&mustafa
Monitoring & alerting presentation sabin&mustafaMonitoring & alerting presentation sabin&mustafa
Monitoring & alerting presentation sabin&mustafa
 
unit-5 SPM.pptx
unit-5 SPM.pptxunit-5 SPM.pptx
unit-5 SPM.pptx
 
Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?
 
WSO2Con Asia 2014 - Effective Test Automation in an Agile Environment
WSO2Con Asia 2014 - Effective Test Automation in an Agile EnvironmentWSO2Con Asia 2014 - Effective Test Automation in an Agile Environment
WSO2Con Asia 2014 - Effective Test Automation in an Agile Environment
 
Wso2con test-automation
Wso2con test-automationWso2con test-automation
Wso2con test-automation
 
The Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security TestingThe Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security Testing
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOps
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
 
Testing strategies and best practices using MUnit
Testing strategies and best practices using MUnitTesting strategies and best practices using MUnit
Testing strategies and best practices using MUnit
 
Android testing part i
Android testing part iAndroid testing part i
Android testing part i
 
Progressive Mobile Test Automation
Progressive Mobile Test AutomationProgressive Mobile Test Automation
Progressive Mobile Test Automation
 
Automation for developers
Automation for developersAutomation for developers
Automation for developers
 
TestIstanbul 2015
TestIstanbul 2015TestIstanbul 2015
TestIstanbul 2015
 
Automated Exploratory Testing
Automated Exploratory TestingAutomated Exploratory Testing
Automated Exploratory Testing
 
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...
DEPLOYMENT OF CALABASH AUTOMATION FRAMEWORK TO ANALYZE THE PERFORMANCE OF AN ...
 
Justin Ison
Justin IsonJustin Ison
Justin Ison
 
Project Onion unit test environment
Project Onion unit test environmentProject Onion unit test environment
Project Onion unit test environment
 
Software testing mtech project in jalandhar
Software testing mtech project in jalandharSoftware testing mtech project in jalandhar
Software testing mtech project in jalandhar
 

Dernier

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 

Dernier (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 

Automated Fault Tolerance Testing

  • 1. Automated Fault-Tolerance Testing Ajay Vaddadi, Groupon April 11th, 2016
  • 2. About Me and My Team ● Lead Internal Tools Developer @ Groupon. ● Employed with Groupon for 4 years. ● Built various Internal tools ranging from Test Automation Frameworks to Continuous Performance and Deployment Tools (Story for another time …) ● My Team - ○ Adithya Nagarajan (Team Manager and Co-Author on the Paper) ○ Amrutha Badami ○ Kavin Arasu
  • 3. Fault Tolerance - What is it and why is it important ? ● Fault Tolerance is an ability of computer software to continue its normal operation despite the presence of system or hardware faults. ● Fault in Software terms can vary from high latency from dependent or sub- systems to complete failure of the System under test (SUT). ● Fault injection can easily be simulated and overall environment behaviour can be observed to make changes in design to be better tolerant to the failure. ● Many downtimes and economic losses would have been avoided if were tested for fault tolerance - ○ ATT Complete Network Failure (1990) - SPOF switch failure. ○ Amazon Outage (2011) - Interestingly Netflix was not affected.
  • 5. Engineering Scale @ Groupon ● Microservices Based Architecture ● 500+ Production Services ● In-house, Self-Managed Data Centres spread across NA, Europe & Asia Pacific ● Complex Interdependencies between Services
  • 6. What are the Requirements ? ● Ability to Simulate Failures and Identify resiliency issues within the Groupon Architecture ● Ability to understand the Failure propagation pattern when a failure occurs. ● Ability to understand service architecture to better identify and group faults on a unique set of application servers/apps/machines. ● Ability to continuously monitor critical metrics during the faults on the SUT and dependent Services and terminate when anomaly is detected. ● Ability to recover the System under Test to pre-fault state.
  • 7. ● Companies like Google, Amazon, Twitter and Netflix conduct fault tolerance testing on their platform. ● NETFLIX was one of the early players of in this Fault testing domain by developing widely popular CHAOS MONKEY / SIMIAN ARMY ● Other tools such as Varien, Pantera are mocked versions of SIMIAN ARMY developed in other ecosystems like python. ● Either very company specific and are not reusable OR highly geared towards Cloud based Infra such as Amazon Web Services etc. What is already done ? Available Tools : Are we reinventing the wheel ?
  • 8. ScrewDriver - Our Proposed Solution ..
  • 9. ● Capsule Builder - Generates time based expirable, machine specific capsule/container to be deployed on the machines specified by Topology service. ● Controller Core - Starts/Stops the fault/capsule with proper authentication. During fault execution works with Anomaly detector using Publisher/Subscriber model. ● Anomaly Detector - Interacts with Metric Client and identifies Anomalies. If found - aborts the current fault execution. Sometimes, Static Threshold checks are not enough to identify anomalies and need to delve into 2nd and 3rd derivatives. ● Notification Module - Communicates the status about the fault execution and post- execution report to the service owners. Controller (and it’s Friends) Controller is a combination of 4 major modules focussing on 3 Key aspects - Building and Deploying Capsule, Injecting Fault(Core), Monitoring the Fault(Anomaly Detector). How does it work?
  • 10. Topology Translation Engine Service to parse and understand the building blocks (components/layers) of any service. Picks the machines and the corresponding faults to be executed and identifies the dependencies for the given service to monitor. ● Consumes topology YAML file for a given service and identifies underlying components/layers, machines, monitors and dependencies for the service. ● Information is pushed to GraphDB(Neo4j) for persistent storage. ● GraphDB allows us to identify dependency chain between systems without complex joins. ● Identifies list of machines for fault to be executed based on various selection criteria like - Traffic %, Rack, Colo, Random etc. How does it work?
  • 11. Capsule : Self Contained Execution Engine. Self contained module which acts as an agent and executes faults on the Machine under Test. ● Deployed on demand to the machines under test. Self contained entity on its own, only dependent on Tower for basic instructions ● Bootstrap checks ensures that the capsule was not tampered with, only used on the machines it was signed for and also expires after a specific TTL. ● All the actions performed by Capsule are time bound thus ensuring no system will go into unresponsive state and will automatically trigger fault termination and system restore to bring back the system to healthy state as soon as possible. ● Executes the faults based on the Ansible styled - Fault Playbook (user defined steps). How does it work?
  • 12. Milestones and Roadmap ● Basic Prototype was completed in January 2016 and it showed promising results. ● Plan to run this tool extensively against Groupon services over next few months (Q2 2016). ● Plan to publish a follow up article with Detailed Implementation, Learnings and Issues Faced by Q3 2016. ● Long Term goal to Open Source the tool and evangelize for adaptation across industry and contribution from academia.