SlideShare a Scribd company logo
1 of 66
Download to read offline
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Arun Gupta, @arungupta
Principal Open Source Technologist,
Amazon Web Services
Using Chaos to Bring Resiliency
to Your Applications in
Kubernetes
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Failures are a given and
everything will eventually
fail over time.
https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Amazon 2006
GameDay: Creating
Resiliency Through
Destruction
Jesse Robbins
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Monkeys
https://github.com/Netflix/SimianArmy
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Resilience
Ability of a system to adapt
to changes, failures, and disturbances
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering is the discipline of
experimenting on a distributed system in
order to build confidence in the system’s
capability to withstand turbulent
conditions in production
Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/
https://principlesofchaos.org/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Bad things will happen to your system,
no matter how well designed it is
You cannot become ignorant to it
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Break your systems on purpose
Find out their weaknesses and
fix them before they break when least expected
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos doesn’t cause problems.
It reveals them.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
• Application level
• Host failure
• Resource attacks (CPU, memory, …)
• Network attacks (dependencies, latency, …)
• Region attacks!
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Where do you inject Chaos?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero
”Normal” behavior of your system
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Business metric
https://medium.com/netflix-
techblog/sps-the-pulse-of-
netflix-streaming-
ae4db0e05f8a
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
• a service gives 404 or 503?
• latency increases by 300ms?
• the port is not accessible?
• security group rules changed?
• the database stops?
• excessive number of requests come?
• iptables are wiped out?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pick hypothesis
Scope the experiment
Identify metrics
Notify the organization
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Start with very small
As close as possible to production
Minimize the blast radius.
Have an emergency STOP!
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Users
Canary deployment
99%
users
1%
users
Start with...
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Time to detect?
Time for notification? And escalation?
Time to public notification?
Time for graceful degradation to kick-in?
Time for self healing to happen?
Time to recovery—partial and full?
Time to all-clear and stable?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
DON’T blame that one person…
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
PostMortems—COE (Correction of Errors)
The 5 WHYs
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Phases of chaos engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fix
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Failure free operations require
experience with failure.
http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Reconciles desired and actual state for pods
Distributes pods across AZs
Automatic health-check based restarts
Rolling deployment of a service
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster with Amazon EKS
AWS managed
Customer account
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kubernetes cluster with Amazon EKS
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Region and Availability Zones
Control Plane is highly available
Master and Workers are configured in ASG
Master instance type auto-scaling
Etcd is HA and backed up every hour
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos in a Kubernetes cluster
mycluster.eks.amazonaws.com
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Kubectl
x
x
Health check?
Dead node?
x
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio
Chaos Toolkit
Kube Monkey
PowerfulSeal
Gremlin
Simian Army
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio
Intelligent routing
and load balancing
Resilience across
languages and
platforms
Fleet-wide policy
enforcement
In-depth
telemetry
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Timeouts
Bounded retries with timeout budget
Concurrent connections limit and request load
Active health checks (periodic)
Passive health checks (circuit breakers)
AZ-aware load balancing with automatic failover
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
• Timing failures
• Increased network latency
• Overloaded upstream service
• Crashes
• HTTP error codes
• TCP connection failures
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fault injection using Istio—timeout
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting
spec:
hosts:
- greeting
http:
- fault:
delay:
fixedDelay: 10s
percent: 100
route:
- destination:
host: greeting
subset: greeting-hello
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Fault injection using Istio—HTTP abort
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting
spec:
hosts:
- greeting
http:
- fault:
abort:
httpStatus: 500
percent: 100
route:
- destination:
host: greeting
subset: greeting-hello
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio traffic management
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: greeting-virtual-service
spec:
hosts:
- greeting
http:
- route:
- destination:
host: greeting
subset: greeting-hello
weight: 75
- destination:
host: greeting
subset: greeting-howdy
weight: 25
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
- name: greeting-howdy
labels:
greeting: howdy
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Istio circuit breaker
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: greeting-destination-rule
spec:
host: greeting
subsets:
- name: greeting-hello
labels:
greeting: hello
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://istio.io/docs/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit
Open API for Chaos Engineering
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
CLI-driven
Experiments declared in JSON/YAML files
Open specification
Extensible: Kubernetes, AWS, Spring, others
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit follows the principles of chaos
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
query a system to observe a behavior
• Check state of a pod with a specific label
• Multiple probes to define steady state
real-world events
• Terminate a deployment
• Multiple actions simulate events
Types of probe and method
• Process: Run a binary
• HTTP: Invoke a HTTP endpoint
• Python: Call a Python function to perform richer operations
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit metadata
{
"version": "1.0.0",
"title": "Terminating the greeting service should not impact users",
"description": "How does the greeting service unavailbility impacts our users? Do they see
an error or does the webapp gets slower?",
"tags": [
"kubernetes",
"aws"
],
"configuration": {
"web_app_url": {
"type": "env",
"key": "WEBAPP_URL"
}
},
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit steady state & hypothesis
"steady-state-hypothesis": {
"title": "Services are all available and healthy",
"probes": [
{
"type": "probe",
"name": "alive-and-healthy",
"tolerance": true,
"provider": {
"type": "python",
"module": "chaosk8s.pod.probes",
"func": "pods_in_phase",
"arguments": {
"label_selector": "app=webapp-pod",
"phase": "Running",
"ns": "default"
}
}
},
{
"type": "probe",
"name": "application-must-respond-normally",
"tolerance": 200,
"provider": {
"type": "http",
"url": "${web_app_url}",
"timeout": 3
}
}
]
},
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit experiment & verify
"method": [
{
"type": "action",
"name": "terminate-greeting-service",
"provider": {
"type": "python",
"module": "chaosk8s.pod.actions",
"func": "terminate_pods",
"arguments": {
"label_selector": "app=greeter-pod",
"ns": "default"
}
}
},
{
"type": "probe",
"name": "fetch-application-logs",
"provider": {
"type": "python",
"module": "chaosk8s.pod.probes",
"func": "read_pod_logs",
"arguments": {
"label_selector": "app=webapp-pod",
"last": "20s",
"ns": "default"
}
}
}
],
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Toolkit run
$ chaos run experiments/experiment.json
[2018-03-10 14:42:38 INFO] Validating the experiment's syntax
[2018-03-10 14:42:38 INFO] Experiment looks valid
[2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users
[2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:39 INFO] Steady state hypothesis is met!
[2018-03-10 14:42:39 INFO] Action: terminate-greeting-service
[2018-03-10 14:42:40 INFO] Probe: fetch-application-logs
[2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete
[2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the
given tolerance so failing this experiment
[2018-03-10 14:42:45 INFO] Let's rollback...
[2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on.
[2018-03-10 14:42:45 INFO] Experiment ended with status: failed
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://github.com/chaostoolkit/chaostoolkit/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Implementation of Netflix’s Chaos Monkey for Kubernetes
Randomly deletes pods in the cluster
Applications opt-in using annotations
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Run Kube-Monkey—create configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-monkey-config-map
namespace: kube-system
data:
config.toml: |
[kubemonkey]
run_hour = 8
start_hour = 10
end_hour = 16
blacklisted_namespaces = ["kube-system"]
whitelisted_namespaces = [""]
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Kube-Monkey application opt-in
apiVersion: apps/v1
kind: Deployment
. . .
template:
metadata:
labels:
app: greeting
kube-monkey/enabled: enabled
kube-monkey/identifier: monkey-victim-pods
kube-monkey/mtbf: 2
kube-monkey/kill-mode: random-max-percent
kube-monkey/kill-value: 40
spec:
containers:
- name: greeting
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
https://github.com/asobti/kube-monkey
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering working group @ CNCF
https://github.com/chaoseng/wg-chaoseng
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Chaos Engineering mind map
https://bit.ly/2uKOJMQ
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
You don’t chose the moment,
the moment chooses you.
You only choose how prepared
you are, when it does.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Thank you!

More Related Content

What's hot

An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos EngineeringGremlin
 
AWS Control Tower introduces Terraform account provisioning and customization
AWS Control Tower introduces Terraform account provisioning and customizationAWS Control Tower introduces Terraform account provisioning and customization
AWS Control Tower introduces Terraform account provisioning and customizationDhaval Soni
 
Introduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless ApplicationsIntroduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless ApplicationsAmazon Web Services
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SREAraf Karsh Hamid
 
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Amazon Web Services
 
(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto ScalingAmazon Web Services
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgNils Meder
 
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...Amazon Web Services
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Amazon Web Services
 
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS Summit
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS SummitPlan Advanced AWS Networking Architectures - SRV323 - Chicago AWS Summit
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS SummitAmazon Web Services
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Web Services
 
Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeMartin Schütte
 
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...Edureka!
 

What's hot (20)

Amazon EKS Deep Dive
Amazon EKS Deep DiveAmazon EKS Deep Dive
Amazon EKS Deep Dive
 
Fundamentals of AWS Security
Fundamentals of AWS SecurityFundamentals of AWS Security
Fundamentals of AWS Security
 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos Engineering
 
AWS Control Tower introduces Terraform account provisioning and customization
AWS Control Tower introduces Terraform account provisioning and customizationAWS Control Tower introduces Terraform account provisioning and customization
AWS Control Tower introduces Terraform account provisioning and customization
 
Introduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless ApplicationsIntroduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless Applications
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SRE
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
 
Cloud Migration: A How-To Guide
Cloud Migration: A How-To GuideCloud Migration: A How-To Guide
Cloud Migration: A How-To Guide
 
(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
 
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...
Monitor All Your Things: Amazon CloudWatch in Action with BBC (DEV302) - AWS ...
 
Deep Dive - CI/CD on AWS
Deep Dive - CI/CD on AWSDeep Dive - CI/CD on AWS
Deep Dive - CI/CD on AWS
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
 
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS Summit
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS SummitPlan Advanced AWS Networking Architectures - SRV323 - Chicago AWS Summit
Plan Advanced AWS Networking Architectures - SRV323 - Chicago AWS Summit
 
CI/CD on AWS
CI/CD on AWSCI/CD on AWS
CI/CD on AWS
 
AWS Cloud Security Fundamentals
AWS Cloud Security FundamentalsAWS Cloud Security Fundamentals
AWS Cloud Security Fundamentals
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as Code
 
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...
AWS IAM Tutorial | Identity And Access Management (IAM) | AWS Training Videos...
 

Similar to Chaos Engineering with Kubernetes

Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringAmazon Web Services
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Amazon Web Services
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudAmazon Web Services
 
Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsJohn Varghese
 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Yan Cui
 
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Amazon Web Services
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Adrian Hornsby
 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedAWS User Group Bengaluru
 
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Amazon Web Services
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSDThomas Delteil
 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18CodeOps Technologies LLP
 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Amazon Web Services
 
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Amazon Web Services
 
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksImprove Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksAmazon Web Services
 
Come Out From Behind Your Firewall
Come Out From Behind Your FirewallCome Out From Behind Your Firewall
Come Out From Behind Your FirewallAmazon Web Services
 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayAmazon Web Services
 
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018Amazon Web Services
 
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Amazon Web Services
 

Similar to Chaos Engineering with Kubernetes (20)

Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the Cloud
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos Engineering
 
Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applications
 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)
 
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Inve...
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practiced
 
Intro to SageMaker
Intro to SageMakerIntro to SageMaker
Intro to SageMaker
 
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
Closing Loops and Opening Minds: How to Take Control of Systems, Big and Smal...
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18
 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
 
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
Chaos Engineering and Scalability at Audible.com (ARC308) - AWS re:Invent 2018
 
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech TalksImprove Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
Improve Efficiency by Migrating Messaging to Amazon MQ - AWS Online Tech Talks
 
Come Out From Behind Your Firewall
Come Out From Behind Your FirewallCome Out From Behind Your Firewall
Come Out From Behind Your Firewall
 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat Way
 
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
How to Avoid Common Mistakes at Scale: AWS Developer Workshop at Web Summit 2018
 
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
Globalizing Player Accounts at Riot Games While Maintaining Availability (ARC...
 

More from Arun Gupta

5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdfArun Gupta
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Arun Gupta
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerArun Gupta
 
Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Arun Gupta
 
Why Amazon Cares about Open Source
Why Amazon Cares about Open SourceWhy Amazon Cares about Open Source
Why Amazon Cares about Open SourceArun Gupta
 
Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using KubernetesArun Gupta
 
Building Cloud Native Applications
Building Cloud Native ApplicationsBuilding Cloud Native Applications
Building Cloud Native ApplicationsArun Gupta
 
How to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMHow to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMArun Gupta
 
Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Arun Gupta
 
The Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteThe Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteArun Gupta
 
Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Arun Gupta
 
Mastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitMastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitArun Gupta
 
Top 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeTop 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeArun Gupta
 
Container Landscape in 2017
Container Landscape in 2017Container Landscape in 2017
Container Landscape in 2017Arun Gupta
 
Java EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftJava EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftArun Gupta
 
Docker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersDocker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersArun Gupta
 
Thanks Managers!
Thanks Managers!Thanks Managers!
Thanks Managers!Arun Gupta
 
Migrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersMigrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersArun Gupta
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessArun Gupta
 

More from Arun Gupta (20)

5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf5 Skills To Force Multiply Technical Talents.pdf
5 Skills To Force Multiply Technical Talents.pdf
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using Firecracker
 
Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019Building Java in the Open - j.Day at OSCON 2019
Building Java in the Open - j.Day at OSCON 2019
 
Why Amazon Cares about Open Source
Why Amazon Cares about Open SourceWhy Amazon Cares about Open Source
Why Amazon Cares about Open Source
 
Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using Kubernetes
 
Building Cloud Native Applications
Building Cloud Native ApplicationsBuilding Cloud Native Applications
Building Cloud Native Applications
 
How to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAMHow to be a mentor to bring more girls to STEAM
How to be a mentor to bring more girls to STEAM
 
Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018
 
The Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 KeynoteThe Serverless Tidal Wave - SwampUP 2018 Keynote
The Serverless Tidal Wave - SwampUP 2018 Keynote
 
Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018Introduction to Amazon EKS - KubeCon 2018
Introduction to Amazon EKS - KubeCon 2018
 
Mastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv SummitMastering Kubernetes on AWS - Tel Aviv Summit
Mastering Kubernetes on AWS - Tel Aviv Summit
 
Top 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's LandscapeTop 10 Technology Trends Changing Developer's Landscape
Top 10 Technology Trends Changing Developer's Landscape
 
Container Landscape in 2017
Container Landscape in 2017Container Landscape in 2017
Container Landscape in 2017
 
Java EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShiftJava EE and NoSQL using JBoss EAP 7 and OpenShift
Java EE and NoSQL using JBoss EAP 7 and OpenShift
 
Docker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developersDocker, Kubernetes, and Mesos recipes for Java developers
Docker, Kubernetes, and Mesos recipes for Java developers
 
Thanks Managers!
Thanks Managers!Thanks Managers!
Thanks Managers!
 
Migrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to ContainersMigrate your traditional VM-based Clusters to Containers
Migrate your traditional VM-based Clusters to Containers
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern Success
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Chaos Engineering with Kubernetes

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Arun Gupta, @arungupta Principal Open Source Technologist, Amazon Web Services Using Chaos to Bring Resiliency to Your Applications in Kubernetes
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Failures are a given and everything will eventually fail over time. https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://www.youtube.com/watch?v=zoz0ZjfrQ9s Amazon 2006 GameDay: Creating Resiliency Through Destruction Jesse Robbins
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Monkeys https://github.com/Netflix/SimianArmy
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Resilience Ability of a system to adapt to changes, failures, and disturbances
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/ https://principlesofchaos.org/
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Bad things will happen to your system, no matter how well designed it is You cannot become ignorant to it
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Break your systems on purpose Find out their weaknesses and fix them before they break when least expected
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos doesn’t cause problems. It reveals them.
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark • Application level • Host failure • Resource attacks (CPU, memory, …) • Network attacks (dependencies, latency, …) • Region attacks!
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Where do you inject Chaos?
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero ”Normal” behavior of your system
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Business metric https://medium.com/netflix- techblog/sps-the-pulse-of- netflix-streaming- ae4db0e05f8a
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark • a service gives 404 or 503? • latency increases by 300ms? • the port is not accessible? • security group rules changed? • the database stops? • excessive number of requests come? • iptables are wiped out?
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Pick hypothesis Scope the experiment Identify metrics Notify the organization
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Start with very small As close as possible to production Minimize the blast radius. Have an emergency STOP!
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Users Canary deployment 99% users 1% users Start with...
  • 26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Time to detect? Time for notification? And escalation? Time to public notification? Time for graceful degradation to kick-in? Time for self healing to happen? Time to recovery—partial and full? Time to all-clear and stable?
  • 29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark DON’T blame that one person…
  • 30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark PostMortems—COE (Correction of Errors) The 5 WHYs
  • 31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Phases of chaos engineering
  • 33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fix
  • 34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Failure free operations require experience with failure. http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
  • 35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster
  • 36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Reconciles desired and actual state for pods Distributes pods across AZs Automatic health-check based restarts Rolling deployment of a service
  • 37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster with Amazon EKS AWS managed Customer account
  • 38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kubernetes cluster with Amazon EKS mycluster.eks.amazonaws.com Availability Zone 1 Availability Zone 2 Availability Zone 3 Kubectl
  • 39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Region and Availability Zones Control Plane is highly available Master and Workers are configured in ASG Master instance type auto-scaling Etcd is HA and backed up every hour
  • 40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos in a Kubernetes cluster mycluster.eks.amazonaws.com Availability Zone 1 Availability Zone 2 Availability Zone 3 Kubectl x x Health check? Dead node? x
  • 41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio Chaos Toolkit Kube Monkey PowerfulSeal Gremlin Simian Army
  • 42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio Intelligent routing and load balancing Resilience across languages and platforms Fleet-wide policy enforcement In-depth telemetry
  • 43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Timeouts Bounded retries with timeout budget Concurrent connections limit and request load Active health checks (periodic) Passive health checks (circuit breakers) AZ-aware load balancing with automatic failover
  • 44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark • Timing failures • Increased network latency • Overloaded upstream service • Crashes • HTTP error codes • TCP connection failures
  • 45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fault injection using Istio—timeout apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting spec: hosts: - greeting http: - fault: delay: fixedDelay: 10s percent: 100 route: - destination: host: greeting subset: greeting-hello --- apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello
  • 46. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Fault injection using Istio—HTTP abort apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting spec: hosts: - greeting http: - fault: abort: httpStatus: 500 percent: 100 route: - destination: host: greeting subset: greeting-hello
  • 47. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio traffic management apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: greeting-virtual-service spec: hosts: - greeting http: - route: - destination: host: greeting subset: greeting-hello weight: 75 - destination: host: greeting subset: greeting-howdy weight: 25 --- apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello - name: greeting-howdy labels: greeting: howdy
  • 48. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Istio circuit breaker apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: greeting-destination-rule spec: host: greeting subsets: - name: greeting-hello labels: greeting: hello trafficPolicy: connectionPool: tcp: maxConnections: 100
  • 49. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://istio.io/docs/
  • 50. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit Open API for Chaos Engineering
  • 51. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark CLI-driven Experiments declared in JSON/YAML files Open specification Extensible: Kubernetes, AWS, Spring, others
  • 52. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit follows the principles of chaos
  • 53. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark query a system to observe a behavior • Check state of a pod with a specific label • Multiple probes to define steady state real-world events • Terminate a deployment • Multiple actions simulate events Types of probe and method • Process: Run a binary • HTTP: Invoke a HTTP endpoint • Python: Call a Python function to perform richer operations
  • 54. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit metadata { "version": "1.0.0", "title": "Terminating the greeting service should not impact users", "description": "How does the greeting service unavailbility impacts our users? Do they see an error or does the webapp gets slower?", "tags": [ "kubernetes", "aws" ], "configuration": { "web_app_url": { "type": "env", "key": "WEBAPP_URL" } },
  • 55. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit steady state & hypothesis "steady-state-hypothesis": { "title": "Services are all available and healthy", "probes": [ { "type": "probe", "name": "alive-and-healthy", "tolerance": true, "provider": { "type": "python", "module": "chaosk8s.pod.probes", "func": "pods_in_phase", "arguments": { "label_selector": "app=webapp-pod", "phase": "Running", "ns": "default" } } }, { "type": "probe", "name": "application-must-respond-normally", "tolerance": 200, "provider": { "type": "http", "url": "${web_app_url}", "timeout": 3 } } ] },
  • 56. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit experiment & verify "method": [ { "type": "action", "name": "terminate-greeting-service", "provider": { "type": "python", "module": "chaosk8s.pod.actions", "func": "terminate_pods", "arguments": { "label_selector": "app=greeter-pod", "ns": "default" } } }, { "type": "probe", "name": "fetch-application-logs", "provider": { "type": "python", "module": "chaosk8s.pod.probes", "func": "read_pod_logs", "arguments": { "label_selector": "app=webapp-pod", "last": "20s", "ns": "default" } } } ],
  • 57. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Toolkit run $ chaos run experiments/experiment.json [2018-03-10 14:42:38 INFO] Validating the experiment's syntax [2018-03-10 14:42:38 INFO] Experiment looks valid [2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users [2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy [2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy [2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally [2018-03-10 14:42:39 INFO] Steady state hypothesis is met! [2018-03-10 14:42:39 INFO] Action: terminate-greeting-service [2018-03-10 14:42:40 INFO] Probe: fetch-application-logs [2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy [2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy [2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally [2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete [2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the given tolerance so failing this experiment [2018-03-10 14:42:45 INFO] Let's rollback... [2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on. [2018-03-10 14:42:45 INFO] Experiment ended with status: failed
  • 58. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://github.com/chaostoolkit/chaostoolkit/
  • 59. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Implementation of Netflix’s Chaos Monkey for Kubernetes Randomly deletes pods in the cluster Applications opt-in using annotations
  • 60. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Run Kube-Monkey—create configuration apiVersion: v1 kind: ConfigMap metadata: name: kube-monkey-config-map namespace: kube-system data: config.toml: | [kubemonkey] run_hour = 8 start_hour = 10 end_hour = 16 blacklisted_namespaces = ["kube-system"] whitelisted_namespaces = [""]
  • 61. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Kube-Monkey application opt-in apiVersion: apps/v1 kind: Deployment . . . template: metadata: labels: app: greeting kube-monkey/enabled: enabled kube-monkey/identifier: monkey-victim-pods kube-monkey/mtbf: 2 kube-monkey/kill-mode: random-max-percent kube-monkey/kill-value: 40 spec: containers: - name: greeting
  • 62. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark https://github.com/asobti/kube-monkey
  • 63. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering working group @ CNCF https://github.com/chaoseng/wg-chaoseng
  • 64. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Chaos Engineering mind map https://bit.ly/2uKOJMQ
  • 65. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark You don’t chose the moment, the moment chooses you. You only choose how prepared you are, when it does.
  • 66. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Thank you!