SELF-HEALING SERVERLESS APPLICATIONS
PDX Serverless Meetup June 2018
NATE TAGGART
The Promise:
AWS | LAMBDA FEATURES PAGE
AWS Lambda invokes your code only when needed and automatically scales to support the rate of incoming requests without requiring you to configure anything. There is no limit to the number of requests your code can handle.
SELF-HEALING SERVERLESS APPLICATIONS | PG2
The Reality:
AWS | LAMBDA FEATURES PAGE (suggested edits)
The same feature-page quote, marked up with suggested edits that insert qualifiers such as "sometimes", "certain", "every", "can", "but", "are properly", and "architecture".
What to expect when you’re not expecting.
Common Serverless Failures
FOR LAMBDA-BASED ARCHITECTURES
FAILURE TYPES
• Runtime Error:
  • Uncaught Exception
  • Timeout
  • Bad State
• Scaling:
  • Concurrency Limits
  • Spawn Limits
  • Bottlenecking
FAILURE TYPE: Uncaught Exception (Runtime Error)
DESCRIPTION
An event triggers your Lambda to run, but your code raises an unhandled exception.
DEFAULT BEHAVIOR
Synchronous invocations:
• Function fails
• Returns error to caller
• Logs timestamp, error message, and stack trace to CloudWatch
Asynchronous invocations:
• Retries twice, for up to three attempts in total (stream sources retry the batch until it succeeds or the records expire)
• Caller is unaware of error
• Logs timestamp, error message, and stack trace to CloudWatch
FAILURE TYPE: Timeout (Runtime Error)
DESCRIPTION
An event triggers your Lambda to run, but execution does not complete within the configured maximum execution time. (Lambda’s default configuration is a 3-second timeout.)
DEFAULT BEHAVIOR
Synchronous invocations:
• Lambda returns an error to the caller (if the client hasn’t already timed out)
• Logs timestamp and error message to CloudWatch
Asynchronous invocations:
• Retries twice, for up to three attempts in total (stream sources retry the batch until it succeeds or the records expire)
• Caller is unaware of error
• Logs timestamp and error message to CloudWatch
FAILURE TYPE: Bad State (Runtime Error)
DESCRIPTION
An event triggers your Lambda to run, but the message is malformed or state is improperly provided, causing unexpected behavior.
DEFAULT BEHAVIOR
When noisy:
• Behaves as an Uncaught Exception
• Visible in CloudWatch, but may be difficult to diagnose without visibility into the triggering event
When silent:
• Unexpected application behavior
• Events can be lost permanently
• Can tank performance and dramatically spike costs
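One way to make bad state noisy instead of silent is to validate each event before acting on it. The sketch below assumes a hypothetical event schema (`order_id` as a string, `quantity` as an integer) and a hypothetical `BadEventError`; when validation fails it logs the full event and raises, so the failure behaves like an uncaught exception and lands in CloudWatch with the data needed to diagnose it.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"order_id": str, "quantity": int}


class BadEventError(ValueError):
    """Raised when an incoming event does not match the expected shape."""


def validate_event(event, required=REQUIRED_FIELDS):
    """Return a list of problems found in the event (empty if it looks valid)."""
    if not isinstance(event, dict):
        return [f"event is {type(event).__name__}, expected dict"]
    problems = []
    for field, expected_type in required.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"{field} is {type(event[field]).__name__}, "
                f"expected {expected_type.__name__}"
            )
    return problems


def handler(event, context):
    problems = validate_event(event)
    if problems:
        # Log the whole event so the failure is diagnosable, then fail loudly.
        logger.error("Rejecting malformed event %s: %s",
                     json.dumps(event, default=str), problems)
        raise BadEventError("; ".join(problems))
    return {"status": "processed", "order_id": event["order_id"]}
```

A schema library could replace the hand-rolled type checks; the point is to fail early, with the event attached, rather than let malformed input flow downstream.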
FAILURE TYPE: Concurrency Limits (Scaling)
DESCRIPTION
Your application becomes throttled as more Lambda instances are required than AWS allows to run concurrently for your account. Your compute can’t scale high enough.
DEFAULT BEHAVIOR
Unbuffered invocations:
• Fails to invoke
• No retry
• Visible in CloudWatch metrics, but not in logs
Buffered invocations:
• Initially fails to invoke
• Eventually resumes reading from the stream as volume drops
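A rough way to check headroom before hitting the concurrency limit: estimate required concurrency as request rate times average duration (Little's law), then compare it with the unreserved concurrency your account reports. The sketch assumes the response shape of Lambda's GetAccountSettings API (`AccountLimit.UnreservedConcurrentExecutions`); the function names and the 80% safety margin are illustrative assumptions.

```python
import math


def estimate_concurrency(requests_per_second, avg_duration_seconds):
    """Little's law: concurrent executions ~= arrival rate x time each one runs."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def concurrency_headroom(account_settings, expected_peak, safety_margin=0.8):
    """Compare an expected peak against the account's unreserved concurrency.

    account_settings follows the shape of Lambda's GetAccountSettings response.
    Returns (fits, budget), where budget is the unreserved limit scaled by the
    (illustrative) safety margin.
    """
    unreserved = account_settings["AccountLimit"]["UnreservedConcurrentExecutions"]
    budget = math.floor(unreserved * safety_margin)
    return expected_peak <= budget, budget
```

With boto3, the settings would come from `boto3.client("lambda").get_account_settings()`. For example, 200 requests per second at 2.5 seconds each needs about 500 concurrent executions.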
FAILURE TYPE: Spawn Limits (Scaling)
DESCRIPTION
Your application becomes throttled as more new Lambda instances are required than AWS allows to spawn for your account. Your compute can’t scale fast enough.
DEFAULT BEHAVIOR
Unbuffered invocations:
• Fails to invoke
• No retry
• Visible in CloudWatch metrics, nothing in logs (but really non-obvious)
Buffered invocations:
• Initially fails to invoke
• Eventually resumes reading from the stream as volume drops
FAILURE TYPE: Bottlenecking (Scaling)
DESCRIPTION
Your application is throttled due to throughput pressure upstream or downstream of your Lambda. Your architecture can’t scale enough.
DEFAULT BEHAVIOR
Upstream bottlenecks:
• Fails to invoke
• No retry
• Visible in CloudWatch, as long as you know where to look
Downstream bottlenecks:
• Can throw errors, time out, and/or distribute failures to other functions
• Can cause cascading failures
• Can tank performance and dramatically spike costs
Introducing:
Self-Healing Serverless Applications
Self-Healing Design Principles
LEADING PRACTICES FOR RESILIENT SYSTEMS
STANDARDIZE
• Introduce universal instrumentation
• Collect event-centric diagnostics
• Give everyone visibility
FAIL GRACEFULLY
• Reroute and unblock
• Automate known solutions
• Notify a human
PLAN FOR FAILURE
• Identify service limits
• Use self-throttling
• Consider alternative resource types
Learn to fail.
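"Use self-throttling" can be as simple as a rate limiter in front of calls to a fragile dependency. This is a minimal token-bucket sketch (a standard rate-limiting technique, not something the talk prescribes); `rate` and `capacity` are hypothetical tuning parameters, and the injectable clock exists only to make it testable.

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter for self-throttling outbound calls.

    rate: tokens added per second; capacity: maximum burst size.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, tokens=1):
        """Take tokens if available; return False (caller should back off) otherwise."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

Each Lambda instance gets its own bucket, so the effective ceiling is roughly the per-instance rate multiplied by concurrency; a shared limit would need external state or reserved concurrency on the function.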
Scenario: Uncaught Exceptions
WHEN THINGS BREAK AND YOU DON’T KNOW WHY
PROBLEM
Lambda periodically fails. Error messages and stack traces are visible in CloudWatch logs, but the failing events are lost, making reproduction difficult.
KEY PRINCIPLES
• Introduce universal instrumentation
• Collect event-centric diagnostics
• Give everyone visibility
SOLUTION
• Use a function wrapper or decorator pattern
• Capture and log the events that fail
Decrease time to resolution by capturing event data.
Event Diagnostics Wrapper Example
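The wrapper code from this slide is not preserved. As a stand-in, this is a minimal Python decorator in the same spirit: it assumes only the standard Lambda handler signature `(event, context)`, logs the triggering event, request ID, error, and stack trace as one JSON line when the handler raises, then re-raises so Lambda's normal failure and retry behavior is unchanged. `capture_failed_events` is a hypothetical name.

```python
import functools
import json
import logging
import traceback

logger = logging.getLogger(__name__)


def capture_failed_events(handler):
    """Decorator: if the wrapped handler raises, log the triggering event
    alongside the error before re-raising, so the failure can be reproduced."""

    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception as exc:
            # context may be None in local tests, hence getattr with a default.
            request_id = getattr(context, "aws_request_id", None)
            logger.error(
                "Invocation failed: %s",
                json.dumps(
                    {
                        "request_id": request_id,
                        "error": repr(exc),
                        "event": event,
                        "stack": traceback.format_exc(),
                    },
                    default=str,
                ),
            )
            # Re-raise so Lambda still reports the failure (and retries async events).
            raise

    return wrapper


@capture_failed_events
def handler(event, context):
    return {"total": sum(event["values"])}
```

Be careful with events that carry secrets or personal data; you may want to redact fields before logging them.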
Scenario: Upstream bottleneck
WHEN YOUR LAMBDAS AREN’T GETTING INVOKED
PROBLEM
API Gateway hits throughput limits and fails to invoke Lambda for some requests.
KEY PRINCIPLES
• Identify service limits
• Use self-throttling
• Notify a human
SOLUTION
• Implement retries with exponential backoff logic for 429 responses
• Raise an alarm on the 4XXError metric
Don’t overlook client-side solutions to backend failures.
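A client-side sketch of retrying throttled calls: capped exponential backoff with full jitter, applied only to 429 responses. `send_request` is a hypothetical callable returning `(status, body)`, standing in for whatever HTTP client calls your API Gateway endpoint; the attempt count and delays are illustrative defaults, not recommendations from the talk.

```python
import random
import time


class TooManyRequests(Exception):
    """Raised when every attempt came back throttled (HTTP 429)."""


def call_with_backoff(send_request, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call send_request(); on 429, wait with capped exponential backoff plus
    full jitter and try again, up to max_attempts calls in total."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break
        # Delay doubles each attempt (base * 2^attempt), is capped at max_delay,
        # then randomized over [0, delay] ("full jitter").
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise TooManyRequests(f"still throttled after {max_attempts} attempts")
```

Jitter spreads retries from many clients over time, so they do not all hit the throttled API again at the same moment.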
Scenario: Timeouts
WHEN EXECUTION TAKES TOO LONG
PROBLEM
Lambda is periodically timing out.
KEY PRINCIPLES
• Introduce universal instrumentation
• Use self-throttling
• Consider alternative resource types
SOLUTION
• Use a function wrapper or decorator pattern
• Evaluate Fargate or alternative long-running resources
Enforce your own limits.
Timeout Wrapper Example
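The timeout wrapper code from this slide is not preserved either. This sketch shows one way to enforce your own limit in Python: it reads the Lambda context method `get_remaining_time_in_millis()`, runs the handler in a worker thread, and stops waiting `buffer_ms` before Lambda's hard timeout so the event can be logged and a clear `HandlerTimeout` (a hypothetical exception) raised. The 500 ms default buffer is an assumption.

```python
import concurrent.futures
import functools
import json
import logging

logger = logging.getLogger(__name__)


class HandlerTimeout(Exception):
    """Raised when the wrapped handler exceeds the self-imposed deadline."""


def enforce_deadline(buffer_ms=500):
    """Decorator factory: stop waiting on the handler buffer_ms before
    Lambda's own timeout so the event can be logged and the failure reported."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            remaining_ms = context.get_remaining_time_in_millis()
            budget_s = max(remaining_ms - buffer_ms, 0) / 1000
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = pool.submit(handler, event, context)
            try:
                return future.result(timeout=budget_s)
            except concurrent.futures.TimeoutError:
                logger.error("Deadline exceeded after %.3fs; event: %s",
                             budget_s, json.dumps(event, default=str))
                raise HandlerTimeout(f"handler exceeded {budget_s:.3f}s budget") from None
            finally:
                # Return immediately instead of joining a worker that may still be running.
                pool.shutdown(wait=False)

        return wrapper

    return decorator
```

Python threads cannot be forcibly stopped, so the abandoned work may keep running until the execution environment is frozen; the wrapper buys a clean, diagnosable failure rather than true cancellation.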
Scenario: Stream processing gets “stuck”
WHEN FAILURES ARE BLOCKING THE REST OF THE STREAM
PROBLEM
Lambda exceptions and/or timeouts are blocking processing of a Kinesis shard.
KEY PRINCIPLES
• Reroute and unblock
• Automate known solutions
• Consider alternative resource types
SOLUTION
• Introduce state-machine-type logic
• Move bad messages to an alternate stream
• Potentially architect with Fargate or SNS
Small failures are preferable to large ones.
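A sketch of moving bad messages to an alternate stream for a Kinesis-triggered function: each record is decoded and processed on its own, and any record that fails is handed to `send_to_dead_letter`, a hypothetical callable (for example, a put to a second Kinesis stream or an SNS topic), so the batch completes and the shard keeps moving. `process_record` is placeholder business logic.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process_record(payload):
    """Hypothetical business logic for one decoded record."""
    return payload["amount"] * 100


def make_handler(send_to_dead_letter):
    """Build a Kinesis batch handler that diverts failing records to
    send_to_dead_letter(raw_bytes, error_text) instead of failing the batch."""

    def handler(event, context):
        processed, rerouted = 0, 0
        for record in event["Records"]:
            raw = base64.b64decode(record["kinesis"]["data"])
            try:
                process_record(json.loads(raw))
                processed += 1
            except Exception as exc:  # malformed JSON, missing fields, bad values
                logger.warning("Rerouting record %s: %r",
                               record["kinesis"]["sequenceNumber"], exc)
                send_to_dead_letter(raw, repr(exc))
                rerouted += 1
        # Returning normally marks the whole batch as done, so the shard moves on.
        return {"processed": processed, "rerouted": rerouted}

    return handler
```

If `send_to_dead_letter` itself fails, the exception propagates and Lambda retries the batch, which is usually preferable to silently dropping the record.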
Scenario: Downstream bottleneck
WHEN LAMBDA IS OUT-SCALING YOUR DATABASE
PROBLEM
Your Lambdas have scaled up but are depleting your RDS database connection pools.
KEY PRINCIPLES
• Identify service limits
• Automate known solutions
• Give everyone visibility
SOLUTION
• Always close database connections
• Scale your database
• Map your dependencies
Scale dependencies, too.
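To illustrate "always close database connections": wrap the connection so it is closed whether the query succeeds or raises. `sqlite3` stands in for an RDS driver purely so the sketch runs; `connect()` is a hypothetical placeholder for your driver's connect call.

```python
import contextlib
import sqlite3


def connect():
    """Stand-in for your RDS driver's connect() (sqlite3 used so the sketch runs)."""
    return sqlite3.connect(":memory:")


def handler(event, context):
    # closing() guarantees conn.close() runs even if the query raises, so a
    # failed invocation does not leave a connection held open on the database.
    with contextlib.closing(connect()) as conn:
        row = conn.execute("SELECT ? + ?", (event["a"], event["b"])).fetchone()
        return {"sum": row[0]}
```

`contextlib.closing` is used because the `sqlite3` connection's own context manager commits or rolls back a transaction but does not close the connection.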
Want to try?
app.stackery.io/sign-up
• TIMEOUT HANDLING
• UNIVERSAL INSTRUMENTATION
• ERROR RE-ROUTING
• BUILD STANDARDIZATION
• EVENT DIAGNOSTICS
• SHARED HEALTH DASHBOARD
@stackeryio
