Building resilient serverless systems with non-serverless components - Cardiff 2020

Serverless functions can scale almost infinitely. While that is great for compute, it can be a MAJOR PROBLEM for downstream resources. In this talk, we'll discuss strategies and patterns for building highly resilient serverless apps that reduce pressure on "non-serverless" systems.

  1. Building Resilient Serverless Systems with Non-Serverless Components. Jeremy Daly, CTO, AlertMe.news, @jeremy_daly
  2. Jeremy Daly • CTO at AlertMe.news • Consult with companies building in the cloud • 20+ year veteran of technology startups • Started working with AWS in 2009 and started using Lambda in 2015 • Blogger (jeremydaly.com), OSS contributor, speaker • Publish the Off-by-none serverless newsletter • Host of the Serverless Chats podcast
  3. Agenda • What is resiliency and what is serverless? • Working with “less-than-scalable” RDBMS • Using unreliable APIs • Managing API quotas • Decoupling our services • Other non-serverless components
  4. What is resiliency? “The ability of a software solution to absorb the impact of a problem in one or more parts of a system, while continuing to provide an acceptable service level to the business.” ~ IBM. IT’S NOT ABOUT PREVENTING FAILURE, IT’S UNDERSTANDING HOW TO GRACEFULLY DEAL WITH IT
  5. What does it mean to be Serverless? • No server management • Flexible scaling 👈 • Pay for value • Automated high availability
  6. What does it mean to be Serverless? Serverless: Lambda, Cognito, Kinesis, S3, DynamoDB, SQS, SNS, API Gateway, CloudWatch, AppSync, IoT, Comprehend. Managed: ElastiCache, RDS, EMR, Amazon ES, Redshift, Fargate, DocumentDB (MongoDB), Managed Streaming for Kafka. Definitely not serverless: anything “on EC2”.
  7. Everything has limits! • Reserved Concurrency 🚦 • Function Timeouts ⏳ • Memory Limits 🧠 • Network Throughput 🚰. Some components are better than others. Know your limits.
  8. Simple Serverless Web Service: Client → API Gateway (highly scalable) → Lambda (highly scalable) → DynamoDB (highly scalable)
  9. “I want my, I want my, I want my SQL” ~ Dire Straits
  10. (Not So) Simple Serverless Web Service: Client → API Gateway (highly scalable) → Lambda (highly scalable) → RDS (not that scalable 😳). RDBMS and FaaS don’t play nicely together: • The concurrency model doesn’t allow connection pooling • Only a limited number of DB connections is available • Recycled containers create zombies (idle connections left open by containers that are frozen or recycled)
  11. Ways to Manage DB Connections • Increase max_connections setting • Limit concurrent executions • Lower your connection timeouts • Limit connections per username • Close connection before function ends 🤞 😡 ⚠ 🎲 😱 👎
  12. Better Ways to Manage DB Connections • Implement a good caching strategy 💾 • Buffer events for throttling and durability 🏋 • Utilize a proxy service 🛰 • Manage connections ourselves 🤔 👎
  13. Implement a good caching strategy: Client → API Gateway → Lambda → ElastiCache, falling through to RDS only on a cache miss. Key Points: • Create new RDS connections ONLY on misses • Make sure TTLs are set appropriately • Include the ability to invalidate the cache. YOU STILL NEED TO SIZE YOUR DATABASE CLUSTERS APPROPRIATELY
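The cache-aside flow on slide 13 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: `TTLCache` and `get_with_cache` are hypothetical names, an in-memory dict stands in for ElastiCache, and `load_from_db` is an assumed callback that would open an RDS connection and run the query.

```python
import time
from typing import Any, Callable, Dict, Tuple

_MISS = object()  # sentinel so cached falsy values (None, 0, "") still count as hits


class TTLCache:
    """In-memory stand-in for a shared cache such as ElastiCache.

    Assumption: a real deployment would use a Redis or Memcached client;
    a dict only lives as long as one warm Lambda container.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return _MISS
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return _MISS
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)  # e.g. called after a write to the database


def get_with_cache(cache: TTLCache, key: str, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: the database (and a new connection) is touched only on a miss."""
    value = cache.get(key)
    if value is _MISS:
        value = load_from_db(key)  # hypothetical loader that would query RDS
        cache.set(key, value)
    return value
```

Injecting `clock` keeps TTL behaviour testable without sleeping; the explicit `invalidate` covers the third key point.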
  14. Do you really need immediate feedback? Synchronous communication: Services can be invoked by other services and must wait for a reply. This is considered a blocking request, because the invoking service cannot finish executing until a response is received. Asynchronous communication 🚀: This is a non-blocking request. A service can invoke (or trigger) another service directly, or it can use another type of communication channel to queue information. The service typically only needs to wait for confirmation (ack) that the request was received.
  15. Buffer events for throttling and durability: Client → API Gateway (synchronous request) → SQS Queue (“asynchronous” request, ack) → Lambda (throttled) → RDS, with failed events sent to an SQS dead-letter queue (DLQ). Rather than placing a Lambda between API Gateway and SQS, utilize service integrations. Limit the concurrency to match RDS throughput. Key Points: • SQS adds durability • Throttled Lambdas reduce downstream pressure • Failed events are stored for further inspection/replay
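A rough, single-process simulation of the buffering pattern on slide 15 may help show why throttling plus a DLQ protects the database. It assumes a thread pool's `max_workers` can stand in for the consumer Lambda's reserved concurrency and that `max_receive_count` behaves like an SQS redrive policy; `drain_queue` is a hypothetical helper. In AWS, SQS, Lambda and the DLQ do all of this for you.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Tuple


def drain_queue(
    messages: List[Any],
    handler: Callable[[Any], None],
    max_concurrency: int = 2,
    max_receive_count: int = 3,
) -> Tuple[List[Any], List[Any]]:
    """Drain a buffered queue with a hard concurrency cap.

    Returns (succeeded, dead_lettered). max_concurrency stands in for the
    consumer's reserved concurrency; max_receive_count mimics a redrive policy
    that moves a message to the DLQ after that many failed receives.
    """
    pending = deque((msg, 0) for msg in messages)  # (message, failed receives so far)
    succeeded: List[Any] = []
    dead_lettered: List[Any] = []

    def attempt(item: Tuple[Any, int]) -> bool:
        try:
            handler(item[0])
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        while pending:
            # Never pull more messages than there are "concurrent executions"
            batch = [pending.popleft() for _ in range(min(max_concurrency, len(pending)))]
            for (msg, failures), ok in zip(batch, pool.map(attempt, batch)):
                if ok:
                    succeeded.append(msg)
                elif failures + 1 >= max_receive_count:
                    dead_lettered.append(msg)  # kept for inspection/replay
                else:
                    pending.append((msg, failures + 1))  # becomes visible again later
    return succeeded, dead_lettered
```

However fast producers enqueue, the downstream `handler` never sees more than `max_concurrency` calls at once, which is the property you would tune to match RDS throughput.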
  16. Utilize a Proxy Service • PgBouncer 🏀 • SQL Relay 🏃 • Amazon RDS Proxy (Preview). Client → API Gateway → Lambda → proxy on EC2 or Fargate 🙀 → RDS. In a “serverless” application? FOR SHAME! 😿
  17. Manage connections ourselves: 1. Count open connections 2. Close the connection if the connection ratio threshold is exceeded 3. Close sleeping connections with high time values 4. Retry connections with exponential backoff
  18. Serverless MySQL: https://github.com/jeremydaly/serverless-mysql
  19. Count open connections: query the processlist to get the total number of active connections
  20. Close connection if over ratio threshold: If we exceed the connection ratio, calculate our timeout and try to kill zombies; if there are no zombies to kill, terminate our own connection. Otherwise (under the ratio), just try to kill zombies.
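The decision on slides 19 and 20 might look like the following sketch. It is not serverless-mysql's code: `end_of_invocation_cleanup`, the 0.8 utilization threshold and the 3-second and 900-second timeouts are assumptions, "calculate our timeout" is simplified to a choice between two fixed values, and `open_connections` is assumed to come from a count over the processlist.

```python
from typing import Callable


def end_of_invocation_cleanup(
    open_connections: int,
    max_connections: int,
    kill_zombies: Callable[[int], int],
    close_connection: Callable[[], None],
    utilization_threshold: float = 0.8,   # assumed ratio, tune per database
    aggressive_timeout_s: int = 3,        # assumed idle age when under pressure
    relaxed_timeout_s: int = 900,         # assumed idle age otherwise
) -> str:
    """Decide what to do with this container's connection before the handler returns.

    kill_zombies(timeout_s) is a hypothetical callback that kills sleeping
    connections idle for at least timeout_s seconds and returns how many it killed.
    """
    if open_connections / max_connections > utilization_threshold:
        # Over the ratio: use a short timeout so more idle connections qualify
        if kill_zombies(aggressive_timeout_s) == 0:
            close_connection()  # nothing to reclaim, so give up our own slot
            return "closed"
        return "killed_zombies"
    # Under the ratio: keep our connection for reuse, reap only long-idle ones
    kill_zombies(relaxed_timeout_s)
    return "kept"
```

Keeping the connection open when there is headroom lets a warm container reuse it on the next invocation; closing it only under pressure trades some latency for fewer "too many connections" errors.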
  21. Close sleeping connections with high time values: query the processlist for zombies, then kill them
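Selecting and killing zombies could be sketched as below, assuming MySQL and a driver that uses `%s` placeholders (such as PyMySQL). `ProcessRow`, `ZOMBIE_QUERY`, `find_zombies` and `kill_statements` are hypothetical names, and the filtering is repeated in Python so the example runs without a database. Without the CONNECTION_ADMIN (or SUPER) privilege, MySQL lets a user kill only its own sessions, which is why the filter is scoped to one user.

```python
from typing import Iterable, List, NamedTuple


class ProcessRow(NamedTuple):
    id: int
    user: str
    command: str  # MySQL reports idle connections as "Sleep"
    time: int     # seconds spent in the current state


# A query a %s-placeholder MySQL driver could run to fetch candidate rows
ZOMBIE_QUERY = (
    "SELECT ID, USER, COMMAND, TIME FROM information_schema.PROCESSLIST "
    "WHERE COMMAND = 'Sleep' AND TIME >= %s AND USER = %s ORDER BY TIME DESC"
)


def find_zombies(rows: Iterable[ProcessRow], user: str, min_idle_s: int) -> List[int]:
    """IDs of this user's sleeping connections idle for >= min_idle_s, oldest first."""
    zombies = [r for r in rows if r.user == user and r.command == "Sleep" and r.time >= min_idle_s]
    zombies.sort(key=lambda r: r.time, reverse=True)
    return [r.id for r in zombies]


def kill_statements(ids: Iterable[int]) -> List[str]:
    # KILL <processlist id> terminates that session; int() guards against injection
    return [f"KILL {int(i)}" for i in ids]
```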
  22. Retry connections with exponential backoff: if there is an error trying to connect, retry with jitter
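Capped exponential backoff with "full jitter" (a random delay between zero and the exponential cap) is one common way to implement slide 22. The sketch assumes the driver raises `ConnectionError` on a refused connection. Real MySQL drivers raise their own exception types (for example for error 1040, "Too many connections"), so `retry_on` would need adjusting. `connect_with_backoff` and the delay values are illustrative.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    retries: int = 5,
    base_delay_s: float = 0.05,
    max_delay_s: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call connect(), retrying failures with capped exponential backoff and full jitter."""
    for attempt in range(retries + 1):
        try:
            return connect()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt))
            sleep(rng() * min(max_delay_s, base_delay_s * 2 ** attempt))
    raise AssertionError("unreachable")
```

The jitter matters here because many Lambda containers tend to fail at the same moment; randomizing the wait spreads their retries out instead of having them hit the database in lockstep.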
  23. Does this really work? • Aurora Serverless (2 ACUs) • 90 connections available • 1,024 MB of memory • 500 users/sec for one minute • Avg. response time was 41 ms • ZERO ERRORS
  24. We shouldn’t have to do this! • Amazon Aurora Serverless: doesn’t solve the max_connections issue • Aurora Serverless Data API: slower throughput, not quite ready for synchronous workloads • Amazon RDS Proxy (*PREVIEW*) 🥰: added cost, still doesn’t address scalability issues
  25. Third-Party APIs
  26. Manage calls to third-party APIs • Implement a good caching strategy 💾 • Buffer events for throttling and durability 🏋 • Implement circuit breakers 🚦
  27. The Circuit Breaker: Client → API Gateway → Lambda → Stripe API, with the circuit status kept in DynamoDB or ElastiCache. Each call starts with a status check; the circuit is CLOSED, OPEN or HALF OPEN, and failures increment the failure count. Key Points: • Cache your cache with warm functions • Use a reasonable failure count • Understand idempotency. “Everything fails all the time.” ~ Werner Vogels 🔥
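A minimal in-memory circuit breaker illustrates the three states on slide 27. On Lambda the state would live in DynamoDB or ElastiCache (optionally cached in a warm container, per the first key point); here it is a plain object, and `CircuitBreaker`, `failure_threshold` and `reset_timeout_s` are illustrative names and defaults, not any library's API.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the API while the circuit is OPEN."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "CLOSED"
        self.failure_count = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout_s:
                self.state = "HALF_OPEN"  # let one trial request through
            else:
                raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failure_count += 1
        # A failed trial reopens immediately; otherwise open at the threshold
        if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.state = "CLOSED"
        self.failure_count = 0
        self.opened_at = None
```

Failing fast while OPEN keeps Lambda from burning time and concurrency on a provider that is already down. Because a HALF OPEN trial may repeat a request that partially succeeded earlier, calls such as charges should be idempotent.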
  28. What about quotas? • Concurrency has no effect on frequency ⏰ • Stateless functions are not coordinated 😿 • Step Functions Standard Workflows would be very expensive 💰 • Adding state wouldn’t prevent needless invocations 🗑
  29. Can we build a better system? • 100% serverless • Cost effective • Scalable • Resilient • Efficient • Coordinated
  30. The Lambda Orchestrator: a CloudWatch Rule (triggered every minute) invokes a Lambda Orchestrator (concurrency 1), which checks status in DynamoDB and fans work out to Lambda Workers (concurrent executions of the SAME function) that call the Gmail API (250 quota units per minute). The diagram also includes an SQS Queue and SQS dead-letter queues (DLQs).
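The orchestrator's core decision, how much work fits into this minute's quota, could be sketched as a pure function. Assumptions: `plan_minute` is hypothetical, each pending job carries an estimated quota-unit cost (real Gmail API methods consume different numbers of units), and jobs are read in FIFO order from a store such as DynamoDB. Because the orchestrator runs with a concurrency of 1, only one planner ever spends the budget.

```python
from typing import List, Sequence, Tuple


def plan_minute(
    pending: Sequence[Tuple[str, int]],
    quota_units: int = 250,
    batch_size: int = 10,
) -> Tuple[List[List[str]], List[str]]:
    """Plan one orchestrator tick.

    pending: (job_id, estimated quota units) in FIFO order.
    Returns (worker_batches, deferred). Each batch would become one worker
    invocation; deferred jobs wait for the next scheduled tick.
    """
    used = 0
    dispatch: List[str] = []
    for index, (job_id, cost) in enumerate(pending):
        if used + cost > quota_units:
            # Strict FIFO: stop at the first job that doesn't fit, keep order for next tick
            deferred = [j for j, _ in pending[index:]]
            break
        dispatch.append(job_id)
        used += cost
    else:
        deferred = []
    batches = [dispatch[i:i + batch_size] for i in range(0, len(dispatch), batch_size)]
    return batches, deferred
```

Strict FIFO means a cheap job can wait behind an expensive one; a greedy "skip what doesn't fit" policy would use more of each minute's quota at the cost of ordering.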
  31. Decoupling Our Services
  32. Multicasting with SNS: Client → SNS event service (“asynchronous” request, ack) → subscribers: HTTP, SMS, Lambda, SQS and Email, with an SQS DLQ. Key Points: • SNS has a “well-defined API” • Decouples downstream processes • Allows multiple subscribers with message filters. FUN FACT: SNS to SQS is “guaranteed” (100,010 retries)
  33. Multicasting with EventBridge: Client → Amazon EventBridge event bus (asynchronous “PutEvents” request, ack with event ID) → Lambda, SQS, Step Functions and +16 others. Key Points: • Allows multiple subscribers with RULES, PATTERNS and FILTERS • Forward events to other accounts • 24 hours of automated retries
  34. Distribute & Throttle: Client → API Gateway → Lambda → EventBridge (ack). EventBridge rules route events to three services, each fronted by its own SQS Queue and a Lambda with its own concurrency limit (25, 10 and 5): an Order Service (RDS), an SMS Alerting Service (Twilio API) and a Billing Service (Stripe API). Example rule patterns: "total": [{ "numeric": [ ">", 0 ] }] and "detail-type": [ "ORDER COMPLETE" ]. Key Points: • Filter events to selectively trigger services • Manage throttling/quotas per service • Use Lambda Destinations with asynchronous events
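To make the two example patterns on slide 34 concrete, here is a deliberately simplified matcher that supports only exact-value lists, `numeric` comparisons and nested objects. It is not EventBridge's matching engine, which supports many more operators and also matches array-valued event fields. `matches` is a hypothetical helper, and placing `total` under `detail` is an assumption about the event's shape.

```python
import operator
from typing import Any, Dict, List

_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq, ">": operator.gt, ">=": operator.ge}


def _matches_value(value: Any, allowed: List[Any]) -> bool:
    """True if value equals any listed value or satisfies any numeric filter."""
    for candidate in allowed:
        if isinstance(candidate, dict) and "numeric" in candidate:
            conds = candidate["numeric"]  # e.g. [">", 0] or [">", 0, "<=", 100]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and all(
                _OPS[conds[i]](value, conds[i + 1]) for i in range(0, len(conds), 2)
            ):
                return True
        elif value == candidate:
            return True
    return False


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Every key in the pattern must be present in the event and match."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching event object
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif not _matches_value(event[key], expected):
            return False
    return True
```

Each rule only forwards the events its service cares about, so a free order (total of 0) never reaches a consumer whose rule requires a positive total.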
  35. Other non-serverless components • Managed services • Other cloud services (MongoDB Atlas, Elasticsearch, etc.) • Legacy systems • Our own serverless APIs 🤔
  36. Non-serverless components are inevitable • Know the limits of your components • Use a good caching strategy • Embrace asynchronous processes • Buffer and throttle events to distributed systems • Utilize eventual consistency 👈
  37. Things I’m working on… Blog: JeremyDaly.com • Podcast: ServerlessChats.com • Newsletter: Offbynone.io • DDBToolbox: DynamoDBToolbox.com • Lambda API: LambdaAPI.com • GitHub: github.com/jeremydaly • Twitter: @jeremy_daly
  38. Thank You! Jeremy Daly, jeremy@jeremydaly.com, @jeremy_daly