FinDevOps:
Site Reliability in the
Serverless Age
Erik Peterson
CEO & Founder
CloudZero
erik@cloudzero.com | @silvexis
Hello
• CEO and Founder of CloudZero
• I’m recovering from the application security
industry, now 100% focused on Cloud
Management and Serverless Computing
• Have been building systems on AWS since 2008
• Previously
• Veracode, HP, SPI Dynamics, GuardedNet,
Sanctum
• United Nations IAEA, US Department of State,
SunTrust, Moody’s Investors
Reliability
How does Serverless change things?
FinDevOps
The Future
Serverless?
WHAT IS
SERVERLESS?
A technology abstraction that
enables you to focus a larger
percentage of your time delivering
customer value
Serverless is a Spectrum
0% (more ops) | 50% | 100% (less ops)
EC2
RDS
Redshift
ElastiCache
Elasticsearch
Aurora(RDS)
ECS
ECS(Fargate)
Kinesis
Aurora(Serverless)
DynamoDB
APIGateway
StepFunctions
SQS
SNS
S3
Lambda
EFS
CLOUD IS AN OPERATING SYSTEM
SERVERLESS IS ITS NATIVE CODE
You can’t
“lift and shift” your
way into Serverless
This implies that your culture and process must also change
How Do We Build Reliable
Serverless Systems?
Werner Vogels
CTO Amazon Web Services
So what does
reliability
even mean?
Reliability is the
trustworthiness of a system’s
ability to delight the
customer
Forces that
drive
reliability
• DevOps (culture)
• Eliminate silos
• Accept failure
• MTTR > MTBF
• ++Feature velocity
• Measure everything
• Site Reliability
Engineering (practice)
• Availability
• Latency
• Performance
• Efficiency
• Change management
• Monitoring
• Emergency response
• Capacity planning
• Provisioning
How does Serverless affect
these forces?
Serverless
effect on Site
Reliability
Engineering
CLOUD SLAs & AUTOSCALING
OBSERVABILITY
SERVICE LIMIT PLANNING
COST
AVAILABILITY
LATENCY
PERFORMANCE
EFFICIENCY
CHANGE MANAGEMENT
MONITORING
EMERGENCY RESPONSE
CAPACITY PLANNING
PROVISIONING
Congrats! You’re still on call!
Still your problem
Harder to understand
More tracking, less management
Automation
AVAILABILITY -> CLOUD SLAs & AUTO SCALING
Secret to happiness is letting go
Cost? I thought
Serverless Was
Free?
EFFICIENCY -> COST
Can we get real for a second:
FaaS is NOT Serverless
CloudWatch Logs
$1.79
$15
$0.89
$789!!!
$12
FaaS Cost: $(1.79)
Other Cost: $(818.68)
Avg. Per Day Operations
100% Serverless, and 100% not free
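The arithmetic behind this slide can be sketched in a few lines. The two daily totals are taken from the slide as printed; the function name is hypothetical and only illustrates how small the FaaS line item is next to everything around it.

```python
# Daily cost figures as printed on the slide (USD).
FAAS_COST = 1.79     # "FaaS Cost"
OTHER_COST = 818.68  # "Other Cost"


def faas_share(faas: float, other: float) -> float:
    """Return the FaaS fraction of total daily spend."""
    return faas / (faas + other)


if __name__ == "__main__":
    total = FAAS_COST + OTHER_COST
    print(f"total per day: ${total:.2f}")
    print(f"FaaS share of total: {faas_share(FAAS_COST, OTHER_COST):.2%}")
```

The function compute is a fraction of a percent of the bill, which is the point of "FaaS is NOT Serverless": the rest of the platform is where the money goes.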
MONITORING -> OBSERVABILITY
• Observability is a measure of how well the state of a system can be
determined from the analysis of its outputs.
• Emergent properties will be the bane of your existence.
• Dumping all your logs somewhere will not solve your problems.
• Focus on sampling the outputs of your system to understand its
connective tissue and to be able to do analysis without a priori
knowledge.
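One way to read "sampling the outputs" is trace-based sampling of structured events. The sketch below is an illustration under that assumption, not something shown in the talk; the `trace_id` field, the function names, and the default sample rate are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def keep_event(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all traces.

    Hashing the trace id, instead of rolling a random number per event,
    keeps every event of a sampled trace, so the path a request took
    through the system (the connective tissue) can still be rebuilt.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < sample_rate


def emit(event: dict, sample_rate: float = 0.1) -> Optional[str]:
    """Return the event as one JSON line if its trace is sampled, else None."""
    if keep_event(event["trace_id"], sample_rate):
        return json.dumps(event, sort_keys=True)
    return None
```

Emitting wide, structured events (rather than free-text log lines) is what allows questions to be asked later that nobody thought of in advance.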
FYI: This isn’t Analysis
CAPACITY PLANNING -> SERVICE LIMITS
• Capacity is built in, but Serverless systems have limits and constraints.
• You will hit them once you are in prod and under heavy customer load…on a Friday…at 6pm
• It can be very hard to figure out when the limits are being hit in a large system with
many moving parts. Here are just a few examples:
AWS Lambda examples:
• Maximum number of concurrent executions per AWS account (1000, changeable)
• Immediate concurrency increase (500 or more per min, depends on region, fixed)
API Gateway examples:
• Integration timeout (29 sec max, fixed)
• Max payload size (10 MB, fixed)
Serverless invocation limits examples:
• S3 will asynchronously call Lambda
• Lambda polls DynamoDB Streams only once per second, per shard
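A minimal sketch of tracking headroom against limits like these, so the first warning does not arrive on a Friday at 6pm. The ceilings are the figures quoted on the slide; the observed values, the 80% threshold, and all names are hypothetical, and in practice the observations would come from your own metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    name: str
    ceiling: float   # the service limit, as quoted on the slide
    observed: float  # peak value seen in the current window (made up here)


def near_limit(limits, threshold=0.8):
    """Return the names of limits whose utilisation is at or above threshold."""
    return [lim.name for lim in limits if lim.observed / lim.ceiling >= threshold]


if __name__ == "__main__":
    limits = [
        Limit("lambda concurrent executions", ceiling=1000, observed=850),
        Limit("api gateway integration seconds", ceiling=29, observed=12),
        Limit("api gateway payload MB", ceiling=10, observed=9.5),
    ]
    for name in near_limit(limits):
        print(f"WARNING: {name} is close to its limit")
```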
Serverless
effect on
DevOps
Stop faking DevOps
MTTR > MTBF must be RELIGION
Cost must be a 1st class metric
Stop Faking DevOps
• DevOps wasn’t a merger, it
was a hostile takeover
• If you see an Ops team, you
blew it
• Effective Serverless
engineering teams must take
ownership of operations
DEV
OPS
MTTR > MTBF
must be
RELIGION
Remember everything fails, all
the time.
REALLY
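A worked example of why recovery time deserves the attention, using the standard steady-state availability formula MTBF / (MTBF + MTTR). The formula is not on the slide, and the numbers below are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical baseline: one failure every 100 hours, 1 hour to recover.
    baseline = availability(100.0, 1.0)
    faster_recovery = availability(100.0, 0.5)  # halve MTTR
    fewer_failures = availability(200.0, 1.0)   # double MTBF
    print(f"baseline:        {baseline:.4%}")
    print(f"halve MTTR:      {faster_recovery:.4%}")
    print(f"double MTBF:     {fewer_failures:.4%}")
```

Halving MTTR buys the same availability as doubling MTBF. If everything fails all the time, recovery speed is the side of that trade a team can actually keep improving.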
Would you Deploy at 6pm on a Friday?
It’s time we had the talk about cost
If you have infinite scale, it follows that you must also have an infinite wallet
At what point will you degrade your customers’ experience to save
your wallet?
A system that puts your company out of business is not a reliable
system
Thinking about Cost and Architecture
• Let’s come back to this chart for a second
CloudWatch Logs
$1.79
$15
$0.89
$789
$12
This could be a big problem
Question: Could it be worse?
Thinking about Cost and Architecture
• Of course it can be worse!
Writes 100 files
Invoked 100x
1000 records written
Invoked 100x
Invoked 3x per transaction
Writes 1000 files
What happens at step 7?
Steps: 1 2 3 4 5 6 7
Hint: It’s both costly and catastrophic
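The effect the diagram hints at is multiplicative fan-out: each stage multiplies the work handed to the next. A rough sketch follows. The factors echo labels on the slide, but the order they are chained in is an assumption, since the diagram's wiring is not recoverable from the text.

```python
from itertools import accumulate
from operator import mul


def running_fan_out(factors):
    """Return the cumulative number of operations after each stage."""
    return list(accumulate(factors, mul))


if __name__ == "__main__":
    # Hypothetical chain: 100 files, each invoking 100x, 3 invocations per
    # transaction, each writing 1000 files.
    factors = [100, 100, 3, 1000]
    for stage, total in enumerate(running_fan_out(factors), start=1):
        print(f"after stage {stage}: {total:,} operations per trigger")
```

Under these assumed factors a single trigger becomes tens of millions of downstream operations, every one of them billed and every one of them counting against a service limit.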
Thinking about Cost and Architecture
• Denial of Wallet is a very real problem
• What if you are only responsible for a small part of this system?
• What if your part is in a separate AWS Account or from a 3rd party?
• How do you detect when this is even happening?
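One simple way to approach the detection question is to compare the latest spend against a recent baseline. This is a sketch under assumptions, not the speaker's method: the three-sigma rule, the function name, and the sample figures are all hypothetical.

```python
from statistics import mean, stdev


def is_cost_spike(history, latest, sigmas=3.0):
    """Flag `latest` if it sits more than `sigmas` deviations above the mean.

    `history` is a list of recent per-period costs for one component.
    With fewer than two points there is no baseline, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = max(stdev(history), 1e-9)  # avoid a zero-width band
    return latest > baseline + sigmas * spread


if __name__ == "__main__":
    recent_daily_cost = [12.0, 11.5, 12.4, 12.1]    # made-up baseline, USD
    print(is_cost_spike(recent_daily_cost, 12.2))   # an ordinary day
    print(is_cost_spike(recent_daily_cost, 789.0))  # a runaway day
```

Running the check per component and per account is what makes it useful when you own only a small part of the system.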
Cost must become a first-class metric to build
effective cloud systems
What is a First-Class Metric?
It’s real-time
It has context
You can measure it
You have a clear definition of good and bad
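The four properties can be mapped onto a small data structure. This is a sketch only; the field names, the `judge` function, and the thresholds are hypothetical and would need to come from your own unit economics.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CostSample:
    timestamp: datetime  # real-time: when the spend was observed
    feature: str         # context: which part of the product spent it
    cost_usd: float      # measurable: the spend itself
    transactions: int    # context: how much work that spend paid for

    @property
    def cost_per_transaction(self) -> float:
        if self.transactions == 0:
            return float("inf")  # spend with no work done is the worst case
        return self.cost_usd / self.transactions


def judge(sample: CostSample, good_below: float, bad_above: float) -> str:
    """Apply an explicit definition of good and bad to one sample."""
    cpt = sample.cost_per_transaction
    if cpt <= good_below:
        return "good"
    if cpt >= bad_above:
        return "bad"
    return "watch"


if __name__ == "__main__":
    sample = CostSample(datetime.now(timezone.utc), "thumbnailer", 1.79, 10_000)
    print(judge(sample, good_below=0.001, bad_above=0.01))
```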
DevOps -> FinDevOps
• Cost becomes a first-class operational
metric
• First suggested by Simon Wardley @
ServerlessConf in 2018 as FinDev
• A merger of financial, development and
operations practices
• Understands the tight correlation
between cost and well-architected
systems
FinDevOps
will replace
DevOps
Ultimately FinDevOps isn’t just
about monitoring cost
Your cloud spend is an investment,
and you should be tracking the
performance of this investment
Serverless enables a clear path to
mapping revenue generating IT
activities against engineering and
IT costs
Tracking gross margins and COGS is what really matters
Serverless makes this possible
CloudWatch Logs
$(1.79)
$(15)
$(0.89)
$(789)
$(12)
FaaS Cost: $(1.79)
Other Cost: $(818.68)
Avg. Per Day Operations
Where FinDevOps will take us
Revenue: $9324.04
Profit: $8503.57
Who cares? We are RICH!
$9324.04
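The numbers on this slide reconcile, and the margin calculation they imply can be sketched as follows. Revenue and the two cost totals are taken from the slide; treating the whole cloud bill as COGS is an assumption made for illustration.

```python
# Average per-day figures as printed on the slide (USD).
REVENUE = 9324.04
FAAS_COST = 1.79
OTHER_COST = 818.68


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue


if __name__ == "__main__":
    cogs = FAAS_COST + OTHER_COST  # assumption: all cloud spend is COGS
    print(f"COGS:         ${cogs:.2f}")
    print(f"profit:       ${REVENUE - cogs:.2f}")
    print(f"gross margin: {gross_margin(REVENUE, cogs):.1%}")
```

Revenue minus the two cost lines gives the $8503.57 profit shown on the slide, a gross margin of roughly 91%.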
Thank You!
erik@cloudzero.com
@silvexis

Site reliability in the Serverless age - Serverless Boston 2019
