SlideShare une entreprise Scribd logo
1  sur  40
How to Build Production-Ready
Microservices
Michael Kehoe
Staff Site Reliability Engineer
Today’s
agenda
1 Introductions
2 Tenets of Readiness
3 Creating Measurable guidelines
4 Measuring readiness
5 Key Learning’s
6 Q&A
Introduction
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Worked on:
• Networks
• Microservices
• Traffic Engineering
• Databases
Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery - Planning &
Automation
• Incident Response – Process &
Automation
• Visibility Engineering – Making use of
operational data
• Reliability Principles – Defining best
practice & automating it
What is a Production Ready
Service?
O’Reilly 2017
Susan J. Fowler
Production-Ready
Microservices
“A production-ready application or service is
one that can be trusted to serve production
traffic…”
S U S A N J . F O W L E R
“… We trust it to behave reasonably, we trust it
to perform reliably, we trust it to get the job
done and to do its job well with very little
downtime.”
S U S A N J . F O W L E R
Tenets of readiness
Tenets of
Readine
ss
1 Stability & Reliability
2 Scalability & Performance
3 Fault Tolerance and DR
4 Monitoring
5 Documentation
Stability & Reliability
Tenets of Readiness
STABILITY
• Stable development cycle
• Stable deployment cycle
Tenets of Readiness
RELIABILITY
• Dependency Management
• Onboarding + Deprecation procedures
• Routing + Discovery
Scalability & Performance
Tenets of Readiness
SCALABILITY
• Understanding growth-scales
• Resource awareness
• Dependency scaling
Tenets of Readiness
PERFORMANCE
• Constant performance evaluation
• Traffic management
• Capacity Planning
Fault Tolerance & DR
Tenets of Readiness
FAULT TOLERANCE
• Avoiding Single Points of Failure (SPOF)
• Resiliency Engineering
Tenets of Readiness
DISASTER RECOVERY
• Understand failure scenario’s
• Disaster Recovery Plans
• Incident Management
Monitoring
Tenets of Readiness
MONITORING
• Dashboards + Alerting for:
• Service
• Resource Allocation
• Infrastructure
• All alerts are actionable and have pre-
documented procedures.
• Logging
Documentation
Tenets of Readiness
DOCUMENTATION
• Have one central landing-place for
documentation for the service
• Review of documentation from Engineer/
SRE/ Partners
• Reviewed Regularly
Tenets of Readiness
DOCUMENTATION
• What should documentation include:
• Key information (ports/ hostnames
etc)
• Description
• Architecture Diagram
• API description
• Oncall information
• Onboarding information
Creating Measurable
Guidelines
Creating Measurable Guidelines
• Not all guidelines directly translate into
something measurable
• You may need to look outcomes of
specific guidance to create measurable
guidelines
Creating Measurable Guidelines
EXAMPLE
• Stability:
• Stable development cycle
• Stable deployment process
• Stable introduction and deprecation
procedures
Creating Measurable Guidelines
EXAMPLE
• Stable development cycle
 Is the unit-test coverage above X %?
 Has this code-base been built in the
last week?
 Is there a staging environment for the
application?
Creating Measurable Guidelines
EXAMPLE
• Stable deployment process
 Has the application been deployed
recently?
 What is the successful deployment
percentage?
Measuring Readiness
Measuring Readiness
WHY?
Ensuring that services
are built and operated
in a standard manner
Standardization
Ensuring that services
are trustworthy
Quality
Assurance
Measuring Readiness
HOW?
Create a manual
checklist
Manual
Checklists
Automate the discovery
and measurement of
readiness
Automated
Scorecards
Breakdown of scores by team
Automated Service Discovery
Service Score Card
Automated Scorecards
Team Overview
Service Score Card
Automated Scorecards
Breakdown of readiness results
Service Score Card
Automated Scorecards
Key Learnings
Key Learnings
A set of guidelines for
what it means for
your service to be
‘ready’
Create
Automate the
checking and scoring
of these guidelines
Automate
Set the expectation
between
Engineering/ Product/
SRE that these
guidelines have to be
met
Evangelize
Q & A
Building Production-Ready Microservices: DevopsExchangeSF

Contenu connexe

Tendances

SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...New Relic
 
Quality Enablement - Agile Practices with Quality Enablement
Quality Enablement -  Agile Practices with Quality Enablement Quality Enablement -  Agile Practices with Quality Enablement
Quality Enablement - Agile Practices with Quality Enablement Microsoft Visual Studio
 
2016 State of DevOps Report Webinar
2016 State of DevOps Report Webinar2016 State of DevOps Report Webinar
2016 State of DevOps Report WebinarPuppet
 
DevOps drivein - Mind the Gap
DevOps drivein - Mind the GapDevOps drivein - Mind the Gap
DevOps drivein - Mind the GapSerena Software
 
Diving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsDiving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsJules Pierre-Louis
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability EngineeringMark Underwood
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv
 
Devops: Security's big opportunity by Peter Chestna
Devops: Security's big opportunity by Peter ChestnaDevops: Security's big opportunity by Peter Chestna
Devops: Security's big opportunity by Peter ChestnaDevSecCon
 
DevOps: A Value Proposition
DevOps: A Value PropositionDevOps: A Value Proposition
DevOps: A Value PropositionNicole Forsgren
 
Continuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps SuccessContinuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps SuccessTechWell
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsMarc Hornbeek
 
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Hannes Lenke
 
The Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit AmsterdamThe Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit AmsterdamMirco Hering
 
DevOpsDays - Pick any Three - Devops from scratch
DevOpsDays - Pick any Three - Devops from scratchDevOpsDays - Pick any Three - Devops from scratch
DevOpsDays - Pick any Three - Devops from scratchPete Cheslock
 
3 Steps to Expand DevOps and Automation Throughout the Enterprise
3 Steps to Expand DevOps and Automation Throughout the Enterprise3 Steps to Expand DevOps and Automation Throughout the Enterprise
3 Steps to Expand DevOps and Automation Throughout the EnterprisePuppet
 
Scaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesScaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesDeborah Schalm
 
Measuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionMeasuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionXebiaLabs
 

Tendances (20)

SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Quality Enablement - Agile Practices with Quality Enablement
Quality Enablement -  Agile Practices with Quality Enablement Quality Enablement -  Agile Practices with Quality Enablement
Quality Enablement - Agile Practices with Quality Enablement
 
2016 State of DevOps Report Webinar
2016 State of DevOps Report Webinar2016 State of DevOps Report Webinar
2016 State of DevOps Report Webinar
 
DevOps drivein - Mind the Gap
DevOps drivein - Mind the GapDevOps drivein - Mind the Gap
DevOps drivein - Mind the Gap
 
Diving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsDiving Deeper into DevOps Deployments
Diving Deeper into DevOps Deployments
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability Engineering
 
SRE in Apiary
SRE in ApiarySRE in Apiary
SRE in Apiary
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
 
Devops: Security's big opportunity by Peter Chestna
Devops: Security's big opportunity by Peter ChestnaDevops: Security's big opportunity by Peter Chestna
Devops: Security's big opportunity by Peter Chestna
 
SRE vs DevOps
SRE vs DevOpsSRE vs DevOps
SRE vs DevOps
 
DevOps: A Value Proposition
DevOps: A Value PropositionDevOps: A Value Proposition
DevOps: A Value Proposition
 
Continuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps SuccessContinuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps Success
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE Assessments
 
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
 
The Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit AmsterdamThe Anti-Transformation transformation @DevOps Summit Amsterdam
The Anti-Transformation transformation @DevOps Summit Amsterdam
 
DevOpsDays - Pick any Three - Devops from scratch
DevOpsDays - Pick any Three - Devops from scratchDevOpsDays - Pick any Three - Devops from scratch
DevOpsDays - Pick any Three - Devops from scratch
 
3 Steps to Expand DevOps and Automation Throughout the Enterprise
3 Steps to Expand DevOps and Automation Throughout the Enterprise3 Steps to Expand DevOps and Automation Throughout the Enterprise
3 Steps to Expand DevOps and Automation Throughout the Enterprise
 
Scaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesScaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBees
 
Building DevOps Toolchain
Building DevOps ToolchainBuilding DevOps Toolchain
Building DevOps Toolchain
 
Measuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionMeasuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in Action
 

Similaire à Building Production-Ready Microservices: DevopsExchangeSF

QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfPhil Johnson
 
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayCode Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayTodd Palino
 
Microdeployments for microservices dev ops nashville
Microdeployments for microservices   dev ops nashvilleMicrodeployments for microservices   dev ops nashville
Microdeployments for microservices dev ops nashvilleNathaniel (Ned) Bauerle
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
Introduction to 5w’s of DevOps
Introduction to 5w’s of DevOpsIntroduction to 5w’s of DevOps
Introduction to 5w’s of DevOpsCygnet Infotech
 
How to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy
How to Leverage SAFe 5.0 for Your Enterprise Cloud StrategyHow to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy
How to Leverage SAFe 5.0 for Your Enterprise Cloud StrategyCprime
 
Leveraging DevOps Principles for Release and Deploy
Leveraging DevOps Principles for Release and DeployLeveraging DevOps Principles for Release and Deploy
Leveraging DevOps Principles for Release and DeploySerena Software
 
5 steps to Network Reliability Engineering and Automated Network Operations
5 steps to Network Reliability Engineering and Automated Network Operations5 steps to Network Reliability Engineering and Automated Network Operations
5 steps to Network Reliability Engineering and Automated Network OperationsJames Kelly
 
Infrastructure as code with test approach
Infrastructure as code with test approachInfrastructure as code with test approach
Infrastructure as code with test approachEnrique Carbonell
 
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesDickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesPrecisely
 
Prepare the sled in summer and project release at its beginning
Prepare the sled in summer and project release at its beginningPrepare the sled in summer and project release at its beginning
Prepare the sled in summer and project release at its beginningVadym Fedorov
 
Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?Jeff Jakubiak
 
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Microsoft Décideurs IT
 
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Microsoft Technet France
 
DevOps Vancouver Meetup - WSBC Progress
DevOps Vancouver Meetup - WSBC ProgressDevOps Vancouver Meetup - WSBC Progress
DevOps Vancouver Meetup - WSBC ProgressAndre Kaminski
 

Similaire à Building Production-Ready Microservices: DevopsExchangeSF (20)

QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
 
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayCode Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
 
Microdeployments for microservices dev ops nashville
Microdeployments for microservices   dev ops nashvilleMicrodeployments for microservices   dev ops nashville
Microdeployments for microservices dev ops nashville
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
Introduction to 5w’s of DevOps
Introduction to 5w’s of DevOpsIntroduction to 5w’s of DevOps
Introduction to 5w’s of DevOps
 
How to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy
How to Leverage SAFe 5.0 for Your Enterprise Cloud StrategyHow to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy
How to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy
 
Leveraging DevOps Principles for Release and Deploy
Leveraging DevOps Principles for Release and DeployLeveraging DevOps Principles for Release and Deploy
Leveraging DevOps Principles for Release and Deploy
 
5 steps to Network Reliability Engineering and Automated Network Operations
5 steps to Network Reliability Engineering and Automated Network Operations5 steps to Network Reliability Engineering and Automated Network Operations
5 steps to Network Reliability Engineering and Automated Network Operations
 
What is DevOps?
What is DevOps?What is DevOps?
What is DevOps?
 
Infrastructure as code with test approach
Infrastructure as code with test approachInfrastructure as code with test approach
Infrastructure as code with test approach
 
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesDickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
 
Prepare the sled in summer and project release at its beginning
Prepare the sled in summer and project release at its beginningPrepare the sled in summer and project release at its beginning
Prepare the sled in summer and project release at its beginning
 
JC_Gabuya_Resume
JC_Gabuya_ResumeJC_Gabuya_Resume
JC_Gabuya_Resume
 
Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?Who needs EA… when we have DevOps?
Who needs EA… when we have DevOps?
 
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
 
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.
 
DevOps Vancouver Meetup - WSBC Progress
DevOps Vancouver Meetup - WSBC ProgressDevOps Vancouver Meetup - WSBC Progress
DevOps Vancouver Meetup - WSBC Progress
 

Plus de Michael Kehoe

AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...Michael Kehoe
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInMichael Kehoe
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016Michael Kehoe
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016Michael Kehoe
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 

Plus de Michael Kehoe (20)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production Systems
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
 
Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 

Dernier

Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 

Dernier (20)

Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 

Building Production-Ready Microservices: DevopsExchangeSF

Notes de l'éditeur

  1. Michael So we’re apart of a team at LinkedIn called Production-SRE The key tenants of production-sre at LinkedIn is: Assist in restoring stability during site-critical issues Developing applications to reduce MTTD and MTTR Provide direction and guidelines for site-troubleshooting Build tools for efficient site-issue troubleshooting, issue detection and correlation As this presentation goes on, you’ll notice how an Event Correlation system fits into these
  2. In this context, stability is about having a consistent pre-production experience Development Continuous Integration with central code repo and code review Reproducible builds Unit/ Integration testing Deployment Simple repeatable deploys Canary/ Dark Canary/ Staging Canary testing
  3. Dependency management Unreliability in microservices usually comes from either changes in inbound traffic or changes in behavior from downstream traffic Knowing all of these and understanding impact matters Onboarding/ Deprecation Documented manner to start using API’s Access Control Best Practices Deprecation ACL/ Firewalls Code Cleanup Routing + Discovery Is there a standard way to discover how to get to your service Does your application have reliable health-checks Does your load-balancer respect these Circuit breakers/ Degraders
  4. Growth Scales: How your service scales with the business goals/ metrics How your application scales as it gets more traffic (how do you make it serve more traffic) What resource “bounds” the application throughput Resource Awareness: What is your resource usage What are your bottlenecks Horizontal vs Vertical scaling (don’t do this) Dependency Scaling: Going back to reliability…how do your downstreams scale with your services growth
  5. Performance This is essential Ideally should be done every time a change is made to the service (deployment/ AB flag) Measure and report performance Traffic Management QoS Scaling for bursts/ failover Capacity: This really ties everything together Do you have the right numbers to know how many resources you’ll need in the future
  6. SPOF: In 2018, this really shouldn’t be a problem. Hardware failures Rack failures Do the right thing early to avoid problems down the road Resiliency Engineering May be known as chaos engineering Deliberately break your service to find weakpoints and look to make things fail more gracefully Outages happen, be prepared for them
  7. Failure Scenarios: Understand what ways your service can break – resiliency engineering can help here if you’re ensure Have plans to respond to these Disaster Recovery: For larger scale outages, what’s your plan IM: What’s the process to manage & respond to the outage Gamedays are a great way to test this
  8. Dashboards: Dashboards are for high-level system health Not for regression validation Monitor service/ resources/ infrastructure All alerts should be actionable and have pre-planned responses Anything other than this creates alert fatigue Logging is an underrated aspect of software development. During resiliency testing, see what the logs say and if they’re helpful
  9. Documentation: Possibly the hardest tenet to execute on Should have one place for all documentation (wiki/ website etc) Documentation should be peer-reviewed by Engineers/ SRE/ Partners Everyone should be writing this documentation All documentation should be reviewed every 3-6 months
  10. Problem is general concepts of readiness is not something you can directly translate into something that’s True/ False and can be scored or measured
  11. Here is an example looking at the ‘Stability Tenet
  12. Here is some potential outcomes of implementing some of the stability concepts
  13. So why measure readiness? Readiness helps ensure two key outcomes Standardization One key concept Susan points out in the book is that services should be built in the same manner QA If we have built the service correctly, we should be able to trust that the service will be reliable in Production
  14. So how do we measure this. Manual Checklists This should work ok for a small number of services, but becomes very very difficult as you have increased agility and an increased number of services Automated Scorecards This is a great way to build a scalable, reliable set of checks that can be constantly updated and provide reporting
  15. Create: First step is to agree on what your company defines as production ready Automate: I strongly suggest automating these checks Evangelize: Once you have everything in place, evangelize, set the expectation, make it apart of company culture