Building Site Reliability
Engineering:
A Crash Course
Amin Astaneh, Acquia Inc.
Who am I?
● Senior Manager, SRE at Acquia
● Was on the Operations team from Dec
2010 to Nov 2015
● Built and led the Site Reliability
Engineering team
Agenda
● What is SRE?
● Why Do SRE?
● Acquia, Pre-SRE
● How Acquia Does SRE
● Building an SRE Competency
● How to Hire SREs?
● 1-Year Retrospective
What is SRE?
“What happens when a software engineer is tasked with what used to be called
operations.”
- Ben Treynor, Google
SRE takes the manual processes associated with Operations and replaces them
with automation using software engineering.
They also use a set of methodologies and best practices that help engineering
teams create a mature and sustainable process for service ownership.
How Does This Relate to DevOps?
DevOps is a set of values, tools, and processes that allow teams to best deliver
value to the customer.
Therefore, SRE can be considered a specific implementation of DevOps.
SRE Practices
(according to Google)
1) Hire only coders.
2) Have SLO(s) for your service.
What are SLOs?
● SLI: Service Level Indicators (What to Measure)
● SLOs: Service Level Objectives (Targets for Measurements)
● SLAs: Service Level Agreements (Consequences for Missing Targets)
3) Measure and report performance
against the SLO(s).
4) Use Error Budgets and gate launches
on them.
5) Have a common staffing pool for SRE
and developers.
6) Cap SRE operational load at 50%.
7) Have excess Ops work overflow to the
Dev Team.
8) Share 5% of Ops work with the Dev
Team.
9) Oncall teams should have at least
eight people at one location, or six people
at each of multiple locations.
10) Aim for a maximum of two events per
oncall shift.
11) Do a postmortem for every event.
12) Postmortems are blameless and
focus on process and technology, not
people.
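Practices 2 through 4 can be sketched in a few lines. This is a minimal illustration, assuming a request-based availability SLI; the `SloReport` class, its field names, and `launch_allowed` are hypothetical and do not come from Google or Acquia tooling.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Request-based availability measured against an SLO (hypothetical shape)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """The indicator: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        allowed_failures = (1 - self.slo_target) * self.total_requests
        if allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed_failures


def launch_allowed(report: SloReport) -> bool:
    """Gate launches: allow them only while some error budget remains."""
    return report.budget_remaining > 0
```

With a 99.9% target and one million requests, roughly 1,000 failures are allowed; 400 failures leave about 60% of the budget, while 1,500 failures overspend it and block launches.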
Why Do SRE?
Scale
Improve Employees’ Quality of Life
REDUCE COST
Acquia, Pre-SRE
Things We Tried First
● Implemented Kanban for Ops to make work visible and maximize throughput
● Did ‘Tier 2 Sprints’ to build automation for the team
● Generated team metrics to influence decision-making
“People Metrics: How to Use Team Data to Produce Positive Change”
https://events.drupal.org/dublin2016/sessions/people-metrics
How Acquia Does SRE
Acquia SRE was commissioned as the driving force of our DevOps Initiative,
which has the following core values:
● Eliminate Toil
● No Capes
● Deliver With Empathy
● Own Your Service
● Own Your Business
● Own Customer Success
Acquia SRE vs Google SRE
● We embed engineers on teams, rather than build teams that run services on
behalf of engineers
● The entire engineering team (plus the SRE) is expected to ‘own their service’,
with the SRE providing leadership on how to best handle those
responsibilities
● The SRE identifies risk as part of their day-to-day and brings improvement
opportunities directly to the Product Manager for prioritization
● We evaluate with Engineering and Product what the most critical projects are
on a quarterly basis, and allocate the team to best meet the present need
● We still reserve the right to remove engineers if an engagement becomes
untenable, though it has not yet been necessary
● We have a heavy focus on time tracking to aid in toil reduction
Google: 8) Share 5% of Ops work with the Dev
Team.
Acquia: 8) Ops work IS the responsibility of the
Dev Team.
Building an SRE Competency
Get Management Buy-In
SRE Won’t Work Without Two Things
● Authority to stop releases when the error budget has been
exhausted
● Authority to overflow operational work to the dev team
when operational load > 50%
This authority must be granted by the leaders of the engineering and product efforts.
DO NOT CONTINUE UNLESS YOU HAVE THESE!
How Do You Get Buy-In?
Establish a Sense of Urgency!
https://events.drupal.org/baltimore2017/sessions/%C2%A1viva-la-revoluci%C3%B3n-how-start-devops-transformation-your-workplace
Automatically Measure Toil
SRE Operational Load Dashboard
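One way such a dashboard could be fed is from time-tracking entries. The sketch below assumes each entry is a hypothetical `(engineer, kind, hours)` tuple where `kind` is either `"ops"` or `"project"`; the function names and the data shape are illustrative only, not Acquia's actual tooling.

```python
from collections import defaultdict

OPS_LOAD_CAP = 0.5  # practice 6: cap SRE operational load at 50%


def operational_load(entries):
    """Return {engineer: ops_hours / total_hours} from (engineer, kind, hours) rows."""
    ops = defaultdict(float)
    total = defaultdict(float)
    for engineer, kind, hours in entries:
        total[engineer] += hours
        if kind == "ops":
            ops[engineer] += hours
    return {name: ops[name] / total[name] for name in total if total[name] > 0}


def over_cap(entries, cap=OPS_LOAD_CAP):
    """Names whose operational load exceeds the cap, sorted for stable output."""
    return sorted(name for name, load in operational_load(entries).items() if load > cap)
```

Anyone returned by `over_cap` is a candidate for overflowing operational work to the dev team.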
Operational Responsibility Assessment
● Based on the Capability Maturity Model (https://en.wikipedia.org/wiki/Capability_Maturity_Model)
● Evaluates the following responsibilities:
○ Routine Tasks
○ Emergency Response
○ Monitoring and Metrics
○ Capacity Planning
○ Change Management
○ New Product Introduction and Removal
○ Service Deploy and Decommissioning
○ Performance and Efficiency
○ Information Security
Each responsibility is scored from 1-5:
1. Initial: Chaotic. Processes are undocumented, ad hoc, and require individual heroics.
2. Repeatable: Documented sufficiently so they can be repeated with the same
results.
3. Defined: Roles and responsibilities for the process are defined and
confirmed.
4. Managed: The process is quantitatively managed in accordance with agreed-upon
metrics.
5. Optimizing: Process management includes deliberate process
optimization/improvement.
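The assessment maps naturally onto a small data structure. The sketch below is one possible encoding, not Acquia's implementation: the `Maturity` enum mirrors the five levels, `RESPONSIBILITIES` mirrors the nine areas, and `weakest_areas` is a hypothetical helper for picking where to create improvement tasks first.

```python
from enum import IntEnum


class Maturity(IntEnum):
    INITIAL = 1
    REPEATABLE = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZING = 5


RESPONSIBILITIES = (
    "Routine Tasks",
    "Emergency Response",
    "Monitoring and Metrics",
    "Capacity Planning",
    "Change Management",
    "New Product Introduction and Removal",
    "Service Deploy and Decommissioning",
    "Performance and Efficiency",
    "Information Security",
)


def weakest_areas(scores, limit=3):
    """Return the lowest-scoring responsibilities, weakest first."""
    missing = [r for r in RESPONSIBILITIES if r not in scores]
    if missing:
        raise ValueError(f"unscored responsibilities: {missing}")
    # sorted() is stable, so ties keep the order of RESPONSIBILITIES.
    ranked = sorted(RESPONSIBILITIES, key=lambda r: scores[r])
    return [(r, Maturity(scores[r])) for r in ranked[:limit]]
```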
● Assess your services often! (we suggest quarterly)
● Take findings/risks and create tasks for improvement
● Publish your results and share them with your organization
● Do not tie ORA results to KPIs, incentives, etc
READ APPENDIX A!
Blameless Post Mortems
● Document timeline of the incident
● With the team, determine:
○ What went well
○ What didn’t go well (process failures, technical root cause)
○ What was lucky (or circumstantial)
● For each thing that didn’t go well or was circumstantial:
○ File an action item to address it
○ Make sure it has clear acceptance criteria/requirements (grooming)
○ Make sure it has a clear level of effort (sizing)
○ Prioritize in the backlog based on relative risk
● Openly share the post-mortem with the rest of the company
● Review with the team periodically
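The post-mortem outline above can be captured as a simple record so that action items are easy to track. This is only a sketch: the `PostMortem` and `ActionItem` classes, the integer `risk` score, and the `effort_points` field are assumptions made for illustration, not a template from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionItem:
    summary: str
    risk: int                             # relative risk; higher means more urgent
    acceptance_criteria: str = ""         # filled in during grooming
    effort_points: Optional[int] = None   # filled in during sizing

    def is_ready(self) -> bool:
        """An item is ready for the backlog once it is groomed and sized."""
        return bool(self.acceptance_criteria) and self.effort_points is not None


@dataclass
class PostMortem:
    title: str
    timeline: List[str] = field(default_factory=list)
    went_well: List[str] = field(default_factory=list)
    did_not_go_well: List[str] = field(default_factory=list)
    lucky: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def backlog(self) -> List[ActionItem]:
        """Action items ordered by relative risk, highest first."""
        return sorted(self.action_items, key=lambda item: item.risk, reverse=True)

    def not_ready(self) -> List[ActionItem]:
        """Items still missing acceptance criteria or a size."""
        return [item for item in self.action_items if not item.is_ready()]
```

Note that the record holds no field for who was at fault; it describes only process and technology, in keeping with the blameless approach.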
Launch Readiness Criteria
What Are Launch Readiness Criteria?
● A set of guidelines that represent the minimum standard of what a new
product launch requires from an operational standpoint
● Expressed in terms of the Operational Responsibility Assessment
● Intended to address the major forms of risk without introducing needless
roadblocks into the product launch process
● A living document that is continuously maintained and kept relevant
● Inspired by: https://landing.google.com/sre/book/chapters/reliable-product-launches.html
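Because the criteria are expressed in terms of the Operational Responsibility Assessment, a launch check can be as simple as comparing a service's scores with a table of minimums. The sketch below assumes hypothetical thresholds and a hypothetical `launch_blockers` helper; the actual minimums are not given in these slides.

```python
def launch_blockers(scores, minimums):
    """Return responsibilities whose ORA score is below the required minimum.

    Both arguments map responsibility name -> score (1-5). An unscored
    responsibility counts as 0, so it always blocks the launch.
    """
    return {
        name: (scores.get(name, 0), required)
        for name, required in minimums.items()
        if scores.get(name, 0) < required
    }


# Hypothetical minimums for illustration only.
EXAMPLE_MINIMUMS = {
    "Emergency Response": 2,
    "Monitoring and Metrics": 2,
}
```

An empty result means the service meets the minimum standard; otherwise each entry shows the actual score next to the required one.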
Example LRC Checklist Items
LRC Enablement
Example Service Pages
Example Service Dashboard
Example Code
Example Operational Runbooks
Example Post Mortem/RCA Template
Create an Onboarding Process
● Implement an Incident Response Process
○ On-Call Rotation
○ Documentation for stakeholders on how to get help
○ Fundamentals: production access credentials, runbooks
● Perform/Publish an Operational Responsibility Assessment
● Define/Publish Service Level Objectives
● Create Monitoring/Alerting against SLOs
● Create Dashboards For SLO performance and remaining error budget
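For a time-based availability SLO, the figure a dashboard would show as "remaining error budget" is plain arithmetic. The sketch below assumes a rolling window and downtime measured as a duration; the function names are hypothetical.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Total downtime an availability SLO allows over the window."""
    return window * (1 - slo_target)


def budget_remaining(slo_target: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Budget left after observed downtime (negative once the budget is exhausted)."""
    return error_budget(slo_target, window) - downtime
```

For example, a 99.9% target over 30 days allows about 43 minutes of downtime; after a 30-minute outage, about 13 minutes remain.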
Weekly Office Hours
How To Hire SREs?
Hire Software Developers
Hire Operations People
What Makes a Good SRE?
● It’s complicated
● You want someone with the ability to contribute to a software engineering
project...
● ...yet who is motivated by operational concerns and understands the subject matter
(Linux, TCP/IP, monitoring, performance, config management...)
● Is willing to be on-call
● Knows agile practices and can use them to suggest improvements
● ‘SRE Temperament’: can communicate their opinions on something in a way
that is persuasive and data-driven
Selling Points for Prospective SREs
● Toil is capped at 50%, which means at least 50% project work at all times!
● Authority to stop flow of releases when service is too unreliable
● There is oncall, but responsibility is shared with the whole team
● Root causes of outages are tracked, prioritized, and addressed
These Create A Work Environment That Respects The SRE
1-Year Retrospective
What Went Well
● Launch Readiness Criteria is now a corporate standard
● Teams are independently performing their own blameless post mortems
● Teams are independently performing their own ORAs
● SRE influenced a grassroots reorg of Cloud Engineering around SOA
● More and more teams are taking an active role in on-call responsibilities
● Weekly Office Hours has been an effective tool for sharing ideas
What Didn’t Go Well
● We struggled with getting SLOs and error budgets established for all services
● We didn’t get Launch Readiness out the door fast enough for new services
Current Improvements
● SRE engagements now require the onboarding process before any other
work can take place:
○ Establish Incident Response Process
○ Perform Operational Responsibility Assessment
○ Define Service Level Objectives
○ Establish Monitoring and Alerting Against SLOs
○ Create Dashboards Displaying SLOs and Error Budgets
● Operational Stories must be prioritized in proportion to the SRE
presence on an engineering team.
“When we were in Ops, it was simple, because our purpose was to simply address the incident.
Our purpose now is to address the problems of the business.
We are the vehicle of change. That’s hard work, but we can do it.”
Questions?
Amin Astaneh
T: @aastaneh
M: amin.astaneh@acquia.com

Contenu connexe

Tendances

What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)jeetendra mandal
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Abeer R
 
Site reliability engineering
Site reliability engineeringSite reliability engineering
Site reliability engineeringJason Loeffler
 
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...ITSM Academy, Inc.
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!New Relic
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
Site reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkSite reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkMichae Blakeney
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)Setyo Legowo
 
How to SRE when you have no SRE
How to SRE when you have no SREHow to SRE when you have no SRE
How to SRE when you have no SRESquadcast Inc
 
SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLADr Ganesh Iyer
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps.com
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationDr Ganesh Iyer
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsRauno De Pasquale
 
SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewDr Ganesh Iyer
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2Chris Huang
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 

Tendances (20)

What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)What is Site Reliability Engineering (SRE)
What is Site Reliability Engineering (SRE)
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
 
Site reliability engineering
Site reliability engineeringSite reliability engineering
Site reliability engineering
 
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy W...
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
 
SRE in Startup
SRE in StartupSRE in Startup
SRE in Startup
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
Site reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkSite reliability engineering - Lightning Talk
Site reliability engineering - Lightning Talk
 
How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)How Small Team Get Ready for SRE (public version)
How Small Team Get Ready for SRE (public version)
 
SRE vs DevOps
SRE vs DevOpsSRE vs DevOps
SRE vs DevOps
 
Sre summary
Sre summarySre summary
Sre summary
 
How to SRE when you have no SRE
How to SRE when you have no SREHow to SRE when you have no SRE
How to SRE when you have no SRE
 
SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil Elimination
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE Concepts
 
SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overview
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
SRE From Scratch
SRE From ScratchSRE From Scratch
SRE From Scratch
 

En vedette

Acquia Partner Program Update
Acquia Partner Program UpdateAcquia Partner Program Update
Acquia Partner Program UpdateAcquia
 
Acquia Content Hub: Connect Technologies & Extend Systems to Source Content
Acquia Content Hub: Connect Technologies & Extend Systems to Source ContentAcquia Content Hub: Connect Technologies & Extend Systems to Source Content
Acquia Content Hub: Connect Technologies & Extend Systems to Source ContentAcquia
 
Customer Journey Orchestration: The Secret to Effective Omnichannel Experiences
Customer Journey Orchestration: The Secret to Effective Omnichannel ExperiencesCustomer Journey Orchestration: The Secret to Effective Omnichannel Experiences
Customer Journey Orchestration: The Secret to Effective Omnichannel ExperiencesAcquia
 
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...Acquia
 
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...Acquia
 
Episode 2: Define Customer Segments Using a Data-driven Approach
Episode 2: Define Customer Segments Using a Data-driven ApproachEpisode 2: Define Customer Segments Using a Data-driven Approach
Episode 2: Define Customer Segments Using a Data-driven ApproachAcquia
 
PHP Performance tuning for Drupal 8
PHP Performance tuning for Drupal 8PHP Performance tuning for Drupal 8
PHP Performance tuning for Drupal 8Acquia
 
Episode 5: Using Technology to Accelerate Your Personalization Initiative
Episode 5: Using Technology to Accelerate Your Personalization InitiativeEpisode 5: Using Technology to Accelerate Your Personalization Initiative
Episode 5: Using Technology to Accelerate Your Personalization InitiativeAcquia
 
Questions To Ask Before a Drupal Project Kickoff
Questions To Ask Before a Drupal Project KickoffQuestions To Ask Before a Drupal Project Kickoff
Questions To Ask Before a Drupal Project KickoffAcquia
 
Building a foundation for the future of digital experience (oct 31, 2017)
Building a foundation for the future of digital experience (oct 31, 2017)Building a foundation for the future of digital experience (oct 31, 2017)
Building a foundation for the future of digital experience (oct 31, 2017)Acquia
 
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...Acquia
 
Personalization How-To: Driving Conversions with Acquia Lift
Personalization How-To: Driving Conversions with Acquia LiftPersonalization How-To: Driving Conversions with Acquia Lift
Personalization How-To: Driving Conversions with Acquia LiftAcquia
 
Episode 4: Personalization Best Practices
Episode 4: Personalization Best PracticesEpisode 4: Personalization Best Practices
Episode 4: Personalization Best PracticesAcquia
 
Personalization Using Acquia Lift 2.0
Personalization Using Acquia Lift 2.0Personalization Using Acquia Lift 2.0
Personalization Using Acquia Lift 2.0Boston Interactive
 
A Professional Software Engineer's Checklist
A Professional Software Engineer's ChecklistA Professional Software Engineer's Checklist
A Professional Software Engineer's ChecklistAcquia
 
Build Personalization into Your Culture: Create Engaging Experiences for Ever...
Build Personalization into Your Culture: Create Engaging Experiences for Ever...Build Personalization into Your Culture: Create Engaging Experiences for Ever...
Build Personalization into Your Culture: Create Engaging Experiences for Ever...Acquia
 
How to Use the Salesforce Suite with Drupal 8: A Quick Start Guide
How to Use the Salesforce Suite with Drupal 8: A Quick Start GuideHow to Use the Salesforce Suite with Drupal 8: A Quick Start Guide
How to Use the Salesforce Suite with Drupal 8: A Quick Start GuideAcquia
 
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]Webinar: Vodafone and The Connected Customer Journey [10.19.2017]
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]Acquia
 
Across the spectrum different approaches to progressively decoupled drupal (...
Across the spectrum  different approaches to progressively decoupled drupal (...Across the spectrum  different approaches to progressively decoupled drupal (...
Across the spectrum different approaches to progressively decoupled drupal (...Acquia
 
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...Acquia
 

En vedette (20)

Acquia Partner Program Update
Acquia Partner Program UpdateAcquia Partner Program Update
Acquia Partner Program Update
 
Acquia Content Hub: Connect Technologies & Extend Systems to Source Content
Acquia Content Hub: Connect Technologies & Extend Systems to Source ContentAcquia Content Hub: Connect Technologies & Extend Systems to Source Content
Acquia Content Hub: Connect Technologies & Extend Systems to Source Content
 
Customer Journey Orchestration: The Secret to Effective Omnichannel Experiences
Customer Journey Orchestration: The Secret to Effective Omnichannel ExperiencesCustomer Journey Orchestration: The Secret to Effective Omnichannel Experiences
Customer Journey Orchestration: The Secret to Effective Omnichannel Experiences
 
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...
Tomorrow’s Personalization Today: Increase User Engagement with Content in Co...
 
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...
Drupal 8 Lessons From the Field: What is Continuous Delivery and Why it’s imp...
 
Episode 2: Define Customer Segments Using a Data-driven Approach
Episode 2: Define Customer Segments Using a Data-driven ApproachEpisode 2: Define Customer Segments Using a Data-driven Approach
Episode 2: Define Customer Segments Using a Data-driven Approach
 
PHP Performance tuning for Drupal 8
PHP Performance tuning for Drupal 8PHP Performance tuning for Drupal 8
PHP Performance tuning for Drupal 8
 
Episode 5: Using Technology to Accelerate Your Personalization Initiative
Episode 5: Using Technology to Accelerate Your Personalization InitiativeEpisode 5: Using Technology to Accelerate Your Personalization Initiative
Episode 5: Using Technology to Accelerate Your Personalization Initiative
 
Questions To Ask Before a Drupal Project Kickoff
Questions To Ask Before a Drupal Project KickoffQuestions To Ask Before a Drupal Project Kickoff
Questions To Ask Before a Drupal Project Kickoff
 
Building a foundation for the future of digital experience (oct 31, 2017)
Building a foundation for the future of digital experience (oct 31, 2017)Building a foundation for the future of digital experience (oct 31, 2017)
Building a foundation for the future of digital experience (oct 31, 2017)
 
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...
Lightning Distribution for Drupal: Build Advanced Authoring Experiences in Dr...
 
Personalization How-To: Driving Conversions with Acquia Lift
Personalization How-To: Driving Conversions with Acquia LiftPersonalization How-To: Driving Conversions with Acquia Lift
Personalization How-To: Driving Conversions with Acquia Lift
 
Episode 4: Personalization Best Practices
Episode 4: Personalization Best PracticesEpisode 4: Personalization Best Practices
Episode 4: Personalization Best Practices
 
Personalization Using Acquia Lift 2.0
Personalization Using Acquia Lift 2.0Personalization Using Acquia Lift 2.0
Personalization Using Acquia Lift 2.0
 
A Professional Software Engineer's Checklist
A Professional Software Engineer's ChecklistA Professional Software Engineer's Checklist
A Professional Software Engineer's Checklist
 
Build Personalization into Your Culture: Create Engaging Experiences for Ever...
Build Personalization into Your Culture: Create Engaging Experiences for Ever...Build Personalization into Your Culture: Create Engaging Experiences for Ever...
Build Personalization into Your Culture: Create Engaging Experiences for Ever...
 
How to Use the Salesforce Suite with Drupal 8: A Quick Start Guide
How to Use the Salesforce Suite with Drupal 8: A Quick Start GuideHow to Use the Salesforce Suite with Drupal 8: A Quick Start Guide
How to Use the Salesforce Suite with Drupal 8: A Quick Start Guide
 
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]Webinar: Vodafone and The Connected Customer Journey [10.19.2017]
Webinar: Vodafone and The Connected Customer Journey [10.19.2017]
 
Across the spectrum different approaches to progressively decoupled drupal (...
Across the spectrum  different approaches to progressively decoupled drupal (...Across the spectrum  different approaches to progressively decoupled drupal (...
Across the spectrum different approaches to progressively decoupled drupal (...
 
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...
Acquia Lift for Site Builders: How to Define Campaigns, Set Up Tests, and Int...
 

Similaire à A Crash Course in Building Site Reliability

S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsRicardo Amaro
 
On the road to Engineering excellence
On the road to Engineering excellenceOn the road to Engineering excellence
On the road to Engineering excellenceAlexander Mrynskyi
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS
 
AWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence PillarAWS Well-Architected Framework: Operational Excellence Pillar
AWS Well-Architected Framework: Operational Excellence PillarJonathan LaCour
 
Continuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps SuccessContinuous Testing: A Key to DevOps Success
Continuous Testing: A Key to DevOps SuccessTechWell
 
Unified process,agile process,process assesment ppt
Unified process,agile process,process assesment pptUnified process,agile process,process assesment ppt
Unified process,agile process,process assesment pptShweta Ghate
 
How Salesforce built a Scalable, World-Class, Performance Engineering Team
How Salesforce built a Scalable, World-Class, Performance Engineering TeamHow Salesforce built a Scalable, World-Class, Performance Engineering Team
How Salesforce built a Scalable, World-Class, Performance Engineering TeamSalesforce Developers
 
TDWI STL 20140613 Agile - Paul Holway
TDWI STL 20140613 Agile - Paul HolwayTDWI STL 20140613 Agile - Paul Holway
TDWI STL 20140613 Agile - Paul HolwayTDWI St. Louis
 
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfPhil Johnson
 
Keys to Successful Cohabitation: Governance and Autonomous Teams
Keys to Successful Cohabitation: Governance and Autonomous TeamsKeys to Successful Cohabitation: Governance and Autonomous Teams
Keys to Successful Cohabitation: Governance and Autonomous TeamsDevOps.com
 
Improving software quality for the future of connected vehicles
Improving software quality for the future of connected vehiclesImproving software quality for the future of connected vehicles
Improving software quality for the future of connected vehiclesDevon Bleibtrey
 
DevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday KumarDevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday KumaroGuild .
 
Dev ops != Dev+Ops
Dev ops != Dev+OpsDev ops != Dev+Ops
Dev ops != Dev+OpsShalu Ahuja
 

Similaire à A Crash Course in Building Site Reliability (20)

S.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systemsS.R.E - create ultra-scalable and highly reliable systems
S.R.E - create ultra-scalable and highly reliable systems
 
Fundamentals of Agile
Fundamentals of AgileFundamentals of Agile
Fundamentals of Agile
 
Agile at scale
Agile at scaleAgile at scale
Agile at scale
 
On the road to Engineering excellence
On the road to Engineering excellenceOn the road to Engineering excellence
On the road to Engineering excellence
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
A Crash Course in Building Site Reliability

  • 1. Building Site Reliability Engineering: A Crash Course Amin Astaneh, Acquia Inc.
  • 2. Who am I? ● Senior Manager, SRE at Acquia ● Was in Operations Team from Dec 2010 - Nov 2015 ● Built and led the Site Reliability Engineering Team
  • 3. Agenda ● What is SRE? ● Why Do SRE? ● Acquia, Pre-SRE ● How Acquia Does SRE ● Building an SRE Competency ● How to Hire SREs? ● 1-Year Retrospective
  • 5. What is SRE? “What happens when a software engineer is tasked with what used to be called operations.” - Ben Treynor, Google
  • 6. What is SRE? SRE takes the manual processes associated with Operations..
  • 7. What is SRE? ..and replaces them with automation using software engineering.
  • 8. What is SRE? They also use a set of methodologies and best practices that help engineering teams create a mature and sustainable process for service ownership.
  • 9. How Does This Relate to DevOps? DevOps is a set of values, tools, and processes that allow teams to best deliver value to the customer. Therefore, SRE can be considered a specific implementation of DevOps.
  • 12. 2) Have SLO(s) for your service.
  • 13. What are SLOs? ● SLI: Service Level Indicators (What to Measure) ● SLOs: Service Level Objectives (Targets for Measurements) ● SLAs: Service Level Agreements (Consequences for Missing Targets)
  • 14. 3) Measure and report performance against the SLO(s).
  • 15. 4) Use Error Budgets and gate launches on them.
  • 16. 5) Have a common staffing pool for SRE and developers.
  • 17. 6) Cap SRE operational load at 50%.
  • 18. 7) Have excess Ops work overflow to the Dev Team.
  • 19. 8) Share 5% of Ops work with the Dev Team.
  • 20. 9) Oncall teams should have at least eight people at one location, or 6 people at each of multiple locations.
  • 21. 10) Aim for a maximum of two events per oncall shift.
  • 22. 11) Do a postmortem for every event.
  • 23. 12) Postmortems are blameless and focus on process and technology, not people.
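The SLO and error budget practices above (2 through 4) come down to simple arithmetic. The following is a minimal sketch, not anything shown in the talk: the function names, the request-based SLI, and the example numbers are all assumptions made for illustration.

```python
# Illustrative only: names and the request-count SLI are assumptions,
# not part of the original slides.

def error_budget(slo_target: float, total_events: int) -> float:
    """Allowed number of bad events in a period for an SLO target such as 0.999."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        # A 100% target leaves no budget at all.
        return float("-inf") if bad_events else 0.0
    return (budget - bad_events) / budget


def launch_allowed(slo_target: float, total_events: int, bad_events: int) -> bool:
    """Gate launches on the budget: allow only while some budget is left."""
    return budget_remaining(slo_target, total_events, bad_events) > 0.0
```

With a 99.9% target over 1,000,000 requests, the budget is about 1,000 failed requests; 250 failures leave roughly 75% of the budget, while 1,200 failures exhaust it and the gate closes.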
  • 25. Scale
  • 33. Things We Tried First ● Implemented Kanban for Ops to make work visible and maximize throughput ● Did ‘Tier 2 Sprints’ to build automation for the team ● Generated team metrics to influence decision-making “People Metrics: How to Use Team Data to Produce Positive Change” https://events.drupal.org/dublin2016/sessions/people-metrics
  • 35. How Acquia Does SRE Acquia SRE was commissioned as the driving force of our DevOps Initiative, which has the following core values: ● Eliminate Toil ● No Capes ● Deliver With Empathy ● Own Your Service ● Own Your Business ● Own Customer Success
  • 36. Acquia SRE vs Google SRE ● We embed engineers on teams, rather than build teams that run services on behalf of engineers ● The entire engineering team (plus the SRE) is expected to ‘own their service’, with the SRE providing leadership on how to best handle those responsibilities ● The SRE identifies risk as part of their day-to-day and brings improvement opportunities directly to the Product Manager for prioritization
  • 37. Acquia SRE vs Google SRE ● We evaluate with Engineering and Product what the most critical projects are on a quarterly basis, and allocate the team to best meet the present need ● We still reserve the right to remove engineers if an engagement becomes untenable, though it has not yet been necessary ● We have a heavy focus on time tracking to aid in toil reduction
  • 38. 8) Share 5% of Ops work with the Dev Team.
  • 40. 8) Ops work IS the responsibility of the Dev Team.
  • 45. Building an SRE Competency
  • 47. SRE Won’t Work Without Two Things ● Authority to stop releases when the error budget has been exhausted ● Authority to overflow operational work to the dev team when operational load > 50% This authority must be granted by the leaders of the engineering/product efforts. DO NOT CONTINUE UNLESS YOU HAVE THESE!
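The second authority above depends on measuring operational load. The check can be sketched as follows, assuming load is computed from tracked hours; the names and the hours-based measure are hypothetical, and the talk does not show how the load dashboard is built.

```python
# Hypothetical sketch: the hours-based measure and all names are assumptions.

OPS_LOAD_CAP = 0.50  # the 50% cap on SRE operational load


def operational_load(ops_hours: float, total_hours: float) -> float:
    """Share of tracked time spent on operational work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return ops_hours / total_hours


def should_overflow_to_dev(ops_hours: float, total_hours: float,
                           cap: float = OPS_LOAD_CAP) -> bool:
    """True when operational load exceeds the cap and work should overflow."""
    return operational_load(ops_hours, total_hours) > cap
```

For example, 24 operational hours in a 40-hour week is a 60% load and triggers overflow, while exactly 20 hours sits at the cap and does not.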
  • 48. How Do You Get Buy-In?
  • 49. Establish a Sense of Urgency! https://events.drupal.org/baltimore2017/sessions/%C2%A1viva-la-revoluci%C3%B3n-how-start-devops-transformation-your-workplace
  • 51. SRE Operational Load Dashboard
  • 53. Operational Responsibility Assessment ● Based on the Capability Maturity Model (https://en.wikipedia.org/wiki/Capability_Maturity_Model) ● Evaluates the following responsibilities: ○ Routine Tasks ○ Emergency Response ○ Monitoring and Metrics ○ Capacity Planning ○ Change Management ○ New Product Introduction and Removal ○ Service Deploy and Decommissioning ○ Performance and Efficiency ○ Information Security
  • 54. Operational Responsibility Assessment Each responsibility is scored from 1-5: 1. Initial: Chaotic. Processes are undocumented, ad hoc, and require individual heroics. 2. Repeatable: Documented sufficiently so they can be repeated with the same results. 3. Defined: Roles and responsibilities for the process are defined and confirmed. 4. Managed: The process is quantitatively managed in accordance with agreed-upon metrics. 5. Optimizing: Process management includes deliberate process
  • 56. Operational Responsibility Assessment ● Assess your services often! (we suggest quarterly) ● Take findings/risks and create tasks for improvement ● Publish your results and share them with your organization ● Do not tie ORA results to KPIs, incentives, etc
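One way to record an Operational Responsibility Assessment and turn findings into improvement tasks is sketched below. The responsibility and level names are taken from the slides; the data layout, the function name, and the "lowest score first" rule are assumptions made for illustration, not the team's actual tooling.

```python
# Sketch only: responsibility and level names come from the slides;
# the structure and the prioritization rule are illustrative assumptions.

LEVELS = {1: "Initial", 2: "Repeatable", 3: "Defined", 4: "Managed", 5: "Optimizing"}

RESPONSIBILITIES = (
    "Routine Tasks",
    "Emergency Response",
    "Monitoring and Metrics",
    "Capacity Planning",
    "Change Management",
    "New Product Introduction and Removal",
    "Service Deploy and Decommissioning",
    "Performance and Efficiency",
    "Information Security",
)


def weakest_areas(scores: dict[str, int]) -> list[str]:
    """Return the lowest-scoring responsibilities, sorted by name."""
    for name, score in scores.items():
        if name not in RESPONSIBILITIES:
            raise ValueError(f"unknown responsibility: {name}")
        if score not in LEVELS:
            raise ValueError(f"score for {name} must be 1-5, got {score}")
    if not scores:
        return []
    lowest = min(scores.values())
    return sorted(name for name, score in scores.items() if score == lowest)
```

Calling `weakest_areas` on a quarterly assessment gives a starting list of areas to file improvement tasks against.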
  • 59. Blameless Post Mortems ● Document timeline of the incident ● With the team, determine: ○ What went well ○ What didn’t go well (process failures, technical root cause) ○ What was lucky (or circumstantial) ● For each thing that didn’t go well or was circumstantial: ○ File an action item to address it ○ Make sure they have clear acceptance criteria/requirements (grooming) ○ Make sure they have a clear level of effort (sizing) ○ Prioritize in the backlog based on relative risk ● Openly share the post-mortem with the rest of the company ● Review with the team periodically
  • 61. What is Launch Readiness Criteria? ● A set of guidelines that represent the minimum standard of what a new product launch requires from an operational standpoint ● Expressed in terms of the Operational Responsibility Assessment ● Intended to address the major forms of risk without introducing needless roadblocks into the product launch process ● A living document that is continuously maintained and kept relevant ● Inspired by: https://landing.google.com/sre/book/chapters/reliable-product-launches.html
  • 70. Create an Onboarding Process ● Implement an Incident Response Process ○ On-Call Rotation ○ Documentation for stakeholders on how to get help ○ Fundamentals: production access credentials, runbooks ● Perform/Publish an Operational Responsibility Assessment ● Define/Publish Service Level Objectives ● Create Monitoring/Alerting against SLOs ● Create Dashboards For SLO performance and remaining error budget
  • 72. How To Hire SREs?
  • 77. What Makes a Good SRE? ● It’s complicated ● You want someone with the ability to contribute to a software engineering project.. ● Yet is motivated by operational concerns and understands the subject matter (Linux, TCP/IP, monitoring, performance, config management..) ● Is willing to be on-call ● Knowledge of agile practices as a method to suggest improvements ● ‘SRE Temperament’: can communicate their opinions on something in a way that is persuasive and data-driven
  • 78. Selling Points for Prospective SREs ● Toil is capped at 50%, which means 50%+ project work at all times! ● Authority to stop the flow of releases when a service is too unreliable ● There is on-call, but responsibility is shared with the whole team ● Root causes of outages are tracked, prioritized, and addressed These Create A Work Environment That Respects The SRE
  • 81. What Went Well ● Launch Readiness Criteria is now a corporate standard ● Teams are independently performing their own blameless post mortems ● Teams are independently performing their own ORAs ● SRE influenced a grassroots reorg of Cloud Engineering around SOA ● More and more teams are taking an active role in on-call responsibilities ● Weekly Office Hours has been an effective tool for sharing ideas
  • 83. What Didn’t Go Well ● We struggled with getting SLOs and error budgets established for all services ● We didn’t get Launch Readiness out the door fast enough for new services
  • 85. Current Improvements ● SRE engagements now require the onboarding process before any other work can take place: ○ Establish Incident Response Process ○ Perform Operational Responsibility Assessment ○ Define Service Level Objectives ○ Establish Monitoring and Alerting Against SLOs ○ Create Dashboards Displaying SLOs and Error Budgets ● Operational Stories are required to be prioritized in proportion to the SRE presence on an engineering team.
  • 86. “When we were in Ops, it was simple, because our purpose was to simply address the incident. Our purpose now is to address the problems of the business. We are the vehicle of change. That’s hard work, but we can do it.”
  • 88. Amin Astaneh T: @aastaneh M: amin.astaneh@acquia.com

Editor's Notes

  1. Small, well-trained Ops team separate from the dev team
  2. Hockey-stick growth in customers created hockey-stick growth in operational work, in particular troubleshooting and fixing broken infrastructure in Acquia’s products.
  3. Ops became a constraint in service delivery