Ten Lessons of the DevOps Transition

Ten (Hard-Won) Lessons
of the DevOps Transition
Randy Shoup
@randyshoup
linkedin.com/in/randyshoup

1. Reorganize Teams
Around Ownership
• End-to-end Ownership
o Small, cross-functional team owns application / service from design to
deployment to retirement
o Team has inside it all skill sets needed to do the job
o Depends on other teams for supporting services
o Able to move very rapidly and independently
• “You build it, you run it”
o The same team that builds the software operates the software
o No separate maintenance or sustaining engineering team

1. Reorganize Teams
Around Ownership
• E.g., KIXEYE and MySQL
o Development team wrote the SQL, issued all the queries
o DBA / Ops team responsible for performance and uptime
o Splitting ownership between teams was counterproductive and disruptive
• Alternative strategies
o Centrally-maintained persistence service
OR
o Customer manages its own persistence

2. Lose the
Ticket Culture
Ticket Culture | Ownership Culture
Do what is asked for | Do what is needed
One-way communication | Two-way collaboration
Goal is to close the ticket | Goal is product success
Reactive approach | Proactive approach
Reinforces silos | Reinforces collaboration
Prioritizes process | Prioritizes results

3. Replace Approvals
With Code
• Reduce or eliminate approval bodies
o E.g., eBay Architecture Review Board
o (-) Too late
o (-) Too slow
o (-) Too disengaged from details
• Package expertise in code
o Smart, experienced people build their knowledge into code
o Teams with specialized skills (databases, security, compliance, etc.) provide a
service, library, or tool
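
As one illustration of packaging expertise in code, a database team could ship a small query helper instead of reviewing every query by hand. The sketch below is hypothetical: the `find_user` helper, the `users` table, and the in-memory SQLite database are assumptions made for the example, not anything described in the talk.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str):
    """Hypothetical helper a database team might provide to product teams."""
    # Parameter binding (the "?" placeholder) keeps caller input out of the
    # SQL text, so the safe practice is built in rather than reviewed later.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Assumed demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

print(find_user(conn, "alice"))
print(find_user(conn, "x' OR '1'='1"))  # injection attempt matches nothing
```

Teams that call the helper get the safe behavior by default, with no approval step involved.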

3. Replace Approvals
With Code
• E.g., Security at Google
o Provide secure foundations by maintaining lower-level libraries and services
o Provide self-service penetration tests, vulnerability assessments, etc.

The easiest way to “enforce” a
standard practice is with
working code.

4. Enforce a
Service Mentality
• Vendor-Customer Discipline
o Service team is a vendor; the products are its customers
o Service is useful only to the extent it provides value to its customers
• Customer can choose to use service or not (!)
o Customer team is responsible for deciding what is best for their use case
o Use the right tool for the right job
• Provides powerful incentives
o Service must be *strictly better* than the alternatives of build, buy, borrow

5. Charge for
Usage
• Charge customers for *usage* of the service
o Aligns economic incentives of customer and provider
o Motivates both sides to optimize efficiency
• Free usage leads to waste
o No incentive to control usage or find more efficient alternatives
• E.g., App Engine usage at Google
o Charging a particularly egregious internal customer led to a 10x reduction in usage
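
A minimal sketch of usage-based chargeback, assuming usage is metered in CPU-hours and billed at a flat internal rate. The rate, team names, and record format are all invented for illustration.

```python
from collections import defaultdict

# Assumed internal rate; a real rate would come from the service's actual costs.
PRICE_CENTS_PER_CPU_HOUR = 5


def monthly_charges(usage_records):
    """Sum metered usage per customer team and convert it to a charge in cents."""
    totals = defaultdict(int)
    for team, cpu_hours in usage_records:
        totals[team] += cpu_hours * PRICE_CENTS_PER_CPU_HOUR
    return dict(totals)


# Hypothetical usage records: (customer team, CPU-hours consumed).
records = [("search", 1000), ("ads", 200), ("search", 500)]
charges = monthly_charges(records)
print(charges)
```

Even a simple per-team total like this makes heavy usage visible to the team that incurs it.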

6. Prioritize
Quality
• Quality, Performance, and Reliability are “Priority-0 features”
o “Stop the line” if there is a degradation
o As important to users as product features or an engaging user experience
• Developers write tests and code together
o Continuous testing of features, performance, load
o Confidence to make risky changes
• “Slow down to speed up”
o Catch bugs earlier, fail faster

6. Prioritize
Quality
• E.g., Development Process at Google
o Code reviews before submission
o Automated tests for everything
o Single searchable source code repository
• Internal Open Source Model
o Not “here is a bug report”
o Instead “here is the bug; here is the code fix; here is the test that verifies the fix”


7. Start Investing
in Testing
• Write functional tests around a component
o If you can only write a few tests, they should be meaningful ones
o End-to-end tests exercise more meaningful customer-visible capabilities than unit tests
• Fail any build that breaks a test
• Keep ratcheting up the tests
o For every new feature, add tests for that feature
o For every new bug, add a test that reproduces the bug and verifies the fix
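
The "test for every bug" ratchet can be sketched as follows. The `parse_price` function and the bug it once had are hypothetical. The test functions follow pytest naming conventions, and a runner such as pytest exits non-zero when any test fails, which a build system can treat as a failed build.

```python
def parse_price(text: str) -> int:
    """Parse a price such as "$1,234.50" into integer cents (hypothetical function)."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    cents = (cents + "00")[:2]
    return int(dollars) * 100 + int(cents)


def test_parses_plain_price():
    # Feature test, added together with the feature.
    assert parse_price("$12.34") == 1234


def test_regression_thousands_separator():
    # Regression test for an assumed bug report: input with a thousands
    # separator was mishandled. The test reproduces the report and
    # verifies the fix.
    assert parse_price("$1,234.50") == 123450
```

Once a regression test like this is in the suite, the same bug cannot return without breaking the build.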

8. Actively Manage
Technical Debt
• Maintain sustainable and well-understood level of debt
o Denominated in engineering effort to fix
o Plan for how and when you will pay it off
o Track feature work vs. accrued debt over time
• “Don’t have time to do it right”?
o WRONG: Don’t have time to do it twice (!)
o The more constrained you are on time and resources, the more important it is to do it solidly the first time
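
One possible way to track feature work against accrued debt, with debt denominated in engineer-days. This is a sketch under assumptions: the `Sprint` record, the `debt_report` function, and all figures are invented for illustration.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Sprint:
    name: str
    feature_days: int      # engineer-days spent on feature work
    debt_added_days: int   # estimated effort to fix debt taken on this sprint
    debt_paid_days: int    # effort spent paying existing debt down


def debt_report(sprints, opening_days=0):
    """Return (sprint name, feature days, running debt balance) per sprint."""
    deltas = (s.debt_added_days - s.debt_paid_days for s in sprints)
    balances = accumulate(deltas, initial=opening_days)
    next(balances)  # skip the opening balance itself
    return [(s.name, s.feature_days, b) for s, b in zip(sprints, balances)]


# Hypothetical history, starting with 10 engineer-days of existing debt.
sprints = [Sprint("S1", 30, 8, 3), Sprint("S2", 28, 4, 6), Sprint("S3", 25, 10, 0)]
report = debt_report(sprints, opening_days=10)
for row in report:
    print(row)
```

A balance that keeps rising while feature days fall is the kind of trend this tracking is meant to surface.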

Vicious Cycle of Technical Debt
[Cycle diagram: Technical Debt -> “No time to do it right” -> Quick-and-dirty -> back to Technical Debt]
Virtuous Cycle of Investment
[Cycle diagram: Solid Foundation -> Confidence -> Faster and Better -> Invest in Quality -> back to Solid Foundation]

9. Share
On-call Duties
• All members of the team rotate on-call responsibilities
o Strongest motivator to build in solid monitoring and diagnosis capabilities
o Best way to learn the real-world behavior of the system
o Best way to develop empathy for customers and other team members
• Train via on-call “apprenticeship”
o 1. Apprentice starts as secondary on-call, experienced engineer is primary
o 2. Apprentice is primary, experienced engineer is secondary
o 3. Apprentice graduates
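
A simple weekly rotation with a primary and a secondary could be sketched like this. The round-robin scheme, the weekly cadence, and the names are assumptions; the talk does not prescribe a scheduling algorithm.

```python
def rotation(team, weeks):
    """Assign (primary, secondary) on-call pairs round-robin, one pair per week."""
    n = len(team)
    if n < 2:
        raise ValueError("need at least two people for primary and secondary")
    # This week's secondary becomes next week's primary, so every team
    # member holds both roles in turn.
    return [(team[w % n], team[(w + 1) % n]) for w in range(weeks)]


# Hypothetical three-person team over four weeks.
schedule = rotation(["ana", "bo", "chen"], 4)
for week, (primary, secondary) in enumerate(schedule, start=1):
    print(week, primary, secondary)
```

In this scheme each person serves as secondary the week before going primary, which loosely mirrors the apprenticeship steps above.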

10. Make Post-Mortems
Truly Blameless
• Overcoming blame culture takes work
o Institutional memory of blame is long
o E.g., Initial post-mortems at KIXEYE elicited tons of fear
• Constantly reinforce learning over blame
o When you say “blameless”, you have to really mean it (!)
o Don’t ask “what did you do?”, ask “what did you learn?”

10. Make Post-Mortems
Truly Blameless
• Open and Honest Discussion
o Document exactly what happened
o What went right
o What went wrong
• Focus on Learning and Improvement
o How should we change process, technology, documentation, etc.?
o How could we have automated the problems away?
o How could we have diagnosed more quickly?
• Take fear and personalization out of it
o Engineers will compete to take personal responsibility (!)
o “Finally we can fix that broken system”

Top Five
Takeaways
• 1. Reorganize Teams Around Ownership
• 2. Replace Approvals With Code
• 3. Prioritize Quality
• 4. Actively Manage Technical Debt
• 5. Make Post-Mortems Truly Blameless

What I Could
Use Help With
• Encouraging leaders to lose the blame culture
• Measuring productivity in a principled way
• Overcoming resistance to taking the pager

Thank You!
• @randyshoup
• linkedin.com/in/randyshoup
