Building Production-Ready Microservices: DevopsExchangeSF

•Télécharger en tant que PPTX, PDF•

0 j'aime•452 vues

Michael Kehoe talks about what production-ready means, how to build production-ready microservices and how to measure their readiness

Formation

How to Build Production-Ready
Microservices
Michael Kehoe
Staff Site Reliability Engineer

Today’s
agenda
1 Introductions
2 Tenets of Readiness
3 Creating Measurable guidelines
4 Measuring readiness
5 Key Learning’s
6 Q&A

Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Worked on:
• Networks
• Microservices
• Traffic Engineering
• Databases

Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery - Planning &
Automation
• Incident Response – Process &
Automation
• Visibility Engineering – Making use of
operational data
• Reliability Principles – Defining best
practice & automating it

O’Reilly 2017
Susan J. Fowler
Production-Ready
Microservices

“A production-ready application or service is
one that can be trusted to serve production
traffic…”
S U S A N J . F O W L E R

“… We trust it to behave reasonably, we trust it
to perform reliably, we trust it to get the job
done and to do its job well with very little
downtime.”
S U S A N J . F O W L E R

Tenets of
Readine
ss
1 Stability & Reliability
2 Scalability & Performance
3 Fault Tolerance and DR
4 Monitoring
5 Documentation

Tenets of Readiness
STABILITY
• Stable development cycle
• Stable deployment cycle

Tenets of Readiness
RELIABILITY
• Dependency Management
• Onboarding + Deprecation procedures
• Routing + Discovery

Tenets of Readiness
SCALABILITY
• Understanding growth-scales
• Resource awareness
• Dependency scaling

Tenets of Readiness
PERFORMANCE
• Constant performance evaluation
• Traffic management
• Capacity Planning

Tenets of Readiness
FAULT TOLERANCE
• Avoiding Single Points of Failure (SPOF)
• Resiliency Engineering

Tenets of Readiness
DISASTER RECOVERY
• Understand failure scenario’s
• Disaster Recovery Plans
• Incident Management

Tenets of Readiness
MONITORING
• Dashboards + Alerting for:
• Service
• Resource Allocation
• Infrastructure
• All alerts are actionable and have pre-
documented procedures.
• Logging

Tenets of Readiness
DOCUMENTATION
• Have one central landing-place for
documentation for the service
• Review of documentation from Engineer/
SRE/ Partners
• Reviewed Regularly

Tenets of Readiness
DOCUMENTATION
• What should documentation include:
• Key information (ports/ hostnames
etc)
• Description
• Architecture Diagram
• API description
• Oncall information
• Onboarding information

Creating Measurable Guidelines
• Not all guidelines directly translate into
something measurable
• You may need to look outcomes of
specific guidance to create measurable
guidelines

Creating Measurable Guidelines
EXAMPLE
• Stability:
• Stable development cycle
• Stable deployment process
• Stable introduction and deprecation
procedures

Creating Measurable Guidelines
EXAMPLE
• Stable development cycle
 Is the unit-test coverage above X %?
 Has this code-base been built in the
last week?
 Is there a staging environment for the
application?

Creating Measurable Guidelines
EXAMPLE
• Stable deployment process
 Has the application been deployed
recently?
 What is the successful deployment
percentage?

Measuring Readiness
WHY?
Ensuring that services
are built and operated
in a standard manner
Standardization
Ensuring that services
are trustworthy
Quality
Assurance

Measuring Readiness
HOW?
Create a manual
checklist
Manual
Checklists
Automate the discovery
and measurement of
readiness
Automated
Scorecards

Breakdown of scores by team
Automated Service Discovery
Service Score Card
Automated Scorecards

Team Overview
Service Score Card
Automated Scorecards

Breakdown of readiness results
Service Score Card
Automated Scorecards

Key Learnings
A set of guidelines for
what it means for
your service to be
‘ready’
Create
Automate the
checking and scoring
of these guidelines
Automate
Set the expectation
between
Engineering/ Product/
SRE that these
guidelines have to be
met
Evangelize

Building Production-Ready Microservices: DevopsExchangeSF

Recommandé

The Next Wave of Reliability EngineeringMichael Kehoe

How Small Team Get Ready for SRE (public version)Setyo Legowo

SRE in Enterprise - Local Journey DevopsDays GalwayKevin Connaughton

10 Reasons Why You Should Consider Google App Engine (GAE) for Your Next ProjectAbeer R

Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are SecurePuppet

A Crash Course in Building Site ReliabilityAcquia

SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt

Bjorn Rabenstein. SRE, DevOps, Google, and youIT Arena

Recommandé

The Next Wave of Reliability EngineeringMichael Kehoe

How Small Team Get Ready for SRE (public version)Setyo Legowo

SRE in Enterprise - Local Journey DevopsDays GalwayKevin Connaughton

10 Reasons Why You Should Consider Google App Engine (GAE) for Your Next ProjectAbeer R

Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are SecurePuppet

A Crash Course in Building Site ReliabilityAcquia

SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit...Tori Wieldt

Bjorn Rabenstein. SRE, DevOps, Google, and youIT Arena

SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...New Relic

Quality Enablement - Agile Practices with Quality Enablement Microsoft Visual Studio

2016 State of DevOps Report WebinarPuppet

DevOps drivein - Mind the GapSerena Software

Diving Deeper into DevOps DeploymentsJules Pierre-Louis

Site (Service) Reliability EngineeringMark Underwood

SRE in ApiaryLadislav Prskavec

SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv

Devops: Security's big opportunity by Peter ChestnaDevSecCon

SRE vs DevOpsLevon Avakyan

DevOps: A Value PropositionNicole Forsgren

Continuous Testing: A Key to DevOps SuccessTechWell

Rapid Strategic SRE AssessmentsMarc Hornbeek

Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Hannes Lenke

The Anti-Transformation transformation @DevOps Summit AmsterdamMirco Hering

DevOpsDays - Pick any Three - Devops from scratchPete Cheslock

3 Steps to Expand DevOps and Automation Throughout the EnterprisePuppet

Scaling Enterprise DevOps with CloudBeesDeborah Schalm

Building DevOps ToolchainIBM UrbanCode Products

Measuring Performance: See the Science of DevOps Measurement in ActionXebiaLabs

QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe

PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe

Contenu connexe

Tendances

SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...New Relic

Quality Enablement - Agile Practices with Quality Enablement Microsoft Visual Studio

2016 State of DevOps Report WebinarPuppet

DevOps drivein - Mind the GapSerena Software

Diving Deeper into DevOps DeploymentsJules Pierre-Louis

Site (Service) Reliability EngineeringMark Underwood

SRE in ApiaryLadislav Prskavec

SRE (service reliability engineer) on big DevOps platform running on the clou...DevClub_lv

Devops: Security's big opportunity by Peter ChestnaDevSecCon

SRE vs DevOpsLevon Avakyan

DevOps: A Value PropositionNicole Forsgren

Continuous Testing: A Key to DevOps SuccessTechWell

Rapid Strategic SRE AssessmentsMarc Hornbeek

Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Hannes Lenke

The Anti-Transformation transformation @DevOps Summit AmsterdamMirco Hering

DevOpsDays - Pick any Three - Devops from scratchPete Cheslock

3 Steps to Expand DevOps and Automation Throughout the EnterprisePuppet

Scaling Enterprise DevOps with CloudBeesDeborah Schalm

Building DevOps ToolchainIBM UrbanCode Products

Measuring Performance: See the Science of DevOps Measurement in ActionXebiaLabs

Tendances (20)

SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...

Quality Enablement - Agile Practices with Quality Enablement

2016 State of DevOps Report Webinar

DevOps drivein - Mind the Gap

Diving Deeper into DevOps Deployments

Site (Service) Reliability Engineering

SRE in Apiary

SRE (service reliability engineer) on big DevOps platform running on the clou...

Devops: Security's big opportunity by Peter Chestna

SRE vs DevOps

DevOps: A Value Proposition

Continuous Testing: A Key to DevOps Success

Rapid Strategic SRE Assessments

Reliability (R)evolution: Turning the DevOps World Upside Down (Again).

The Anti-Transformation transformation @DevOps Summit Amsterdam

DevOpsDays - Pick any Three - Devops from scratch

3 Steps to Expand DevOps and Automation Throughout the Enterprise

Scaling Enterprise DevOps with CloudBees

Building DevOps Toolchain

Measuring Performance: See the Science of DevOps Measurement in Action

Similaire à Building Production-Ready Microservices: DevopsExchangeSF

QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe

PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe

ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfPhil Johnson

Code Yellow: Helping Operations Top-Heavy Teams the Smart WayTodd Palino

Microdeployments for microservices dev ops nashvilleNathaniel (Ned) Bauerle

Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe

Helping operations top-heavy teams the smart wayMichael Kehoe

Introduction to 5w’s of DevOpsCygnet Infotech

How to Leverage SAFe 5.0 for Your Enterprise Cloud StrategyCprime

Leveraging DevOps Principles for Release and DeploySerena Software

5 steps to Network Reliability Engineering and Automated Network OperationsJames Kelly

What is DevOps?Mesut Güneş

Infrastructure as code with test approachEnrique Carbonell

Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesPrecisely

Prepare the sled in summer and project release at its beginningVadym Fedorov

JC_Gabuya_ResumeJohn Carlo Gabuya

Who needs EA… when we have DevOps?Jeff Jakubiak

Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Microsoft Décideurs IT

Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.Microsoft Technet France

DevOps Vancouver Meetup - WSBC ProgressAndre Kaminski

Similaire à Building Production-Ready Microservices: DevopsExchangeSF (20)

QConSF 2018: Building Production-Ready Applications

PyBay 2018: Production-Ready Python Applications

ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf

Code Yellow: Helping Operations Top-Heavy Teams the Smart Way

Microdeployments for microservices dev ops nashville

Code Yellow: Helping operations top-heavy teams the smart way

Helping operations top-heavy teams the smart way

Introduction to 5w’s of DevOps

How to Leverage SAFe 5.0 for Your Enterprise Cloud Strategy

Leveraging DevOps Principles for Release and Deploy

5 steps to Network Reliability Engineering and Automated Network Operations

What is DevOps?

Infrastructure as code with test approach

Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services

Prepare the sled in summer and project release at its beginning

JC_Gabuya_Resume

Who needs EA… when we have DevOps?

Des serveurs créés pour vos usages specifiques, vous en avez reve HP l'a fait.

DevOps Vancouver Meetup - WSBC Progress

Plus de Michael Kehoe

eBPF WorkshopMichael Kehoe

eBPF BasicsMichael Kehoe

AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe

Linux Container BasicsMichael Kehoe

Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe

What the NTSB teaches us about incident management & postmortemsMichael Kehoe

Helping operations top-heavy teams the smart wayMichael Kehoe

SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe

SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe

SRECon-Europe-2017: Networks for SREsMichael Kehoe

Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe

Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...Michael Kehoe

Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInMichael Kehoe

Couchbase Connect 2016Michael Kehoe

Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe

SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe

SouthBay SRE Meetup Jan 2016Michael Kehoe

Couchbase Meetup Jan 2016Michael Kehoe

CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe

Plus de Michael Kehoe (20)

eBPF Workshop

eBPF Basics

AllDayDevops: What the NTSB teaches us about incident management & postmortems

Linux Container Basics

Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops

What the NTSB teaches us about incident management & postmortems

Helping operations top-heavy teams the smart way

SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...

SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...

SRECon-Europe-2017: Networks for SREs

Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale

Reducing MTTR and False Escalations: Event Correlation at LinkedIn

APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...

Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn

Couchbase Connect 2016

Using SaltStack to Auto Triage and Remediate Production Systems

SRECon USA 2016: Growing your Entry Level Talent

SouthBay SRE Meetup Jan 2016

Couchbase Meetup Jan 2016

CouchbasetoHadoop_Matt_Michael_Justin v4

Dernier

Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar

Concurrency Control in Database Management systemChristalin Nelson

Textual Evidence in Reading and Writing of SHSMae Pangan

Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO

How to Make a Duplicate of Your Odoo 17 DatabaseCeline George

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar

Sulphonamides, mechanisms and their usesVijayaLaxmi84

Paradigm shift in nursing research by RS MEHTABP KOIRALA INSTITUTE OF HELATH SCIENCS,, NEPAL

ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri

Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar

Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringSri Sairam College Of Engineering Bengaluru

4.16.24 Poverty and Precarity--Desmond.pptxmary850239

Expanded definition: technical and operationalssuser3e220a

CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari

ClimART Action | eTwinning Projectjordimapav

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco

prashanth updated resume 2024 for Teaching ProfessionSri Sairam College Of Engineering Bengaluru

Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo

Indexing Structures in Database Management system.pdfChristalin Nelson

How to Manage Buy 3 Get 1 Free in Odoo 17Celine George

Dernier (20)

Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx

Concurrency Control in Database Management system

Textual Evidence in Reading and Writing of SHS

Daily Lesson Plan in Mathematics Quarter 4

How to Make a Duplicate of Your Odoo 17 Database

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...

Sulphonamides, mechanisms and their uses

Paradigm shift in nursing research by RS MEHTA

ICS2208 Lecture6 Notes for SL spaces.pdf

Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx

Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering

4.16.24 Poverty and Precarity--Desmond.pptx

Expanded definition: technical and operational

CHEST Proprioceptive neuromuscular facilitation.pptx

ClimART Action | eTwinning Project

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf

prashanth updated resume 2024 for Teaching Profession

Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx

Indexing Structures in Database Management system.pdf

How to Manage Buy 3 Get 1 Free in Odoo 17

Building Production-Ready Microservices: DevopsExchangeSF

1. How to Build Production-Ready Microservices Michael Kehoe Staff Site Reliability Engineer

2. Today’s agenda 1 Introductions 2 Tenets of Readiness 3 Creating Measurable guidelines 4 Measuring readiness 5 Key Learning’s 6 Q&A

3. Introduction

4. Michael Kehoe $ WHOAMI • Staff Site Reliability Engineer @ LinkedIn • Production-SRE Team • Funny accent = Australian + 4 years American • Worked on: • Networks • Microservices • Traffic Engineering • Databases

5. Production-SRE Team @ LinkedIn $ WHOAMI • Disaster Recovery - Planning & Automation • Incident Response – Process & Automation • Visibility Engineering – Making use of operational data • Reliability Principles – Defining best practice & automating it

6. What is a Production Ready Service?

7. O’Reilly 2017 Susan J. Fowler Production-Ready Microservices

8. “A production-ready application or service is one that can be trusted to serve production traffic…” S U S A N J . F O W L E R

9. “… We trust it to behave reasonably, we trust it to perform reliably, we trust it to get the job done and to do its job well with very little downtime.” S U S A N J . F O W L E R

10. Tenets of readiness

11. Tenets of Readine ss 1 Stability & Reliability 2 Scalability & Performance 3 Fault Tolerance and DR 4 Monitoring 5 Documentation

12. Stability & Reliability

13. Tenets of Readiness STABILITY • Stable development cycle • Stable deployment cycle

14. Tenets of Readiness RELIABILITY • Dependency Management • Onboarding + Deprecation procedures • Routing + Discovery

15. Scalability & Performance

16. Tenets of Readiness SCALABILITY • Understanding growth-scales • Resource awareness • Dependency scaling

17. Tenets of Readiness PERFORMANCE • Constant performance evaluation • Traffic management • Capacity Planning

18. Fault Tolerance & DR

19. Tenets of Readiness FAULT TOLERANCE • Avoiding Single Points of Failure (SPOF) • Resiliency Engineering

20. Tenets of Readiness DISASTER RECOVERY • Understand failure scenario’s • Disaster Recovery Plans • Incident Management

21. Monitoring

22. Tenets of Readiness MONITORING • Dashboards + Alerting for: • Service • Resource Allocation • Infrastructure • All alerts are actionable and have pre- documented procedures. • Logging

23. Documentation

24. Tenets of Readiness DOCUMENTATION • Have one central landing-place for documentation for the service • Review of documentation from Engineer/ SRE/ Partners • Reviewed Regularly

25. Tenets of Readiness DOCUMENTATION • What should documentation include: • Key information (ports/ hostnames etc) • Description • Architecture Diagram • API description • Oncall information • Onboarding information

26. Creating Measurable Guidelines

27. Creating Measurable Guidelines • Not all guidelines directly translate into something measurable • You may need to look outcomes of specific guidance to create measurable guidelines

28. Creating Measurable Guidelines EXAMPLE • Stability: • Stable development cycle • Stable deployment process • Stable introduction and deprecation procedures

29. Creating Measurable Guidelines EXAMPLE • Stable development cycle  Is the unit-test coverage above X %?  Has this code-base been built in the last week?  Is there a staging environment for the application?

30. Creating Measurable Guidelines EXAMPLE • Stable deployment process  Has the application been deployed recently?  What is the successful deployment percentage?

31. Measuring Readiness

32. Measuring Readiness WHY? Ensuring that services are built and operated in a standard manner Standardization Ensuring that services are trustworthy Quality Assurance

33. Measuring Readiness HOW? Create a manual checklist Manual Checklists Automate the discovery and measurement of readiness Automated Scorecards

34. Breakdown of scores by team Automated Service Discovery Service Score Card Automated Scorecards

35. Team Overview Service Score Card Automated Scorecards

36. Breakdown of readiness results Service Score Card Automated Scorecards

37. Key Learnings

38. Key Learnings A set of guidelines for what it means for your service to be ‘ready’ Create Automate the checking and scoring of these guidelines Automate Set the expectation between Engineering/ Product/ SRE that these guidelines have to be met Evangelize

39. Q & A

Notes de l'éditeur

Michael So we’re apart of a team at LinkedIn called Production-SRE The key tenants of production-sre at LinkedIn is: Assist in restoring stability during site-critical issues Developing applications to reduce MTTD and MTTR Provide direction and guidelines for site-troubleshooting Build tools for efficient site-issue troubleshooting, issue detection and correlation As this presentation goes on, you’ll notice how an Event Correlation system fits into these
In this context, stability is about having a consistent pre-production experience Development Continuous Integration with central code repo and code review Reproducible builds Unit/ Integration testing Deployment Simple repeatable deploys Canary/ Dark Canary/ Staging Canary testing
Dependency management Unreliability in microservices usually comes from either changes in inbound traffic or changes in behavior from downstream traffic Knowing all of these and understanding impact matters Onboarding/ Deprecation Documented manner to start using API’s Access Control Best Practices Deprecation ACL/ Firewalls Code Cleanup Routing + Discovery Is there a standard way to discover how to get to your service Does your application have reliable health-checks Does your load-balancer respect these Circuit breakers/ Degraders
Growth Scales: How your service scales with the business goals/ metrics How your application scales as it gets more traffic (how do you make it serve more traffic) What resource “bounds” the application throughput Resource Awareness: What is your resource usage What are your bottlenecks Horizontal vs Vertical scaling (don’t do this) Dependency Scaling: Going back to reliability…how do your downstreams scale with your services growth
Performance This is essential Ideally should be done every time a change is made to the service (deployment/ AB flag) Measure and report performance Traffic Management QoS Scaling for bursts/ failover Capacity: This really ties everything together Do you have the right numbers to know how many resources you’ll need in the future
SPOF: In 2018, this really shouldn’t be a problem. Hardware failures Rack failures Do the right thing early to avoid problems down the road Resiliency Engineering May be known as chaos engineering Deliberately break your service to find weakpoints and look to make things fail more gracefully Outages happen, be prepared for them
Failure Scenarios: Understand what ways your service can break – resiliency engineering can help here if you’re ensure Have plans to respond to these Disaster Recovery: For larger scale outages, what’s your plan IM: What’s the process to manage & respond to the outage Gamedays are a great way to test this
Dashboards: Dashboards are for high-level system health Not for regression validation Monitor service/ resources/ infrastructure All alerts should be actionable and have pre-planned responses Anything other than this creates alert fatigue Logging is an underrated aspect of software development. During resiliency testing, see what the logs say and if they’re helpful
Documentation: Possibly the hardest tenet to execute on Should have one place for all documentation (wiki/ website etc) Documentation should be peer-reviewed by Engineers/ SRE/ Partners Everyone should be writing this documentation All documentation should be reviewed every 3-6 months
Problem is general concepts of readiness is not something you can directly translate into something that’s True/ False and can be scored or measured
Here is an example looking at the ‘Stability Tenet
Here is some potential outcomes of implementing some of the stability concepts
So why measure readiness? Readiness helps ensure two key outcomes Standardization One key concept Susan points out in the book is that services should be built in the same manner QA If we have built the service correctly, we should be able to trust that the service will be reliable in Production
So how do we measure this. Manual Checklists This should work ok for a small number of services, but becomes very very difficult as you have increased agility and an increased number of services Automated Scorecards This is a great way to build a scalable, reliable set of checks that can be constantly updated and provide reporting
Create: First step is to agree on what your company defines as production ready Automate: I strongly suggest automating these checks Evangelize: Once you have everything in place, evangelize, set the expectation, make it apart of company culture