The Game of Operations
and
The Operation of Games
Randy Shoup
@randyshoup
linkedin.com/in/randyshoup
DevOps Chicago Meetup, May 19 2014
Background
CTO at KIXEYE
• Real-time strategy games for web and mobile
Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service
Chief Engineer at eBay
• Multiple generations of eBay’s real-time
search infrastructure
Real-Time Strategy Games are …
• Real-time
• Spiky
• Computationally-intensive
• Constantly evolving
• Constantly pushing boundaries
-> Technically and operationally demanding
Operating Games: Goals
Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business for
long
• Fun includes game mechanics, feature set, quality,
performance
Studio Velocity
• 8 *highly independent* game studios
• Different tech stacks, tool chains, phases of development
Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build, buy,
borrow
Cost Efficiency
• More output for less
The Game of Operations
Cloud
• All studios and services moving to AWS
• Strong focus on automation
Services
• Small, focused teams
• Clean, well-defined interface to customers
DevOps
• Developers behave like Ops
• Ops behave like Developers
The Game of Operations
Cloud
Services
DevOps
Why Cloud? (The Obvious)
Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load
Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision
Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down
Why Cloud? (The Less Obvious)
Instance Optimization Opportunities
• Instance shapes to fit most parts of the solution space
(compute-intensive, IO-intensive, etc.)
• If the shape does not fit, try another
Service Quality
• Amazon and Google know how to run data
centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric
and external peering
Why Cloud? (The Fundamentals)
Right Side of History
• Almost impossible to beat Google / Amazon
buying power or operating efficiencies
• 2010s in computing are like 1910s in electric
power
• Soon running your own data center will be about as common
as running your own electric power generation (!)
Easy and Fun
• It Just Works ™
• Makes it easy to fall in love with infrastructure :)
Autoscaling
Games are very spiky
• Very unpredictable
• Huge variability between peak and trough
• Hits are self-reinforcing
Services and clients have to “flex”
• Clients back off in response to latency
• Services grow / shrink based on load
Service Cluster == AWS Auto Scaling Group
• Scale up or down based on predefined metrics,
thresholds
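
As a rough illustration of the two "flex" behaviors on this slide, the sketch below pairs a threshold-based scaling decision with a latency-driven client backoff. The function names, the choice of average CPU as the metric, and all threshold values are assumptions made for illustration; this is not KIXEYE's configuration and does not call the AWS Auto Scaling API.

```python
def desired_capacity(current, avg_cpu, low=30.0, high=70.0, min_size=2, max_size=20):
    """Return the new instance count for a cluster, given average CPU percent.

    All thresholds and bounds are hypothetical defaults.
    """
    if avg_cpu > high:
        target = current + 1   # grow by one instance above the high threshold
    elif avg_cpu < low:
        target = current - 1   # shrink by one instance below the low threshold
    else:
        target = current       # inside the band: leave the cluster alone
    # Never leave the configured bounds of the group.
    return max(min_size, min(max_size, target))


def next_delay(current_delay, observed_latency, latency_budget=0.2, base=0.05, cap=5.0):
    """Client-side backoff: double the delay while the service is slow, reset otherwise."""
    if observed_latency > latency_budget:
        return min(cap, max(base, current_delay * 2))
    return base
```

A client would call `next_delay` after each request and sleep for the returned number of seconds, so that slow responses reduce the load it sends while the cluster grows.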
Automation Work at KIXEYE
Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• No-downtime red-black deployment
• Futures: canarying, auto-rollback
Manageability
• Flume -> Elasticsearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and
alerting
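
The no-downtime red-black deployment mentioned in the pipeline above can be sketched in a few lines. This is an in-memory model only: `Cluster`, `LoadBalancer`, `red_black_deploy`, and `rollback` are hypothetical names, and the sketch does not reflect how Asgard or any AWS API actually performs the switch.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    version: str


@dataclass
class LoadBalancer:
    live: Cluster                                 # cluster currently taking traffic
    standby: list = field(default_factory=list)   # previous clusters kept for rollback


def red_black_deploy(lb, new_version, health_check):
    """Bring up a new cluster beside the live one; switch traffic only if it is healthy."""
    candidate = Cluster(version=new_version)
    if not health_check(candidate):
        return False               # the old cluster keeps serving; nothing changed
    lb.standby.append(lb.live)     # keep the old cluster around for quick rollback
    lb.live = candidate
    return True


def rollback(lb):
    """Point traffic back at the most recent previous cluster."""
    previous = lb.standby.pop()
    lb.live = previous
    return previous.version
```

Because the old cluster stays up until the new one passes its health check, a failed deploy never takes traffic, and a bad deploy that slips through can be reverted by switching back.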
The Game of Operations
Cloud
Services
DevOps
Service Teams
• Give teams autonomy
• Freedom to choose technology, methodology,
working environment
• Responsibility for the results of those choices
• Hold them accountable for *results*
• Give a team a goal, not a solution
• Let the team own the best way to achieve the goal
KIXEYE Service Chassis
• Goal: Produce a “chassis” for building scalable
game services
• Minimal resources, minimal direction
• 3 people x 1 month
• Consider building on open source projects
-> Team exceeded expectations
• Co-developed chassis, transport layer, service
template, build pipeline, red-black deployment, etc.
• Operability and manageability from the beginning
• Heavy use of Netflix open source projects
• 15 minutes from no code to running service in AWS
(!)
• Plan to open-source several parts of this work
Micro-Services
Simple
Well-defined interface
Single-purpose
Modular and independent
Small teams
Autonomy and responsibility
[Diagram: services A, B, C, D, E]
Transition to Building Services
Common Chassis
• Make it trivially easy to build and maintain a service
Define Service Interface (Formally!)
• Propose, Discuss, Agree
Prototype Implementation
• Simplest thing that could possibly work
• Client can integrate with prototype
• Implementor can learn what works and what does not
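
One way to picture the interface-then-prototype steps above: the interface is written down as a type, and a throwaway implementation lets clients integrate early. The leaderboard service, its method names, and the scoring rule are all invented for this sketch; the deck does not say which services or interface formats KIXEYE used.

```python
from typing import Protocol


class LeaderboardService(Protocol):
    """Hypothetical interface agreed between the service team and its clients."""

    def submit_score(self, player_id: str, score: int) -> None: ...

    def top(self, n: int) -> list[tuple[str, int]]: ...


class InMemoryLeaderboard:
    """Prototype: the simplest thing that could possibly work, meant to be thrown away."""

    def __init__(self) -> None:
        self._best: dict[str, int] = {}

    def submit_score(self, player_id: str, score: int) -> None:
        # Keep only each player's best score.
        self._best[player_id] = max(score, self._best.get(player_id, score))

    def top(self, n: int) -> list[tuple[str, int]]:
        # Highest score first; ties broken by player id for a stable order.
        ranked = sorted(self._best.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


def show_winner(board: LeaderboardService) -> str:
    """Client code depends only on the interface, not on the prototype."""
    top = board.top(1)
    return top[0][0] if top else "nobody"
```

Since `show_winner` is written against the interface, the prototype can later be replaced by the real implementation without changing client code.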
Real Implementation
• Throw away the prototype (!)
-> Rinse and Repeat
Transition to Service Relationships
Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)
Service-Level Agreement (SLA)
• Promise of service levels by the service provider
• Customer needs to be able to rely on the service, like
a utility
Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and provider
• Motivates both sides to optimize
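
Charging for usage can be as simple as splitting a service's cost in proportion to metered units. The sketch below assumes a single metered unit and a proportional split; the function name, the studio names in the example, and the choice of unit are illustrative assumptions, not a description of KIXEYE's actual cost model.

```python
from collections import defaultdict


def allocate_costs(total_cost, usage_events):
    """Split a service's total cost across customers in proportion to their usage.

    usage_events is an iterable of (customer, units) pairs; units are whatever
    provider and customers agree to meter (requests, GB stored, and so on).
    """
    usage = defaultdict(float)
    for customer, units in usage_events:
        usage[customer] += units
    total_units = sum(usage.values())
    if total_units == 0:
        return {}   # nothing was used, so there is nothing to charge
    return {c: total_cost * u / total_units for c, u in usage.items()}
```

For example, `allocate_costs(1000.0, [("studio_a", 300), ("studio_b", 100), ("studio_a", 100)])` charges the hypothetical studio_a four fifths of the bill, which gives that studio a direct reason to reduce its usage and the provider a reason to lower cost per unit.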
The Game of Operations
Cloud
Services
DevOps
Instrumentation and Measurement
Instrument Everything
• Machine / instance stats: CPU, memory, I/O
• Software infrastructure stats: database, message
queue
• Application stats: game client, game server, services
Make It Easy to Do the Right Thing ™
• Easy, reliable, low-latency
• Auto-tagged and searchable
Why?
• Measurement beats intuition every time; my own
intuition is usually wrong :)
• If you need to ssh into a box, instrumentation failed
you
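
A minimal sketch of what "easy" and "auto-tagged" instrumentation might look like from application code. The `emit` and `timed` helpers, the in-memory `EMITTED` list, and the tag names are assumptions for illustration; a real setup would ship records to a metrics or logging pipeline rather than a list.

```python
import socket
import time
from contextlib import contextmanager

EMITTED = []  # stand-in for a real metrics pipeline


def emit(name, value, **tags):
    """Record a metric, automatically tagging it with host and timestamp."""
    record = {"name": name, "value": value,
              "host": socket.gethostname(), "ts": time.time(), **tags}
    EMITTED.append(record)
    return record


@contextmanager
def timed(name, **tags):
    """Time a block of code and emit its duration in seconds, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        emit(name, time.perf_counter() - start, **tags)
```

Wrapping a code path is then one line, for example `with timed("match.load", service="matchmaking"): ...`, which keeps the cost of doing the right thing low.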
One Team (!)
• Act as one team across development,
product, operations, etc.
• Solve problems instead of blaming and
pointing fingers
• Political games are not as fun as real-time
strategy games :)
Everyone Is Responsible for Prod
Everyone’s incentives are aligned
Everyone is strongly motivated to have solid
instrumentation and monitoring
Organization: Learning Culture
Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it
Encourage iteration and velocity
• “Failure is not falling down but refusing to get
back up” – Theodore Roosevelt
Google Blame-Free Post-Mortems
Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong
Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?
Engineers compete to take personal
responsibility (!)
Transition to DevOps
Organization
• Studios make user-visible games
• Services provide common endpoints
Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs
You Build It, You Run It
• Transition on-call
• Use primary / secondary on-call as
apprenticeship
Recap: The Game of Operations
Cloud
Services
DevOps
Come Join Us!
KIXEYE is hiring in
SF, Seattle, Victoria, Brisbane, Amsterdam
@randyshoup
rshoup@kixeye.com
linkedin.com/in/randyshoup
slideshare.net/randyshoup
