11. The Downward
Spiral…
@RealGeneKim, genek@realgenekim.me
12. The IT Core Chronic Conflict
Every IT organization is pressured to
simultaneously:
Respond more quickly to urgent business needs
Provide stable, secure and predictable IT service
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and
author of The Goal, has written extensively on the theory and practice of identifying and resolving
core, chronic conflicts.
13. Every Company Is An IT Company…
95% of all capital projects have an IT
component…
50% of all capital spending is technology-related
Where we need to
be…
IT is always in the way
(again…)
We are here…
14. There Must Be A Better
Way…
26. The Three Ways
And Six Prescriptive Steps
Infosec Can Take
27. If I Could Wave A Magic Wand, Everyone Would…
Become conversant with DevOps and recognize
the practices when they see them
Be energized about how information security
practitioners can contribute in this organizational
journey
Leave with some concrete steps to get some
great outcomes
Become a part of a team that starts putting
DevOps practices into place
29. The First Way:
Systems Thinking
(Business) (Customer)
30. The First Way:
Systems Thinking (Left To Right)
Understand the flow of work
Always seek to increase flow
Never unconsciously pass defects downstream
Never allow local optimization to cause global
degradation
Achieve profound understanding of the system
31. “Annual business planning sessions can be
maddening. They think IT Operations is an ‘all-you-
can-eat buffet.’”
-Ben Rockwood,
Director Systems Engineering,
Joyent
32. Practice #1: Define The Work and Make It
Visible
Business projects (e.g., new order entry system)
Internal IT projects (e.g., create new
environments, infosec remediation)
Changes (e.g., deploys, improve database
performance)
Unplanned work (e.g., site down, site impaired,
security incident)
33. Day 2: PMO Meeting
34. Practice #2: Create A One-Step Environment
Creation Process
Make environments available early in the
Development process
Make sure Dev builds the code and environment
at the same time
Create a common Dev, QA and Production
environment creation process
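One way to picture a common creation process is a single function that every environment passes through, so that only the name and sizing can differ. This is a minimal sketch, not the presenter's tooling: `EnvironmentSpec`, `create_environment`, and all example values are hypothetical, and a real team would drive a configuration tool rather than return a plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description shared by Dev, QA and Production."""
    os_image: str
    packages: tuple
    app_version: str


def create_environment(name: str, spec: EnvironmentSpec, instances: int = 1) -> dict:
    """Return the build plan for one environment.

    Every environment is produced by this same function, so the only
    things that can differ between them are the name and instance count.
    """
    return {
        "name": name,
        "instances": instances,
        "os_image": spec.os_image,
        "packages": sorted(spec.packages),
        "app_version": spec.app_version,
    }


# Assumed example values, not taken from the slides.
SPEC = EnvironmentSpec(
    os_image="linux-base-1",
    packages=("nginx", "openssl"),
    app_version="1.4.2",
)

dev = create_environment("dev", SPEC)
qa = create_environment("qa", SPEC)
prod = create_environment("production", SPEC, instances=4)
```

Because Dev, QA and Production all come from `SPEC`, a change to the spec reaches every environment at once instead of being hand-copied downstream.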
35. Change the Agile sprint policy:
“At the end of each sprint, we must have working
code and the environment it runs in!”
36. Infosec Insurgency
Find the automated infrastructure project team
(e.g., puppet, chef)
Release managers can provide hardening guidance
Integrate and extend their production configuration
monitoring
Put ASSERTs to find misconfigurations, enforce https,
etc.
Define what changes/deploys cannot be made
without triggering full retest
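The "ASSERTs" idea can be sketched as a small check that runs in the build or deploy pipeline and fails loudly on a known-bad setting. The config keys (`url`, `debug`, `admin_password`) and the function names below are assumptions made for illustration, not part of any real tool.

```python
def find_misconfigurations(config: dict) -> list:
    """Return human-readable violations for one service configuration."""
    violations = []
    if not config.get("url", "").startswith("https://"):
        violations.append("url must use https")
    if config.get("debug", False):
        violations.append("debug mode must be off")
    if config.get("admin_password") in (None, "", "admin", "password"):
        violations.append("admin password is missing or a known default")
    return violations


def assert_hardened(config: dict) -> None:
    """Stop the build or deploy when any check is violated."""
    violations = find_misconfigurations(config)
    if violations:
        raise AssertionError("; ".join(violations))


# Hypothetical configurations used only to exercise the checks.
good = {"url": "https://shop.example.com", "debug": False, "admin_password": "rotated-secret"}
bad = {"url": "http://shop.example.com", "debug": True}
```

Calling `assert_hardened(good)` passes silently, while `assert_hardened(bad)` raises and names every violated rule, which is what lets the check act as a gate rather than a report.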
37. The First Way:
Outcomes
Creating a single repository for code and environments
Determinism in the release process
Consistent Dev, QA, Int, and Staging environments, all
properly built before deployment begins
Decreased cycle time
Reduce deployment times from 6 hours to 45 minutes
Refactor deployment process that had 1300+ steps
spanning 4 weeks
Faster release cadence
39. The Second Way:
Amplify Feedback Loops (Right to Left)
Understand and respond to the needs of all
customers, internal and external
Shorten and amplify all feedback loops: stop the
line when necessary
Create quality at the source
Create and embed knowledge where we need it
41. “We found that when we woke up developers at
2am, defects got fixed faster than ever.”
Patrick Lightbody
CEO, BrowserMob
42. Pattern #3: Embed Dev Into IT Ops
Embed Dev into IT Ops incident escalation
process
Invite Dev to post-mortems/root cause analysis
meetings
Have Dev and Infosec cross-train IT Operations
Ensure application monitoring/metrics to aid in
Ops and Infosec work (e.g., incident/problem
management)
43. The Second Way:
Outcomes
Defects and security issues getting fixed faster
than ever
Reusable Ops and Infosec user stories now part
of the Agile process
All groups communicating and coordinating
better
Everybody is getting more work done
44. The Third Way:
Culture Of Continual Experimentation And
Learning
45. The Third Way:
Culture Of Continual Experimentation And
Learning
Foster a culture that rewards:
Experimentation (taking risks) and learning from
failure
Repetition is the prerequisite to mastery
Why?
You need a culture that keeps pushing into the danger
zone
And have the habits that enable you to survive in the
danger zone
46. Break Things Early And Often
“Do painful things more frequently, so you can
make it less painful… We don’t get pushback
from Dev, because they know it makes rollouts
smoother.”
-- Adrian Cockcroft, Architect, Netflix
49. You Don’t Choose Chaos Monkey…
Chaos Monkey Chooses You
50. Pattern #6: Break Things Before Production
Enforce consistency in code, environments and
configurations across the environments
Add your ASSERTs to find misconfigurations,
enforce https, etc.
Add static code analysis to automated
continuous integration and testing process
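Enforcing consistency across environments starts with detecting where they differ. The sketch below compares settings reported by each environment and lists only the ones that disagree. The function name `config_drift` and the sample settings are hypothetical.

```python
def config_drift(environments: dict) -> dict:
    """Map each setting that differs across environments to its per-environment values.

    `environments` maps an environment name to a flat dict of settings.
    A setting missing from an environment is reported as None.
    """
    all_keys = set()
    for settings in environments.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {env: settings.get(key) for env, settings in environments.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


# Assumed sample data: QA and Production disagree on one package version.
observed = {
    "qa": {"openssl": "1.0.1", "tls": "on"},
    "production": {"openssl": "1.0.0", "tls": "on"},
}
drift = config_drift(observed)
```

An empty result means the environments match on every reported setting; a non-empty result is a reason to stop before the difference reaches production.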
51. Pattern #6: Allocate 20% Of Cycles To
Technical Debt Reduction
56. An Innovation Culture
“By installing a rampant innovation culture, they
now do 165 experiments in the three months of tax
season.
Our business result? Conversion rate of the
website is up 50 percent. Employee result?
Everyone loves it, because now their ideas can
make it to market.”
--Scott Cook, Intuit Founder
57. Why Do I Think This Is
Important?
58. The Downward
Spiral…
60. The Three Ways: Some Patterns
First Way                            Second Way                    Third Way
Define The Work And Make It Visible  Wake Up Developers            Break Things Early And Often
Make Environments Available Early    Embed Dev Into IT Operations  Reserve 20% Of Cycles For Technical Debt Reduction
66. When IT Fails: A Business Novel and
The DevOps Cookbook
Coming January 15, 2013 and Q1 2013
“The greatest IT management book of our generation.”
Branden Williams, CTO Marketing, RSA
“The lessons in When IT Fails might just save your business if IT fails
for you. Every IT executive should share this book with their business
peers.”
James Turnbull, VP Operations, Puppet Labs and author of “Pro
Puppet”
“This book will have a profound effect on IT, just as The Goal did for
manufacturing.”
Jez Humble, co-author of the Jolt award-winning book Continuous
Delivery, and Principal at ThoughtWorks Studios.
67. Our Mission: Positively Impact The Lives Of
One Million IT Workers By 2017
For these slides, the “Top 10 Things You
Need To Know About DevOps,” Rugged
DevOps resources, and updates on the
book:
Sign up at http://itrevolution.com
Email genek@realgenekim.me
Or text “[email] 74730” to
+1 (858) 598-3980
Visit:
http://www.instantcustomer.com/go/74730
Editor's notes
Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out.
Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentations for auditors
Or an outage, and the developer is adamant that they didn’t make the change. They’re saying, “It must be the security guys; they’re always causing outages”
Or there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different?
Or every server is like a snowflake, each having its own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, and deployed completely and accurately, and help them find differences
Create and enforce a culture of change management and causality
Source: Flickr: birdsandanchors
Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, I’ll show you one who is on vacation.
Primary measurement is deploying features quickly: get to market.
I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator to great time to market is great operations.
Bad day: We do 6 weeks of testing, but deployment still fails. Why? QA environment doesn’t match production
Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure
Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production
Believe it or not, we as Tripwire practitioners can even help them: make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, replicate them downstream
[ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out.
Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions?
How many people here are in infosec?
Goal: protect critical systems and data
Safeguard organizational commitments
Prevent security breaches, help quickly detect and recover from them
Bad day: no security standards
No one is complying
Yes, we’re 3 years behind. “Whaddyagonna do about it?”
Vs. we (Tripwire owner) can become more relevant and add value by helping infosec leverage all the configuration guidance out there
Measure variance between production and those known good states
Trust and verify that when management says, “We’ve trued up the configurations,” they’ve actually done it
Why? Now, more than ever, there is an ever-increasing number of regulatory and contractual requirements to protect systems and data
How each side actively impedes the achievement of the other’s goals.
There are many ways to react to this: fear, horror, trying to become invisible… All understandable, given the circumstances… Because infosec can no longer take 4 weeks to turn around a security review for application code, or take 6 weeks to turn around a firewall change. But, on the other hand, I think it will be the best thing to happen to infosec in the past 20 years. We’re calling this Rugged DevOps, because it’s a way for infosec to integrate into the DevOps process, and be welcomed. And not be viewed as the shrill hysterical folks who slow the business down.
Tell story of Amazon, Netflix: they care about availability and security.
It’s not a push, it’s a pull: they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)