Presentation by Damon Edwards, Alex Honor, and Nathan Fluegel of Rundeck at All Day DevOps on Oct 17, 2018
See a demo of Rundeck Enterprise:
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: https://www.linkedin.com/company/rundeck-inc
8. Hello
Meet OpsHero
Kick-ass scripter (/home/opshero/bin is in every co-worker’s $PATH)
8 years experience
Last line of defense when things go wrong
The person people turn to when things need to get done
Knows how things actually work
16. The interruptions…
Project A (DUE: Yesterday!)
Project B (DUE: Tomorrow!)
Ticket (sixteen of them on the slide)
Boss: Mmm, ya… I’m gonna need this and the other stuff right away.
Colleague: What do you know about…?
Customer: Hey! Flair Service is super slow.
20. The incidents…
Uh oh…
What changed?!
WAAH!! … the site is slow … now it is down!
Is anybody working on fixing this?
NOBODY TOUCH ANYTHING ELSE!!
Oh no, my fix made it worse.
21. The Fear…
Is this next outage the one that ends our careers?
It could all go wrong at any minute!
This fix sure feels shaky
We might drown in all of this tech debt
22. The Slog…
Uh oh, I’ve got to go into Prod… this is going to hurt
Can you believe they said, “where’s your TICKET?!”
This is the same rigamarole as last week!… Again!
I lost a whole day working on that emergency
All I’ll hear about later is “why couldn’t it be fixed quicker?”
23. I’m frustrated and tired.
Isn’t there a way to work smarter, not harder?
30. Silos are bad
Silo A (“I need X”): Backlog, Information, Priorities, Tools
Silo B (“I do X”): Backlog, Information, Priorities, Tools
Requests for X
Each silo has its own Context, Process, Tooling, Capacity
34. Queues are expensive
Queues Create…
Longer Cycle Time
Increased Risk
More Variability
More Overhead
Lower Quality
Less Motivation
Adapted from Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development
We weren’t the only ones to realize that queues are expensive
35. Queues are expensive
Silo A, Silo B
Ticket Queue
??
Queues encourage “snowflakes” (technically OK, but unreproducible and brittle)
38. We started by talking to executives about re-organizing.
39. Big re-org!
Get rid of the silos!
Old Silo A Old Silo B Old Silo C Old Silo D
42. Big re-org!
DevOps product teams seemed like a great idea!
Old Silo A, Old Silo B, Old Silo C, Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Key 1: get rid of as many handoffs as possible
Key 2: “Horizontal” shared responsibility, not everyone doing everything!
45. Same-org!
Dev to QA looks different, but Ops is still Ops!
Development Team 1
Development Team 2
Development Team n
Ops/SRE Team
46. So how do you get rid of the interruptions and repetitive work requests?
47. We took it upon ourselves to look at all of the handoffs and ticket queues…
52. Self-Service
…and applied “Self-Service Operations” wherever we could.
Self-Service Operations: Ops Capabilities, On Demand
Development Team 1
Development Team 2
Development Team n
Ops/SRE Team
Pull-based
Bi-directional
On-demand
With security and guardrails
59. Not good…
… and there were lots of arguments about which language.
It’s got to be Puppet.
Ruby? No way.
Python only!
Bash is what we all know!
Ansible or bust.
We are stuck on BladeLogic.
PowerShell… My team is Windows.
60. Oh, and both security and compliance thought we were nuts.
61. Don’t worry my friend. I went through the same growing pains.
67. Attack the Anti-Patterns: “Do it. Do it again. And again. And again.”
Before
I need you to do X. Ticket. Other work. Later… Done.
I need you to do X. Ticket. Other work. Later… Done.
I need you to do X. Ticket. Other work. Sigh.. Done.
After
Do X (Self-Service). Other work 1. Later…
Do X (Self-Service). Other work 2. Later…
Do X (Self-Service). Other work 3.
70. Attack the Anti-Patterns: “I’m an Expert! I don’t read the wiki”
Before
docs: Service has changed. Use this flag or bad things will happen!
docs: Pause monitoring first or we all get woken up!
Later…
I’ve done this before. I’ve got this…
“restart -doit -now”
Environment
After
Update Restart Job: Service has changed. This flag is now required or bad things will happen!
Update Restart Job: Pause monitoring first or we all get woken up!
Later…
I’ve done this before. I’ve got this.
“restart” (Self-Service) ✅
Environment
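One way to picture the “After” side: the restart procedure, including the newly required flag and the monitoring pause, lives in the job definition rather than in docs nobody reads. The sketch below is a minimal illustration in Python and is not Rundeck's job format; `make_restart_job`, `run_job`, and the `--new-flag` value are hypothetical names chosen for the example, and the steps only record what they would do.

```python
from typing import Callable, List

# A step appends a description of what it did to a shared log.
Step = Callable[[List[str]], None]


def make_restart_job(required_flag: str) -> List[Step]:
    """Build the ordered steps of a hypothetical self-service "restart" job."""

    def pause_monitoring(log: List[str]) -> None:
        # Assumption: a real job would call the monitoring system here.
        log.append("monitoring paused")

    def restart(log: List[str]) -> None:
        # The flag is baked into the job, so the caller cannot forget it.
        log.append(f"restart {required_flag}")

    def resume_monitoring(log: List[str]) -> None:
        log.append("monitoring resumed")

    return [pause_monitoring, restart, resume_monitoring]


def run_job(steps: List[Step]) -> List[str]:
    """Run every step in order and return the record of what happened."""
    log: List[str] = []
    for step in steps:
        step(log)
    return log
```

Calling `run_job(make_restart_job("--new-flag"))` yields the three steps in order. The person who says “I’ve got this” still just asks for “restart”, and whoever owns the service updates the job once when the procedure changes.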
73. Attack the Anti-Patterns: “I could fix it. But I can’t access it.”
Before
I could fix it if I could get to it
Environment
After
I’ve got this!
Self-Service
Environment
76. Attack the Anti-Patterns: “The Dog Pile”
Before
I think it’s a problem with dbcluster07-store2.uswest.acme
dbcluster07-store2.uswest.acme
“$ top” (repeated seven times on the slide)
After
I think it’s a problem with dbcluster07-store2.uswest.acme
dbcluster07-store2.uswest.acme
“$ top”
“Healthcheck store2 - all”
OaaS, Self-Service
77. As far as the automation language wars… we decided to rethink it.
90. Start where you are
Let people use the skills they already have.
Scripts, APIs, Tools, Cloud, VMs, Containers
Self-Service Platform (Web GUI, API, CLI)
Existing scripts and tools
Workflow
Access control
Error handling
Notifications
User input handling
Infrastructure model
UI, API, CLI
Scheduling
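To make the list above concrete, here is a rough sketch of what “start where you are” can look like in code: an existing script is run unchanged, and a thin wrapper adds user input handling, error handling, and notifications around it. This is an illustration under stated assumptions and not how any particular platform implements it; `run_existing_script`, the `SAFE_ARG` pattern, and the `notify` callback are all hypothetical.

```python
import re
import subprocess
import sys
from typing import Callable, List

# Assumption: a conservative allow-list for a single user-supplied argument.
SAFE_ARG = re.compile(r"[A-Za-z0-9._-]+")


def run_existing_script(command: List[str], user_arg: str,
                        notify: Callable[[str], None]) -> bool:
    """Run an unchanged script with one validated argument; report the outcome."""
    # User input handling: reject anything outside the allow-list.
    if not SAFE_ARG.fullmatch(user_arg):
        notify(f"rejected input: {user_arg!r}")
        return False
    # Error handling: a missing script or a hang becomes a notification.
    try:
        result = subprocess.run(command + [user_arg], capture_output=True,
                                text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        notify(f"failed to run: {exc}")
        return False
    if result.returncode != 0:
        notify(f"exit {result.returncode}: {result.stderr.strip()}")
        return False
    # Notifications: success is reported through the same channel.
    notify(f"ok: {result.stdout.strip()}")
    return True
```

The script itself can stay in Bash, Python, PowerShell, or whatever the team already knows, because the wrapper only cares about the command line and the exit code. Workflow, access control, scheduling, and the infrastructure model would sit around this in the same way.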
91. Enable others
Once teams are creating their own self-service, then build out the platform.
Self-Service Operations Platform
Engineers get visibility and controlled self-service
Ops Procedures: “Status”, “Firewall Change”, "Restart" (allow / deny)
Secrets, Identity, Audit Logs
Inventory and Health: Infrastructure view, Service health, System metrics
Ops Support use for remediation procedures (Execute)
Security and Ops manage access, configuration, and compliance
Monitoring
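The allow/deny decisions and audit logs on this slide can be sketched in a few lines. The example below is a simplified illustration, not the access-control format of any real product; `ProcedureGate`, `AuditEntry`, the role names, and the user name are assumptions made for the example, while the procedure names come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class AuditEntry:
    when: str        # UTC timestamp in ISO 8601 form
    user: str
    procedure: str
    decision: str    # "allow" or "deny"


class ProcedureGate:
    """Checks a role-based policy and records every request, allowed or not."""

    def __init__(self, policy: Dict[str, Set[str]]) -> None:
        # policy maps a role name to the procedures that role may execute.
        self.policy = policy
        self.audit_log: List[AuditEntry] = []

    def request(self, user: str, role: str, procedure: str) -> bool:
        allowed = procedure in self.policy.get(role, set())
        self.audit_log.append(AuditEntry(
            when=datetime.now(timezone.utc).isoformat(),
            user=user,
            procedure=procedure,
            decision="allow" if allowed else "deny",
        ))
        return allowed
```

With a policy such as `{"engineer": {"Status", "Restart"}, "ops": {"Status", "Restart", "Firewall Change"}}`, an engineer can restart a service on demand but is denied a firewall change, and both requests end up in the audit log.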
92. Embrace DevOps
Make sure people can work through their SDLC
Source Code Repo
if (($state==wait))
then
kill -9 $PID
fi
Change
Code review: RISKY, FIX
Product Engineers produce automated procedures and health checks.
Automated Procedures and Health Checks
93. Leverage Investments
Then integrate with your other enterprise systems
Service Desk
Customers
Ops Support get visibility and audit trail
Service Ticket updated by support tools
Execute
Artifact and Container Management
Ops integrate with artifact flow
94. Improved Controls
Once they see it in action, Compliance and Security will be big fans.
Approval trail?
Who reviewed it? Who ran it? When? Where?
Who created the procedure?
Who created the policy?
96. Ah. I think I now see a path forward. I’ll give it a try!
Cheers!
99. I love Ops!
Self-service is flourishing
Self-Service: “Define X Procedure”, “Do X”, “Define Y Procedure”, “Do Y”, “Do X+Y”
Everyone is noticing the improvement
Fewer Interruptions
Less Waiting
Getting More Done
You are going places.
100. Life is good!
My stature amongst my colleagues is growing!
My work is way more satisfying!
!!
So I’ve been noticing…
And I’m getting a raise!
101. That is fantastic!!
You are working smarter (not harder) and making life better for your colleagues!
104. So now… tell me about this SRE thing…
We’re definitely going to need more beers!
But that is a story for next time!
105. The End
OpsHero: Alex Honor
Mentor: Damon Edwards
Narrator: Nathan Fluegel
Directed by: Damon Edwards
Written by: Alex Honor, Damon Edwards
Let’s talk: @rundeck
www.rundeck.com