JASON HAND |
DevOps Evangelist
• Holds over 15 years of experience as a
developer, system administrator, and
support specialist
• Fully immersed in the world of agile
development and the DevOps
movement with Colorado tech
startups
#DevOpsRoadTrip
A little about VictorOps…
VictorOps is the real-time incident
management platform that combines the
power of people and data to embolden
DevOps pros to handle incidents as they
occur.
Why Are
We Here?
Culture
“How Organizations Process Information”
Ron Westrum: A Typology of Organizational Cultures
The 2014 State of DevOps Report shows that in the context of IT, job satisfaction is the biggest predictor of
profitability, market share, and productivity. The biggest predictor of job satisfaction, in turn, is how
effectively organizations process information, as determined by a model created by sociologist Ron
Westrum, shown below. 1
1: https://continuousdelivery.com/implementing/culture/
Words are how we think – stories are how we link.
- Christina Baldwin
Oral narrative is and for a long time has been the
chief basis of culture itself.
- John D. Niles
Stories from the road
Cynefin
Unordered: Complex, Chaotic
Ordered: Complicated, Obvious
Complex: cause and effect only apparent in hindsight. Probe - Sense - Respond
Complicated: cause and effect requires analysis. Sense - Analyze - Respond
Chaotic: cause and effect cannot be related. Act - Sense - Respond
Obvious: cause and effect obvious from experience. Sense - Categorize - Respond
The systems we engineer, maintain, and improve are
Complicated
.. or ..
Known
unknowns
The systems we engineer, maintain, and improve are
Complex
Unknown
unknowns
What is the
Root
Cause?
What are the..
Contributing
Factors?
Identifying a “root cause” helps us to …
Put it back
how it was
What we really want is to..
Continuously
Improve
Incident Management Maturity
(Chart axes: Time To Repair (TTR) vs. Continuous Improvement Efforts)
Reactive (chaotic)
✓ No automation
✓ No operational stack awareness
✓ Poor collaboration between teams (Dev & Ops)
✓ Documentation not available
✓ No standardized communication
✓ High focus on consistent continuous learning
Tactical (obvious)
✓ Uses a NOC
✓ Some monitoring & alerting instrumentation
✓ Collaboration in crisis
✓ "Mission critical" processes are available
✓ Understood crisis communication protocols
✓ Remediation data available to IT Operations
Integrated (complicated)
✓ Team rotations, paging policies, role hunting
✓ Continuous improvement of key health indicators
✓ Technical collaboration across all incidents
✓ Docs up to date and easily accessible
✓ Consistent real-time communication practices
Strategic (complex)
✓ Automated docs and remediation
✓ Actionable Alerts with full context
✓ High collaboration among all teams
✓ Documentation part of remediation
✓ Targeted, proactive crisis comms
✓ High focus on continuous learning
“Six Trends Shape DevOps Adoption, Q1 2015”
Forrester report
• The Foundation For Success Is In Place . . . Mostly
• Fear Of Failure Will Hamper Advancement
• Monitoring And Analytics Strategies Must Make A Big Leap Forward
• The Focus On Customer Experience Is Not Second Nature . . . Yet
• Change And Release Processes Are Not Delivering Business Needs
• You Must Prioritize And Focus Sourcing Strategies
Automation
Awareness
Collaboration
Documentation
User Empathy
Learning
Learning
Failure not seen as opportunity to learn
Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
Awareness
http://blog.vmware.com
© 2015 Forrester Research, Inc. Reproduction Prohibited 46
Single Source Of Truth Lacking In Many
Orgs – 95% only most of the time or less
Source: April 15, 2015 “Six Trends That Will Shape DevOps Adoption”, Forrester report
Collaboration
Teams siloed throughout life cycle
Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
User Empathy
https://open.buffer.com/wp-content/uploads/2015/12/empathy3.jpg
IT teams aren’t measured on customer
experience goals.
Automation
http://thelifedesignproject.com/wp-content/uploads/2009/09/373881476_217d24ef6d.jpg
Delays in Notifications Lead to Customers
Finding the Problem First
Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
Documentation
http://blog.vmware.com
Reduce MTTR
State of DevOps Report (2015)
– by Puppet Labs
Automation
Awareness
Collaboration
Documentation
User Empathy
Learning
jhand.co/DRT_SF
Bridget Kromhout | Pivotal - Cloud Foundry
Principal Technologist
• Bridget Kromhout is a Principal Technologist for Cloud Foundry at
Pivotal.
• After years as an operations engineer (most recently at DramaFever),
she traded in on-call for more travel.
• A frequent speaker at tech conferences, she helps organize tech
meetups at home in Minneapolis, serves on the program committee for
Velocity, and acts as a global core organizer for devopsdays.
• She podcasts at Arrested DevOps, occasionally blogs at
bridgetkromhout.com, and is active in a Twitterverse near you.
@bridgetkromhout
Monitoring
Bridget Kromhout
lives: Minneapolis, Minnesota
works: Pivotal
podcasts: Arrested DevOps
organizes: devopsdays
Traded on-call… …for more travel (similar effect on sleep)
“…measuring value, throughput,
and performance…
revenue rather than cost”
The Art of Monitoring (2016)
James Turnbull
artofmonitoring.com
Image credit: James Ernest
The Art of Monitoring (2016)
James Turnbull
Monitoring containers
artofmonitoring.com
“Almost every task run
under Borg contains a
built-in HTTP server that
publishes information
about the health of the
task and thousands of
performance metrics”
Large-scale cluster management at Google with Borg - Verma et al. 2015
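The idea in the Borg quote, a task that publishes its own health and metrics over HTTP, can be sketched with the Python standard library alone. This is a minimal illustration and not how Borg is implemented: the `/healthz` and `/metrics` paths, the metric names, and the `TaskState` class are all assumptions made for the example.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class TaskState:
    """Hypothetical in-process counters that the endpoints publish."""

    def __init__(self):
        self.started = time.monotonic()
        self.requests_served = 0
        self.lock = threading.Lock()


STATE = TaskState()


class HealthHandler(BaseHTTPRequestHandler):
    """Serves /healthz (liveness) and /metrics (plain-text counters)."""

    def do_GET(self):
        # Count every request, including ones that end in a 404.
        with STATE.lock:
            STATE.requests_served += 1
            served = STATE.requests_served
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            content_type = "application/json"
        elif self.path == "/metrics":
            uptime = time.monotonic() - STATE.started
            body = (
                f"uptime_seconds {uptime:.3f}\n"
                f"requests_served_total {served}\n"
            ).encode("utf-8")
            content_type = "text/plain"
        else:
            self.send_error(404, "unknown path")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_health_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Fetch a path from the local server, ignoring any proxy settings."""
    host, port = server.server_address[:2]
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}{path}", timeout=5) as resp:
        return resp.status, resp.read().decode("utf-8")
```

A monitoring system would scrape these endpoints on an interval. In a real service the counters would come from the application itself rather than from the handler.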
The Art of Monitoring (2016) — James Turnbull
Monitoring Maturity Model
artofmonitoring.com
Image credit: Wikipedia
“Any organization that designs a system…
will produce a design
whose structure is a copy of
the organization's
communication
structure.”
Mel Conway
silos are for grain
three Friday mornings in Minneapolis
removed restored
Thank you!
Andy Domeier | SPS Commerce
Director System Operations
• Andy has been in Technology Operations leadership with SPS
Commerce for the past 11 years.
• Andy spends many mental cycles collaborating on
effective patterns for monitoring and operating complex,
changing systems.
• Andy also spends time solving for priority organization and
alignment and the organization of knowledge.
HOW EFFECTIVE IS
YOUR INCIDENT
RESPONSE?
Andy Domeier
@ajdomie
agenda
© SPS COMMERCE 2
Styles of Incident Response
Healthy Incident Response
Tips & Tricks
STYLE #1 - DENIAL
That’s not possible!
No Wai!
STYLE #2 - CONFUSED
Ummmm
Hmmmm
(crickets)
How is this possible?
STYLE #3 - LAZY
It’s the Database
It’s the Network
Just Restart It
STYLE #4 - ANGRY
Why did you do that?
What did you change?
#%$! &#!^ #$@
STYLE #5 - FIREDRILL
OMG
WTF
FML
“Buckshot”
LET’S GET REAL
HOW DO WE KNOW THERE IS A FIRE?
• Good way - Alarm
• Bad way - Humans
WHO PUTS THE FIRE OUT?
• If you catch it right away?
• If it’s out of control?
INCIDENT RESPONSE TEAM
WHAT’S YOUR SYSTEM?
• #monoliths
– Familiar, All or None, Less Agility
• #microservices
– Complex, semi-isolated, Agile
WHERE’S YOUR DATA?
• Monitoring Tools
– Base IT
– Logging
– APM
– Metrics
RESPOND IN ISOLATION
RESPOND AS A TEAM
• Hey Danielle, it looks like the site is acting up, and the only outlier I have found so far is a CPU spike on the DB. Can you help me investigate this a bit more?
#CHATOPS
• Share Screens & Visualize Data
• Display Alerts w/ Integrations
• Automatic History Retention
• Enables Collaboration for All
• And my Favorite…...
#CHATOPS – CELEBRATE WITH GIFS
TIPS FOR HEALTHY INCIDENT RESPONSE
• Make health data as transparent and central as possible
– Helps the Team “Know where the fire is”
• Share data in chat
– Use the metric from your tools
• “Be Transparent”
• Team Response Nurtures Team Follow Up
• Always tie things back to the customer
– Simple but often overlooked
– Opportunity to link the team to the business
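As a rough illustration of "share data in chat" and "tie things back to the customer", the sketch below formats one metric observation as a chat message. The `MetricObservation` fields and the message layout are assumptions for this example and are not taken from the talk. Posting the text to a chat tool is left out because that depends on the tool's own API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricObservation:
    source: str           # tool that produced the metric, e.g. "APM"
    name: str             # metric name as shown in that tool
    value: float
    unit: str
    customer_impact: str  # what the customer is experiencing


def format_chat_update(obs: MetricObservation, asking: Optional[str] = None) -> str:
    """Build a chat message that shares the raw metric and the customer impact."""
    lines = [
        f"[{obs.source}] {obs.name} = {obs.value:g}{obs.unit}",
        f"Customer impact: {obs.customer_impact}",
    ]
    if asking:
        # Ask a named teammate for help instead of investigating in isolation.
        lines.append(f"@{asking} can you help me investigate?")
    return "\n".join(lines)
```

Keeping the metric value and its source in the message means the chat history doubles as a record of what the team saw during the incident.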
THANK YOU!
Andy Domeier
@ajdomie
Ben Overmyer | Star Tribune
Digital Manager, Operations
• Ben is the Digital Manager of Operations at the Minneapolis Star
Tribune.
• He has over a decade of experience as a back-end software engineer,
two years of experience as a dedicated operations engineer, and great
enthusiasm for the DevOps culture.
• Besides the Star Tribune, he’s worked for an eclectic mix of
organizations, including the USGS, a game company in New Zealand,
and a beauty products marketing company.
• When not hacking on servers, apps, or people, he acts as art director
and author for a tabletop gaming company.
EVOLVING INCIDENT
MANAGEMENT
STAR TRIBUNE DEVOPS
IN THE BEGINNING
▸ Forwarded phone line
▸ An on-call list maintained in a wiki
▸ Every week, manually change to the next person on the list
▸ …and overrides or substitutions?
EARLY MONITORING
▸ Zabbix monitoring set up for a handful of causes
▸ Zabbix alerts sent via email to a distribution list
▸ Sometimes no one would see these alerts until hours or, in
rare cases, days later
THE PAIN POINTS
▸ Manual maintenance of the calling tree data
▸ Manual rotation of the support phone line forwarding
▸ Poor documentation of incident life cycles
▸ No sense of incident frequency beyond “this was a bad
couple weeks”
▸ If the on-call person didn’t respond, there was no
escalation process other than calling the head of Digital
PHASE 1:
VICTOROPS
ADOPTING VICTOROPS
▸ Automated rotations
▸ Multiple teams
▸ Automatic escalation processes
▸ Easy schedule overrides and changes
▸ APIs for programmatic incident interaction
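To make "automated rotations", "schedule overrides", and "automatic escalation" concrete, here is a small sketch of the logic such a tool replaces. It is not VictorOps code. The weekly hand-off, the roster names, and the escalation steps are assumptions for illustration only.

```python
from datetime import date


def on_call_for(day, roster, rotation_start, overrides=None):
    """Return who is on call on `day` for a rotation that changes weekly.

    `overrides` maps specific dates to a substitute and wins over the rotation.
    """
    overrides = overrides or {}
    if day in overrides:
        return overrides[day]
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    weeks_elapsed = (day - rotation_start).days // 7
    return roster[weeks_elapsed % len(roster)]


def escalation_target(minutes_unacknowledged, steps):
    """Return the last escalation step reached, or None if none applies.

    `steps` is a list of (after_minutes, who) pairs sorted by after_minutes.
    """
    target = None
    for after_minutes, who in steps:
        if minutes_unacknowledged >= after_minutes:
            target = who
    return target
```

For example, with a hypothetical roster `["ana", "ben", "cy"]` starting on `date(2016, 1, 4)`, `on_call_for(date(2016, 1, 11), roster, date(2016, 1, 4))` returns `"ben"`. With steps `[(0, "primary"), (15, "secondary"), (30, "head of Digital")]`, an alert left unacknowledged for 20 minutes escalates to `"secondary"`.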
THE NATURE OF ALERTS
▸ OK, we can set up programmatic alerts. Now what?
▸ Integrating Zabbix, New Relic, and CloudWatch
▸ Discovering alert floods
▸ Move to alerting on symptoms, not causes
▸ …but still monitoring causes
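One way to read "alert on symptoms, not causes, but still monitor causes" is that only user-visible symptoms page a human, while failing cause-level checks ride along as context. The sketch below assumes that reading. The `Check` type and the check names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SYMPTOM = "symptom"  # user-visible, e.g. high error rate on the site
    CAUSE = "cause"      # internal, e.g. disk usage or CPU on one host


@dataclass
class Check:
    name: str
    kind: Kind
    failing: bool


def route(checks):
    """Page only when a symptom is failing; attach failing causes as context."""
    symptoms = [c.name for c in checks if c.failing and c.kind is Kind.SYMPTOM]
    causes = [c.name for c in checks if c.failing and c.kind is Kind.CAUSE]
    return {"page": bool(symptoms), "symptoms": symptoms, "context": causes}
```

With this split, a burst of failing cause-level checks alone produces no page, which is one way to damp an alert flood while keeping the data for diagnosis.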
PHASE 2:
THE STATUS SITE
THE SPIDEY-SENSE FACTOR
▸ Humans are good at catching certain kinds of problems
▸ “This doesn’t feel right” and gaps in monitoring
▸ The evolution of the Sev incident system
THE STATUS SITE: MANUAL ALERTING FOR NON-TECH USERS
▸ Want to let certain non-tech users report Sev incidents
▸ Initially just a password-protected form
▸ Uses the VictorOps alert ingestion API for triggering alerts
▸ Uses the VictorOps public API for fetching information
▸ Each Sev alert is created with its own entity_id
▸ Lets admin users share status updates
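A status-site form handler along these lines might build one alert per Sev report, each with its own entity_id. This is a sketch and not the Star Tribune implementation. The field names follow the VictorOps REST-style alert payload as I understand it and should be checked against the current documentation. The endpoint URL is passed in and not hard-coded. The 1 to 3 severity range and the function names are assumptions, and nothing here sends a request.

```python
import json
import urllib.request
import uuid


def build_sev_alert(severity, summary, reporter):
    """Build an alert payload with a unique entity_id for one Sev report."""
    if severity not in (1, 2, 3):  # assumed severity scale
        raise ValueError("severity must be 1, 2, or 3")
    return {
        "message_type": "CRITICAL",
        # A fresh entity_id keeps each Sev report a separate incident.
        "entity_id": f"sev{severity}-{uuid.uuid4()}",
        "entity_display_name": f"Sev {severity}: {summary}",
        "state_message": f"Reported by {reporter} via the status site: {summary}",
    }


def build_request(endpoint_url, payload):
    """Prepare (but do not send) the HTTP POST that would deliver the alert."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would deliver it. The endpoint URL, including its keys, comes from the integration settings of the account and should be kept out of source control.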
MONTHLY INCIDENT REPORTING
▸ Monthly reports include a list of all Sev incidents, when
they started, when they ended, what the alert text was, and
what the resolution was
▸ Combine automated and chat messages in VictorOps with
data gathered from other sources
▸ Present this data as automatically as possible in the Status
Site
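The monthly report described above can be sketched as a small aggregation over incident records. The `SevIncident` fields mirror the items listed on the slide (start, end, alert text, resolution). The mean time to repair figure and the dictionary layout are additions for illustration, not something the talk specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SevIncident:
    started: datetime
    ended: datetime
    alert_text: str
    resolution: str


def monthly_report(incidents, year, month):
    """Summarize the Sev incidents that started in the given month."""
    in_month = sorted(
        (i for i in incidents if i.started.year == year and i.started.month == month),
        key=lambda i: i.started,
    )
    durations = [i.ended - i.started for i in in_month]
    mean = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        "count": len(in_month),
        "mean_time_to_repair": mean,
        "rows": [
            (i.started, i.ended, i.alert_text, i.resolution) for i in in_month
        ],
    }
```

Tracking the count and the mean per month gives a sense of incident frequency that goes beyond "this was a bad couple weeks".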
PHASE 3:
EVOLUTION
NEXT STEPS
▸ Integration of summarized data collected from Datadog/CloudWatch/etc.
into incident reporting
▸ Reports for users that shouldn’t have access to VictorOps
▸ Integration of the Status Site into Slack
▸ @bovermyer
▸ benovermyer.com
Q&A
BREAK TIME
Breakout Sessions
◻ ChatOps - Jason Hand
◻ Leveraging Data to Establish a Healthy Culture - Andy Domeier
◻ Monitoring and Microservices – Bridget Kromhout
◻ Blameless Culture – Heather Mickman
◻ Devs vs. Ops On-Call, How and Why to Get started – Ben Overmyer
Heather Mickman | Target
Senior Director of Platform Engineering
• Heather Mickman is the Senior Director of Platform Engineering at Target and a
DevOps enthusiast.
• Heather has 20+ years of IT experience in various roles and industries including
retail, transportation, and high tech manufacturing.
• She is currently working on building the platforms used by software engineers
at Target including a multi-provider cloud platform, API Gateway, telemetry
tooling, data stores, and messaging.
• She has a passion for technology, building high performing teams, driving a
culture of innovation, and having fun along the way. Heather lives in
Minneapolis with her 2 sons and mini dachshund.
Q&A
Automation
Awareness
Collaboration
Documentation
User Empathy
Learning
jhand.co/DRT_MSP
How do you Score?
Incident Management Maturity
(Chart axes: Time To Repair (TTR) vs. Continuous Improvement Efforts)
Reactive (chaotic): score 0 - 4
✓ No automation
✓ No operational stack awareness
✓ Poor collaboration between teams (Dev & Ops)
✓ Documentation not available
✓ No standardized communication
✓ High focus on consistent continuous learning
Tactical (obvious): score 5 - 9
✓ Uses a NOC
✓ Some monitoring & alerting instrumentation
✓ Collaboration in crisis
✓ "Mission critical" processes are available
✓ Understood crisis communication protocols
✓ Remediation data available to IT Operations
Integrated (complicated): score 10 - 14
✓ Team rotations, paging policies, role hunting
✓ Continuous improvement of key health indicators
✓ Technical collaboration across all incidents
✓ Docs up to date and easily accessible
✓ Consistent real-time communication practices
Strategic (complex): score 15 - 18
✓ Automated docs and remediation
✓ Actionable Alerts with full context
✓ High collaboration among all teams
✓ Documentation part of remediation
✓ Targeted, proactive crisis comms
✓ High focus on continuous learning
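The score bands on the scoring slide (0 to 4, 5 to 9, 10 to 14, 15 to 18) can be turned into a small lookup. The deck does not say how the score is built. This sketch assumes each of the six areas (automation, awareness, collaboration, documentation, user empathy, learning) is rated 0 to 3, which happens to give the 0 to 18 range. That rating scheme is an assumption for this example.

```python
# Score bands as printed on the slide: (level, Cynefin domain, low, high).
MATURITY_BANDS = [
    ("Reactive", "chaotic", 0, 4),
    ("Tactical", "obvious", 5, 9),
    ("Integrated", "complicated", 10, 14),
    ("Strategic", "complex", 15, 18),
]

# Assumption: six areas, each self-rated from 0 (absent) to 3 (strong).
AREAS = (
    "automation",
    "awareness",
    "collaboration",
    "documentation",
    "user_empathy",
    "learning",
)


def maturity_level(ratings):
    """Return (total, level, domain) for a dict of per-area ratings."""
    missing = [area for area in AREAS if area not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for area in AREAS:
        if not 0 <= ratings[area] <= 3:
            raise ValueError(f"rating for {area} must be between 0 and 3")
    total = sum(ratings[area] for area in AREAS)
    for level, domain, low, high in MATURITY_BANDS:
        if low <= total <= high:
            return total, level, domain
    raise ValueError("total is outside the 0-18 range")  # not reachable with valid input
```

For example, a team that rates itself 2 in every area totals 12 and lands in "Integrated (complicated)".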
RAFFLE TIME
#DevOpsRoadTrip
DENVER - SEATTLE - SAN FRANCISCO - MINNEAPOLIS - NEW YORK CITY

Contenu connexe

Tendances

Winnipeg ISACA Security is Dead, Rugged DevOps
Winnipeg ISACA Security is Dead, Rugged DevOpsWinnipeg ISACA Security is Dead, Rugged DevOps
Winnipeg ISACA Security is Dead, Rugged DevOps
Gene Kim
 
Bright talk running a cloud - final
Bright talk   running a cloud - finalBright talk   running a cloud - final
Bright talk running a cloud - final
Andrew White
 
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
Gene Kim
 
Tui the phoenix project book review
Tui the phoenix project book reviewTui the phoenix project book review
Tui the phoenix project book review
Rudiger Wolf
 
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev opsKim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
Gene Kim
 
2012 05 corp fin 1c
2012 05 corp fin 1c2012 05 corp fin 1c
2012 05 corp fin 1c
Gene Kim
 
How Can We Better Sell DevOps?
How Can We Better Sell DevOps?How Can We Better Sell DevOps?
How Can We Better Sell DevOps?
Gene Kim
 

Tendances (20)

What the smartest brands know about CX ... and what they still aren't doing a...
What the smartest brands know about CX ... and what they still aren't doing a...What the smartest brands know about CX ... and what they still aren't doing a...
What the smartest brands know about CX ... and what they still aren't doing a...
 
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
 
Winnipeg ISACA Security is Dead, Rugged DevOps
Winnipeg ISACA Security is Dead, Rugged DevOpsWinnipeg ISACA Security is Dead, Rugged DevOps
Winnipeg ISACA Security is Dead, Rugged DevOps
 
Bright talk running a cloud - final
Bright talk   running a cloud - finalBright talk   running a cloud - final
Bright talk running a cloud - final
 
Technical excellence - practices matter
Technical excellence - practices matterTechnical excellence - practices matter
Technical excellence - practices matter
 
Inextricably linked reproducibility and productivity in data science and ai ...
Inextricably linked reproducibility and productivity in data science and ai  ...Inextricably linked reproducibility and productivity in data science and ai  ...
Inextricably linked reproducibility and productivity in data science and ai ...
 
Teams and monoliths - Matthew Skelton - LondonCD 2016
Teams and monoliths - Matthew Skelton - LondonCD 2016Teams and monoliths - Matthew Skelton - LondonCD 2016
Teams and monoliths - Matthew Skelton - LondonCD 2016
 
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
 
Leading A DevOps Transformation: Lessons Learned
Leading A DevOps Transformation: Lessons LearnedLeading A DevOps Transformation: Lessons Learned
Leading A DevOps Transformation: Lessons Learned
 
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
2019 12 Clojure/conj: Love Letter To Clojure, and A Datomic Experience Report
 
DevOps MythBusters
DevOps MythBustersDevOps MythBusters
DevOps MythBusters
 
PuppetConf2012GeneKim
PuppetConf2012GeneKimPuppetConf2012GeneKim
PuppetConf2012GeneKim
 
Tui the phoenix project book review
Tui the phoenix project book reviewTui the phoenix project book review
Tui the phoenix project book review
 
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev opsKim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
Kim IT Pro Forum Eugene: IT at Ludicrous Speeds - rugged dev ops
 
2012 Velocity London: DevOps Patterns Distilled
2012 Velocity London: DevOps Patterns Distilled2012 Velocity London: DevOps Patterns Distilled
2012 Velocity London: DevOps Patterns Distilled
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
2012 05 corp fin 1c
2012 05 corp fin 1c2012 05 corp fin 1c
2012 05 corp fin 1c
 
How Can We Better Sell DevOps?
How Can We Better Sell DevOps?How Can We Better Sell DevOps?
How Can We Better Sell DevOps?
 
The Path of DevOps Enlightenment for InfoSec
The Path of DevOps Enlightenment for InfoSecThe Path of DevOps Enlightenment for InfoSec
The Path of DevOps Enlightenment for InfoSec
 
Inextricably linked: reproducibility and productivity in data science and AI
Inextricably linked: reproducibility and productivity in data science and AIInextricably linked: reproducibility and productivity in data science and AI
Inextricably linked: reproducibility and productivity in data science and AI
 

En vedette

DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015
Yuval Yeret
 

En vedette (20)

Paris Devops - Monitoring And Feature Toggle Pattern With JMX
Paris Devops - Monitoring And Feature Toggle Pattern With JMXParis Devops - Monitoring And Feature Toggle Pattern With JMX
Paris Devops - Monitoring And Feature Toggle Pattern With JMX
 
I am a Test Engineer: Why should I care about DevOps?
I am a Test Engineer: Why should I care about DevOps?I am a Test Engineer: Why should I care about DevOps?
I am a Test Engineer: Why should I care about DevOps?
 
Turning the Heat up on DevOps: Providing a web-based editing experience aroun...
Turning the Heat up on DevOps: Providing a web-based editing experience aroun...Turning the Heat up on DevOps: Providing a web-based editing experience aroun...
Turning the Heat up on DevOps: Providing a web-based editing experience aroun...
 
Full Stack DevOps - Ready To Go
Full Stack DevOps - Ready To GoFull Stack DevOps - Ready To Go
Full Stack DevOps - Ready To Go
 
DevconTLV 2014 (Jan) - DIY DevOps
DevconTLV 2014 (Jan) - DIY DevOpsDevconTLV 2014 (Jan) - DIY DevOps
DevconTLV 2014 (Jan) - DIY DevOps
 
Survey on article extraction and comment monitoring techniques
Survey on article extraction and comment monitoring techniquesSurvey on article extraction and comment monitoring techniques
Survey on article extraction and comment monitoring techniques
 
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
 
Customer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportCustomer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer support
 
Practical Monitoring Techniques
Practical Monitoring TechniquesPractical Monitoring Techniques
Practical Monitoring Techniques
 
Which watcher watches CloudWatch
Which watcher watches CloudWatch Which watcher watches CloudWatch
Which watcher watches CloudWatch
 
Measured availability - Sanjay Singh - DevOps Bangalore meetup March 28th 2015
Measured availability - Sanjay Singh - DevOps Bangalore meetup March 28th 2015Measured availability - Sanjay Singh - DevOps Bangalore meetup March 28th 2015
Measured availability - Sanjay Singh - DevOps Bangalore meetup March 28th 2015
 
Dev ops with smell v1.2
Dev ops with smell v1.2Dev ops with smell v1.2
Dev ops with smell v1.2
 
5 Ways ITSM can Support DevOps, an ITSM Academy Webinar
5 Ways ITSM can Support DevOps, an ITSM Academy Webinar5 Ways ITSM can Support DevOps, an ITSM Academy Webinar
5 Ways ITSM can Support DevOps, an ITSM Academy Webinar
 
AWS and Dynatrace: Moving your Cloud Strategy to the Next Level
AWS and Dynatrace: Moving your Cloud Strategy to the Next LevelAWS and Dynatrace: Moving your Cloud Strategy to the Next Level
AWS and Dynatrace: Moving your Cloud Strategy to the Next Level
 
Full-Stack Development
Full-Stack DevelopmentFull-Stack Development
Full-Stack Development
 
Devoxx 2014 monitoring
Devoxx 2014 monitoringDevoxx 2014 monitoring
Devoxx 2014 monitoring
 
DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015
 
DevOps - Retour d'expérience - MarsJug du 29 Juin 2011
DevOps - Retour d'expérience - MarsJug du 29 Juin 2011DevOps - Retour d'expérience - MarsJug du 29 Juin 2011
DevOps - Retour d'expérience - MarsJug du 29 Juin 2011
 
Run IT Support the DevOps Way
Run IT Support the DevOps WayRun IT Support the DevOps Way
Run IT Support the DevOps Way
 
Jelastic - DevOps PaaS Business with Docker Support for Service Providers
Jelastic - DevOps PaaS Business with Docker Support for Service ProvidersJelastic - DevOps PaaS Business with Docker Support for Service Providers
Jelastic - DevOps PaaS Business with Docker Support for Service Providers
 

Similaire à DevOps Roadtrip Minneapolis

Similaire à DevOps Roadtrip Minneapolis (20)

DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck
 
How Dealertrack Optimizes the DevOps Toolchain, FutureStack17
How Dealertrack Optimizes the DevOps Toolchain, FutureStack17How Dealertrack Optimizes the DevOps Toolchain, FutureStack17
How Dealertrack Optimizes the DevOps Toolchain, FutureStack17
 
ASAS 2015 - Benito de Miranda
ASAS 2015 - Benito de MirandaASAS 2015 - Benito de Miranda
ASAS 2015 - Benito de Miranda
 
DevOps Roadtrip NYC
DevOps Roadtrip NYC DevOps Roadtrip NYC
DevOps Roadtrip NYC
 
DevOps Roadtrip Final Speaking Deck
DevOps Roadtrip Final Speaking Deck DevOps Roadtrip Final Speaking Deck
DevOps Roadtrip Final Speaking Deck
 
How AI is transforming DevOps | Calidad Infotech
How AI is transforming DevOps | Calidad InfotechHow AI is transforming DevOps | Calidad Infotech
How AI is transforming DevOps | Calidad Infotech
 
DevOps Roadtrip Minneapolis

  • 2. JASON HAND | DevOps Evangelist • Holds over 15 years of experience as a developer, system administrator, and support specialist • Fully immersed in the world of agile development and the DevOps movement with Colorado tech startups #DevOpsRoadTrip
  • 5. A little about VictorOps… VictorOps is the real-time incident management platform that combines the power of people and data to embolden DevOps pros to handle incidents as they occur. #DevOpsRoadTrip
  • 19. “How Organizations Process Information” Ron Westrum: A Typology of Organizational Cultures. The 2014 State of DevOps Report shows that in the context of IT, job satisfaction is the biggest predictor of profitability, market share, and productivity. The biggest predictor of job satisfaction, in turn, is how effectively organizations process information, as determined by a model created by sociologist Ron Westrum. Source: https://continuousdelivery.com/implementing/culture/
  • 22. “Words are how we think – stories are how we link.” - Christina Baldwin. “Oral narrative is and for a long time has been the chief basis of culture itself.” - John D. Niles. Stories from the road
  • 27. Cynefin. Ordered domains: Obvious (cause and effect obvious from experience; Sense – Categorize – Respond) and Complicated (cause and effect requires analysis; Sense – Analyze – Respond). Unordered domains: Complex (cause and effect only apparent in hindsight; Probe – Sense – Respond) and Chaotic (cause and effect cannot be related; Act – Sense – Respond).
  • 29. The systems we engineer, maintain, and improve are Complicated (known unknowns) .. or ..
  • 30. The systems we engineer, maintain, and improve are Complex (unknown unknowns)
  • 34. Identifying a “root cause” helps us to … Put it back how it was
  • 35. What we really want is to.. Continuously Improve
  • 36. Incident Management Maturity. Axes: TimeToRepair (TTR), Continuous Improvement Efforts. Reactive (chaotic): ✓ No automation ✓ No operational stack awareness ✓ Poor collaboration between teams (Dev & Ops) ✓ Documentation not available ✓ No standardized communication ✓ High focus on consistent continuous learning. Tactical (obvious): ✓ Uses a NOC ✓ Some monitoring & alerting instrumentation ✓ Collaboration in crisis ✓ "Mission critical" processes are available ✓ Understood crisis communication protocols ✓ Remediation data available to IT Operations. Integrated (complicated): ✓ Team rotations, paging policies, role hunting ✓ Continuous improvement of key health indicators ✓ Technical collaboration across all incidents ✓ Docs up to date and easily accessible ✓ Consistent real-time communication practices. Strategic (complex): ✓ Automated docs and remediation ✓ Actionable Alerts with full context ✓ High collaboration among all teams ✓ Documentation part of remediation ✓ Targeted, proactive crisis comms ✓ High focus on continuous learning.
  • 37. Reactive (chaotic) ✓ No automation ✓ No operational stack awareness ✓ Poor collaboration between teams (Dev & Ops) ✓ Documentation not available ✓ No standardized communication ✓ High focus on consistent continuous learning
  • 38. Tactical (obvious) ✓ Uses a NOC ✓ Some monitoring & alerting instrumentation ✓ Collaboration in crisis ✓ "Mission critical" processes are available ✓ Understood crisis communication protocols ✓ Remediation data available to IT Operations
  • 39. Integrated (complicated) ✓ Team rotations, paging policies, role hunting ✓ Continuous improvement of key health indicators ✓ Technical collaboration across all incidents ✓ Docs up to date and easily accessible ✓ Consistent real-time communication practices
  • 40. Strategic (complex) ✓ Automated docs and remediation ✓ Actionable Alerts with full context ✓ High collaboration among all teams ✓ Documentation part of remediation ✓ Targeted, proactive crisis comms ✓ High focus on continuous learning
  • 41. “Six Trends Shape DevOps Adoption, Q1 2015” Forrester report • The Foundation For Success Is In Place . . . Mostly • Fear Of Failure Will Hamper Advancement • Monitoring And Analytics Strategies Must Make A Big Leap Forward • The Focus On Customer Experience Is Not Second Nature . . . Yet • Change And Release Processes Are Not Delivering Business Needs • You Must Prioritize And Focus Sourcing Strategies
  • 44. Failure not seen as opportunity to learn Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
  • 46. © 2015 Forrester Research, Inc. Reproduction Prohibited. Single Source Of Truth Lacking In Many Orgs – 95% only most of the time or less. Source: April 15, 2015 “Six Trends That Will Shape DevOps Adoption”, Forrester report
  • 48. Teams siloed throughout life cycle Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
  • 50. IT teams aren’t measured on customer experience goals.
  • 53. Delays in notifications lead to customers finding the problem first. Source: “Six Trends Shape DevOps Adoption, Q1 2015”, Forrester report
  • 55. Reduce MTTR. Source: State of DevOps Report (2015), Puppet Labs
  • 64. Bridget Kromhout | Pivotal - Cloud Foundry Principal Technologist • Bridget Kromhout is a Principal Technologist for Cloud Foundry at Pivotal. • After years as an operations engineer (most recently at DramaFever), she traded in oncall for more travel. • A frequent speaker at tech conferences, she helps organize tech meetups at home in Minneapolis, serves on the program committee for Velocity, and acts as a global core organizer for devopsdays. • She podcasts at Arrested DevOps, occasionally blogs at bridgetkromhout.com, and is active in a Twitterverse near you. #DevOpsRoadTrip
  • 67. @bridgetkromhout Traded oncall… …for more travel (Similar effect on sleep)
  • 69. @bridgetkromhout “…measuring value, throughput, and performance… revenue rather than cost” The Art of Monitoring (2016) James Turnbull artofmonitoring.com
  • 71. @bridgetkromhout The Art of Monitoring (2016) James Turnbull Monitoring containers artofmonitoring.com
  • 72. @bridgetkromhout “Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics” Large-scale cluster management at Google with Borg - Verma et al. 2015
  • 73. @bridgetkromhout The Art of Monitoring (2016) — James Turnbull Monitoring Maturity Model artofmonitoring.com
  • 74. @bridgetkromhout Image credit: Wikipedia “Any organization that designs a system… will produce a design whose structure is a copy of the organization's communication structure.” Mel Conway
  • 76. @bridgetkromhout three Friday mornings in Minneapolis removed restored
  • 79. Andy Domeier | SPS Commerce Director System Operations • Andy has been in Technology Operations leadership with SPS Commerce for the past 11 years. • Andy spends many mental cycles collaborating on effective patterns for monitoring and operating complex, changing systems. • Andy also spends time on priority organization and alignment and on the organization of knowledge. #DevOpsRoadTrip
  • 80. HOW EFFECTIVE IS YOUR INCIDENT RESPONSE? Andy Domeier @ajdomie
  • 81. agenda: Styles of Incident Response • Healthy Incident Response • Tips & Tricks © SPS COMMERCE
  • 82. STYLE #1 - DENIAL: That’s not possible! No Wai!
  • 84. STYLE #2 - CONFUSED: Ummmm • Hmmmm • (crickets) • How is this Possible?
  • 86. STYLE #3 - LAZY: It’s the Database • It’s the Network • Just Restart It
  • 88. STYLE #4 - ANGRY: Why did you do that? What did you change? #%$! &#!^ #$@
  • 90. STYLE #5 - FIREDRILL: OMG • WTF • FML • “Buckshot”
  • 92. LET’S GET REAL
  • 94. HOW DO WE KNOW THERE IS A FIRE? • Good way - Alarm
  • 95. HOW DO WE KNOW THERE IS A FIRE? • Bad Way – Humans
  • 96. WHO PUTS THE FIRE OUT? • If you catch it right away?
  • 97. WHO PUTS THE FIRE OUT? • If it’s out of control?
  • 98. INCIDENT RESPONSE TEAM
  • 99. WHAT’S YOUR SYSTEM? • #monoliths – Familiar, All or None, Less Agility • #microservices – Complex, semi-isolated, Agile
  • 100. WHERE’S YOUR DATA? • Monitoring Tools – Base IT – Logging – APM – Metrics
  • 101. RESPOND IN ISOLATION
  • 102. RESPOND AS A TEAM • “Hey Danielle, it looks like the site is acting up, and when looking around the only outlier I have found so far is a CPU spike on the DB. Can you help me investigate this a bit more?”
  • 103. #CHATOPS • Share Screens & Visualize Data • Display Alerts w/ Integrations • Automatic History Retention • Enables Collaboration for All • And my Favorite…...
  • 104. #CHATOPS – CELEBRATE WITH GIFS
  • 105. TIPS FOR HEALTHY INCIDENT RESPONSE • Make health data as transparent and central as possible – Helps the Team “Know where the fire is” • Share data in chat – Use the metric from your tools • “Be Transparent” • Team Response Nurtures Team Follow Up
  • 106. TIPS FOR HEALTHY INCIDENT RESPONSE • Always tie things back to the customer – Simple but often overlooked – Opportunity to link the team to the business
  • 109. Ben Overmyer | Star Tribune Digital Manager, Operations • Ben is the Digital Manager of Operations at the Minneapolis Star Tribune. • He has over a decade of experience as a back end software engineer, two years of experience as a dedicated operations engineer, and great enthusiasm for the DevOps culture. • Besides the Star Tribune, he’s worked for an eclectic mix of organizations, including the USGS, a game company in New Zealand, and a beauty products marketing company. • When not hacking on servers, apps, or people, he acts as art director and author for a tabletop gaming company. #DevOpsRoadTrip
  • 111. IN THE BEGINNING ▸ Forwarded phone line ▸ An on-call list maintained in a wiki ▸ Every week, manually change to the next person on the list ▸ …and overrides or substitutions?
  • 112. EARLY MONITORING ▸ Zabbix monitoring set up for a handful of causes ▸ Zabbix alerts sent via email to a distribution list ▸ Sometimes no one would see these alerts until hours or, in rare cases, days later
  • 113. THE PAIN POINTS ▸ Manual maintenance of the calling tree data ▸ Manual rotation of the support phone line forwarding ▸ Poor documentation of incident life cycles ▸ No sense of incident frequency beyond “this was a bad couple weeks” ▸ If the on-call person didn’t respond, there was no escalation process other than calling the head of Digital
  • 115. ADOPTING VICTOROPS ▸ Automated rotations ▸ Multiple teams ▸ Automatic escalation processes ▸ Easy schedule overrides and changes ▸ APIs for programmatic incident interaction
  • 116. THE NATURE OF ALERTS ▸ OK, we can set up programmatic alerts. Now what? ▸ Integrating Zabbix, New Relic, and CloudWatch ▸ Discovering alert floods ▸ Move to alerting on symptoms, not causes ▸ …but still monitoring causes
  • 118. THE SPIDEY-SENSE FACTOR ▸ Humans are good at catching certain kinds of problems ▸ “This doesn’t feel right” and gaps in monitoring ▸ The evolution of the Sev incident system
  • 119. THE STATUS SITE: MANUAL ALERTING FOR NON-TECH USERS ▸ Want to let certain non-tech users report Sev incidents ▸ Initially just a password-protected form ▸ Uses the VictorOps alert ingestion API for triggering alerts ▸ Uses the VictorOps public API for fetching information ▸ Each Sev alert is created with its own entity_id ▸ Lets admin users share status updates
  • 120. MONTHLY INCIDENT REPORTING ▸ Monthly reports include a list of all Sev incidents, when they started, when they ended, what the alert text was, and what the resolution was ▸ Combine automated and chat messages in VictorOps with data gathered from other sources ▸ Present this data as automatically as possible in the Status Site
  • 122. NEXT STEPS ▸ Integration of summarized data collected from Datadog/CloudWatch/etc. into incident reporting ▸ Reports for users that shouldn’t have access to VictorOps ▸ Integration of the Status Site into Slack
  • 124. Q&A
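The move from a hand-edited on-call list (slide 111) to automated rotations with schedule overrides (slide 115) can be sketched in a few lines. This is only an illustration under stated assumptions, not how VictorOps implements it: the roster names, the `on_call` function, and the per-day override map are all hypothetical.

```python
from datetime import date


def on_call(roster, start, day, overrides=None):
    """Return who is on call on `day` for a weekly rotation.

    roster    -- ordered list of people; the first entry covers the week of `start`
    start     -- first day of the first rotation week
    overrides -- optional {date: person} map for substitutions on single days
    """
    overrides = overrides or {}
    # A substitution for that specific day wins over the regular rotation.
    if day in overrides:
        return overrides[day]
    # Count whole weeks since the rotation began and wrap around the roster.
    weeks_elapsed = (day - start).days // 7
    return roster[weeks_elapsed % len(roster)]


# Hypothetical roster and start date (a Monday), for illustration only.
roster = ["alice", "bob", "carol"]
start = date(2016, 1, 4)
swap = {date(2016, 1, 13): "carol"}  # carol covers one day of bob's week

print(on_call(roster, start, date(2016, 1, 4)))         # first week
print(on_call(roster, start, date(2016, 1, 13), swap))  # overridden day
```

Keeping overrides as data, separate from the rotation rule, means a substitution never requires editing the roster itself, which was the manual step the slides describe as a pain point.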
  • 126. Breakout Sessions ◻ ChatOps - Jason Hand ◻ Leveraging Data to Establish a Healthy Culture - Andy Domeier ◻ Monitoring and Microservices – Bridget Kromhout ◻ Blameless Culture – Heather Mickman ◻ Devs vs. Ops On-Call, How and Why to Get started – Ben Overmyer #DevOpsRoadTrip
  • 130. Heather Mickman | Target Senior Director of Platform Engineering • Heather Mickman is the Senior Director of Platform Engineering at Target and a DevOps enthusiast. • Heather has 20+ years of IT experience in various roles and industries including retail, transportation, and high tech manufacturing. • She is currently working on building the platforms used by software engineers at Target including a multi-provider cloud platform, API Gateway, telemetry tooling, data stores, and messaging. • She has a passion for technology, building high performing teams, driving a culture of innovation, and having fun along the way. Heather lives in Minneapolis with her 2 sons and mini dachshund. #DevOpsRoadTrip
  • 149. Q&A
  • 182. Incident Management Maturity, with score ranges. Axes: TimeToRepair (TTR), Continuous Improvement Efforts. Reactive (0 – 4) (chaotic): ✓ No automation ✓ No operational stack awareness ✓ Poor collaboration between teams (Dev & Ops) ✓ Documentation not available ✓ No standardized communication ✓ High focus on consistent continuous learning. Tactical (5 – 9) (obvious): ✓ Uses a NOC ✓ Some monitoring & alerting instrumentation ✓ Collaboration in crisis ✓ "Mission critical" processes are available ✓ Understood crisis communication protocols ✓ Remediation data available to IT Operations. Integrated (10 – 14) (complicated): ✓ Team rotations, paging policies, role hunting ✓ Continuous improvement of key health indicators ✓ Technical collaboration across all incidents ✓ Docs up to date and easily accessible ✓ Consistent real-time communication practices. Strategic (15 – 18) (complex): ✓ Automated docs and remediation ✓ Actionable Alerts with full context ✓ High collaboration among all teams ✓ Documentation part of remediation ✓ Targeted, proactive crisis comms ✓ High focus on continuous learning.
  • 184. DENVER - SEATTLE - SAN FRANCISCO - MINNEAPOLIS - NEW YORK CITY
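The score ranges on slide 182 map directly onto the four maturity levels. A minimal sketch of that lookup follows; the slides do not say how a team's score is computed, so the integer `score` input and the `maturity_level` function name are assumptions made for illustration.

```python
# Score ranges and level names as given on slide 182.
LEVELS = [
    (0, 4, "Reactive (chaotic)"),
    (5, 9, "Tactical (obvious)"),
    (10, 14, "Integrated (complicated)"),
    (15, 18, "Strategic (complex)"),
]


def maturity_level(score):
    """Map an integer score in 0..18 to its incident management maturity level."""
    for low, high, name in LEVELS:
        if low <= score <= high:
            return name
    raise ValueError(f"score must be between 0 and 18, got {score}")


print(maturity_level(7))
```

Scores outside 0 to 18 raise an error rather than being clamped, since the slide defines no level beyond those ranges.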