http://gapingvoid.com/
Sunday, June 20, 2010
The Upside of Downtime
         Turning disaster into opportunity




Who’s had a site go down?




Who hasn’t had a site go down?



There’s always
                         that one guy!




Downtime
                                                   sucks



Source: http://www.motivatedphotos.com/?id=8080

Why downtime sucks
               Business
               [Chart: Sales, y-axis $0 to $3,000, x-axis 0 to 22]
               Brand
               You
               Users

Downtime = Bad! (Duh)




Approach #1
                          Don’t fail



Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg

“Everything fails all the time”
                        -- Werner Vogels (Amazon, CTO)

Your site will fail
                        Werner Vogels (Amazon, CTO)

Why?!?




Why Failure Happens
                            Risk Homeostasis
                            Black Swan
                            Unknown unknowns
                            Change
                            Many small failures
                            Humans
Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg
Source: Amazon.com
Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg
Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg
Source: http://www.biojobblog.com/uploads/image/dominos.jpg
Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg

Polisher blocked (Not unusual)
  -> Moisture leaks into air system (Not expected)
  -> Flow of cold water stopped (Not good)
  -> Backup disabled
  -> Indicator blocked (Doh!)
  -> Relief valve broken (Dammit)
  -> Gauge broken (WTF)
  -> Meltdown
Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Source: http://support.rightscale.com/09-Clouds/AWS/02-Amazon_EC2/Designing_Failover_Architectures_on_EC2/03-Advanced_Failover_Architecture

“accidental power failure”



Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/

“traffic accident damaged a nearby utility transformer”
Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/

“unfortunate code change”
Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/

“Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business”
             -- Clay Shirky

Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/

http://labs.webmetrics.com/crowdsourceduptime

Recap




Your site will fail
          +
          Downtime is bad
          +
          Everyone will find out
          =
          Screw it, I’ll become a
          lumberjack
Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg

“Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.”
                              -- John Allspaw, VP Tech. Ops at Etsy

Approach #2
                        Prepare for downtime



Disclaimer:
         Try hard to avoid downtime



Learning by example...




Case Study #1
                          Facebook



“The larger issue here isn't just that a portion of
         Facebook's platform has gone down - numerous web
         services have issues from time to time, including
         everything from Gmail to Twitter. An outage of this
         length, however, with no official communication
         from the company itself is disturbing.”
                                                     -- N.Y. Times




Facebook



         Downtime             Disturbing




Case Study #2
                        Google App Engine



Google App Engine



                        Downtime     Kudos




Case Study #3
                          Atlassian



Atlassian



                 Downtime           Bravo




http://atlassian.com/

Downtime:
         Opportunity to Build Trust



Downtime:
         Opportunity to Destroy Trust



How To:
         Prepare for Downtime



Something > Nothing




Upside of Downtime Framework 1.0




               Life is good     Oh crap     That sucked
         Time




Upside of Downtime Framework 1.0




                        Prepare   Communicate   Explain
         Time




Prepare   Communicate          Explain

         1. Communication channel


      Something is wrong    Can’t tell if it’s me or you    I’ll assume it’s you    You suck


      Something is wrong    Can’t tell if it’s me or you    I’ll assume it’s you
      I know it’s you       Tell me when you’re back        You suck a lot less


Prepare       Communicate   Explain

         1. Communication channel
                        Easy to find
                        Hosted off-site
                        Real-time / automated




7 keys for public health dashboards

          1. Must show current status for each “service”
          2. Data must be accurate and timely
          3. Must be easy to find
          4. Must provide details for events in real time
          5. Provide historical uptime and performance data
          6. Provide a way to be notified of status changes
          7. Provide details on how the data is gathered


 Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
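A few of the keys above (current status per service, timely data, historical uptime) can be sketched as a small data model. This is only an illustration under stated assumptions: `ServiceStatus`, `render_dashboard`, the state names and the sample values are hypothetical and do not come from the talk or from any real dashboard product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ServiceStatus:
    """One row of a public health dashboard (hypothetical model)."""
    name: str
    state: str                      # assumed values: "up", "degraded", "down"
    checked_at: datetime            # when the state was last measured
    history: List[bool] = field(default_factory=list)  # past checks, True = up

    def uptime_percent(self) -> Optional[float]:
        """Share of recorded checks that succeeded, or None without data."""
        if not self.history:
            return None
        return 100.0 * sum(self.history) / len(self.history)


def render_dashboard(services: List[ServiceStatus]) -> List[str]:
    """Return one status line per service, including when it was checked."""
    lines = []
    for service in services:
        uptime = service.uptime_percent()
        uptime_text = "n/a" if uptime is None else f"{uptime:.1f}%"
        lines.append(
            f"{service.name}: {service.state} "
            f"(checked {service.checked_at.isoformat()}, uptime {uptime_text})"
        )
    return lines


# Made-up sample data for illustration only.
api = ServiceStatus(
    name="api",
    state="down",
    checked_at=datetime(2010, 6, 20, 12, 0, tzinfo=timezone.utc),
    history=[True, True, True, False],
)
print("\n".join(render_dashboard([api])))
```

Showing the check time next to each state is one way to let readers judge for themselves how timely the data is.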

Prepare        Communicate        Explain

         1. Communication channel
                        Easy to find
                        Hosted off-site
                        Real-time / automated

         2. Process
                        Authority
                        Mean-Time-To-Communicate (MTTC)
                        On-call/drills/escalations/etc.
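The slides name Mean-Time-To-Communicate (MTTC) without defining it. A minimal sketch, assuming MTTC is the average delay between the start of an incident and the first public update about it; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_communicate(incidents):
    """Average delay between incident start and first public update.

    `incidents` is a list of (started_at, first_update_at) datetime pairs.
    This definition of MTTC is an assumption, not taken from the slides.
    """
    delays = [first_update - started for started, first_update in incidents]
    if not delays:
        raise ValueError("need at least one incident")
    return sum(delays, timedelta()) / len(delays)


# Made-up incident log for illustration only.
incident_log = [
    (datetime(2010, 6, 1, 12, 0), datetime(2010, 6, 1, 12, 10)),  # 10 min
    (datetime(2010, 6, 9, 9, 0), datetime(2010, 6, 9, 9, 30)),    # 30 min
]
print(mean_time_to_communicate(incident_log))  # 0:20:00
```

Tracking this number across drills gives the process a concrete target to improve.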
Your servers




Prepare      Communicate    Explain

         1. Communicate
                        Use communication channel
                        MTTC
                        Who/what is affected
                        When the incident started
                        ETA
                        Update regularly

         2. Fix it!
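The checklist above (who/what is affected, when the incident started, ETA, regular updates) could be turned into a status message roughly as follows. This is a sketch under assumptions: `IncidentUpdate`, its field names, the 30-minute update interval and the assumption that all timestamps are UTC are illustrative choices, not something the talk prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentUpdate:
    """One public update during an outage (hypothetical structure)."""
    affected: str                  # who/what is affected
    started_at: datetime           # when the incident started (assumed UTC)
    posted_at: datetime            # when this update is published (assumed UTC)
    eta: Optional[datetime] = None # expected resolution, if known
    update_interval: timedelta = timedelta(minutes=30)  # assumed cadence

    def render(self) -> str:
        fmt = "%Y-%m-%d %H:%M UTC"
        eta_text = self.eta.strftime(fmt) if self.eta else "unknown, still investigating"
        next_update = self.posted_at + self.update_interval
        return "\n".join([
            f"Affected: {self.affected}",
            f"Started: {self.started_at.strftime(fmt)}",
            f"ETA: {eta_text}",
            f"Next update by: {next_update.strftime(fmt)}",
        ])


# Made-up example values.
update = IncidentUpdate(
    affected="Logins for all users",
    started_at=datetime(2010, 6, 20, 12, 0),
    posted_at=datetime(2010, 6, 20, 12, 10),
)
print(update.render())
```

Committing to a time for the next update, even when the ETA is unknown, is one way to make "update regularly" concrete.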
Phew, close
                           one!




Prepare   Communicate   Explain

         1. Postmortem
                        Admit failure
                        Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/
                        Sound like a human
                        Source: http://www.bureauofcommunication.com/compose/apology

                         “We apologize for any inconvenience this may have caused”

                        Start time and end time
                        Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
                        Who/what was impacted
                        Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/
                        What went wrong
                        Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html
                        Lessons learned
                        Source: http://graysky.org/2010/02/downtime-postmortem/

                “I was completely overwhelmed by the amount of positive feedback and support I received.”

          2. Improve for the future
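The postmortem items listed in this section can be treated as a checklist to run against a draft before publishing. A minimal sketch, assuming a draft is kept as a plain dictionary; the field names and sample values are hypothetical, and "sound like a human" is left out because it cannot be checked mechanically.

```python
from datetime import datetime

# Hypothetical field names mirroring the postmortem items in the slides.
REQUIRED_FIELDS = (
    "admission",        # admit failure
    "started_at",       # start time
    "ended_at",         # end time
    "impact",           # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
    "improvements",     # improve for the future
)


def missing_fields(postmortem: dict) -> list:
    """Return the checklist items that are absent or empty in the draft."""
    return [name for name in REQUIRED_FIELDS if not postmortem.get(name)]


def outage_minutes(postmortem: dict) -> float:
    """Length of the outage in minutes, from the start and end times."""
    delta = postmortem["ended_at"] - postmortem["started_at"]
    return delta.total_seconds() / 60


# Made-up draft for illustration only.
draft = {
    "admission": "We were down, and it was our fault.",
    "started_at": datetime(2010, 6, 20, 12, 0),
    "ended_at": datetime(2010, 6, 20, 13, 30),
    "impact": "All users were unable to log in.",
    "what_went_wrong": "",
}
print(missing_fields(draft))
print(outage_minutes(draft))
```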
Prepare                       Communicate   Explain




               “Google is not just saying sorry, they are
               actually implementing serious changes which
               probably represents millions of dollars of
               development to help make sure this doesn't
               happen again.”




Source: http://news.ycombinator.com/item?id=1168493

Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf

                                  Be human
                                  Be authentic
                                  Be transparent
                                  Accept responsibility
                                  Learn and improve
                                  Trust




Upside of Downtime Framework 1.0

Prepare                      Communicate              Explain
1. Communication channel     1. Communicate           1. Post-mortem
   - Easy to find               - Use channel            - Admit failure
   - Off-site                   - M.T.T.C.               - Sound like a human
   - Real-time                  - Who/what affected      - Start time and end time
                                - When started           - Who/what was impacted
2. Process                      - ETA to resolution      - What went wrong
   - Give authority             - Update regularly       - Lessons learned
   - M.T.T.C.
   - On-call/escalations     2. Fix it!               2. Learn and improve

Be Prepared + Be Transparent + Be Human = Trust

Disclaimer:
         Don’t screw up too often



Downtime Prisoner’s Dilemma

                    Transparent   Not Transparent
     Caught         Big Win       Big Loss
     Not Caught     Win           Win

Benefits
               Gain trust
               Reduce churn, increase loyalty
               Reduce support costs
               Ability to control the message
               Competitive advantage
               More time to focus on the actual problem
               Reduce stress


Change != Easy




Change != Impossible




Keys to Adoption
               Getting past a culture of “hide the problem”
               Overriding commitment to want to improve
               Available resources to improve
               Pain
               Buy-in




Product Management
               Default: Let’s wait for complaints
               Reality: Proactiveness => Forgiveness

Support
               Default: Too much work
               Reality: More upfront, less when it matters

Engineering/Operations
               Default: Don’t want to look bad
               Reality: Opportunity to learn/improve

Sales/Marketing
               Default: I don’t want my customers to know
               Reality: They’ll find out, better from us

Source: http://delicious.com/lennysan/healthdashboard

Simple as that!




Your site
                        will still fail!




“The measure of a society is how
     well it transforms pain and suffering
     into something worthwhile.”
                           -- Friedrich Nietzsche

“The measure of a company is how
      well it transforms pain of downtime
      into something worthwhile.”
                                                        -- Lenny Rachitsky

Source: Original quote inspired by Fredrick Nietzsche
Sunday, June 20, 2010
Bare minimum:
         Register a Twitter account



Thank You

Slides: http://bit.ly/upside-of-downtime

Lenny Rachitsky
@lennysan
http://www.transparentuptime.com/

Webmetrics/Neustar
@webmetrics
http://www.webmetrics.com/

Bonus




Upside of Downtime Framework 1.0

Prepare
  1. Communication channel
     - Easy to find
     - Off-site
     - Real-time
  2. Process
     - Give authority
     - M.T.T.C.
     - On-call/escalations

Communicate
  1. Communicate
     - Use channel
     - M.T.T.C.
     - Who/what affected
     - When started
     - ETA to resolution
     - Update regularly
  2. Fix it!

Explain
  1. Post-mortem
     - Admit failure
     - Sound like a human
     - Start time and end time
     - Who/what was impacted
     - What went wrong
     - Lessons learned
  2. Learn and improve
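
The "Communicate" checklist in the framework (who/what affected, when started, ETA to resolution, update regularly) can be turned into a small message template. What follows is only a minimal sketch in Python, not something from the talk: the `StatusUpdate` class, its field names, and the wording of the message are all hypothetical, and the example incident is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StatusUpdate:
    """One public update posted during an incident (hypothetical structure)."""
    affected: str              # who/what is affected
    started_at: datetime       # when the problem started
    eta: Optional[timedelta]   # estimated time to resolution, if known
    next_update_in: timedelta  # promise for the next regular update

    def render(self, now: datetime) -> str:
        """Format the update as short, human-sounding text."""
        minutes_down = int((now - self.started_at).total_seconds() // 60)
        if self.eta is not None:
            eta_minutes = int(self.eta.total_seconds() // 60)
            eta_text = f"We expect to be back in about {eta_minutes} minutes."
        else:
            # Saying "no ETA yet" is still better than saying nothing.
            eta_text = "We do not have an ETA yet."
        next_minutes = int(self.next_update_in.total_seconds() // 60)
        return (
            f"{self.affected} has been affected since "
            f"{self.started_at:%H:%M} UTC ({minutes_down} minutes ago). "
            f"{eta_text} Next update in {next_minutes} minutes."
        )


# Made-up example incident, for illustration only.
update = StatusUpdate(
    affected="Login for all users",
    started_at=datetime(2010, 6, 20, 14, 5, tzinfo=timezone.utc),
    eta=timedelta(minutes=30),
    next_update_in=timedelta(minutes=15),
)
message = update.render(now=datetime(2010, 6, 20, 14, 25, tzinfo=timezone.utc))
print(message)
```

Keeping the four checklist items as required fields means an update cannot be posted without at least stating what is affected, when it started, and when the next update will come, even if the ETA is still unknown.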




        "Unlikely that an accidental surface or subsurface
        oil spill would occur from the proposed activities"
                                                                                -- Exploration and environmental impact plan


Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion

“Be not afraid of transparency;
 some are born transparent,
 some achieve transparency,
 and others have transparency
 thrust upon them.”
    -- Borrowed from William Shakespeare




Making change
  1. Find the bright spots - (this presentation has a bunch)
  2. Script the critical moves - (framework)
  3. Point to the destination - (W.W.G.D.)
  4. Find the feeling - (how would you feel?)
  5. Shrink the change - (start small)
  6. Grow your people - (everyone is learning as they go)
  7. Tweak the environment - (create a simple process)
  8. Build habits - (build process organically)
  9. Rally the herd - (get buy-in, rest will follow)


The Upside of Downtime (Velocity 2010)

  • 2. The Upside of Downtime Turning disaster into opportunity Sunday, June 20, 2010
  • 3. Who’s had a site go down? Sunday, June 20, 2010
  • 4. Who’s hasn’t had a site go down? Sunday, June 20, 2010
  • 5. There’s always that one guy! Sunday, June 20, 2010
  • 15. Downtime sucks Source: http://www.motivatedphotos.com/?id=8080 Sunday, June 20, 2010
  • 16. Why downtime sucks Business $3,000 $2,250 $1,500 Sales $750 $0 0 2 4 6 8 10 12 14 16 18 20 22 Sunday, June 20, 2010
  • 17. Why downtime sucks Business Brand Sunday, June 20, 2010
  • 18. Why downtime sucks Business Brand You Sunday, June 20, 2010
  • 19. Why downtime sucks Business Brand You Users Sunday, June 20, 2010
  • 20. Downtime = Bad! (Duh) Sunday, June 20, 2010
  • 21. Approach #1 Don’t fail Sunday, June 20, 2010
  • 23. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 24. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 25. Your site will fail Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 27. Why Failure Happens Risk Homeostasis Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg Sunday, June 20, 2010
  • 28. Why Failure Happens Risk Homeostasis Black Swan Source: Amazon.com Sunday, June 20, 2010
  • 29. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg Sunday, June 20, 2010
  • 30. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg Sunday, June 20, 2010
  • 31. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Source: http://www.biojobblog.com/uploads/image/dominos.jpg Sunday, June 20, 2010
  • 32. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Humans Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg Sunday, June 20, 2010
  • 35. Polisher blocked Not unusual Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 36. Polisher Moisture leaks into blocked air system Not unusual Not expected Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 37. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Not good Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 38. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 39. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 40. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 41. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken WTF Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 42. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Meltdown Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 45. “accidental power failure” Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/
  • 46. “traffic accident damaged a nearby utility transformer” Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/
  • 47. “unfortunate code change” Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/
  • 49. “Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business” -- Clay Shirky Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/
  • 62. Your site will fail
  • 63. Your site will fail + Downtime is bad
  • 64. Your site will fail + Downtime is bad + Everyone will find out
  • 65. Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg
  • 66. “Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.” -- John Allspaw, VP Tech. Ops at Etsy
  • 67. Approach #2 Prepare for downtime
  • 68. Disclaimer: Try hard to avoid downtime
  • 70. Case Study #1 Facebook
  • 77. “The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.” -- N.Y. Times
  • 78. Facebook Downtime Disturbing
  • 80. Case Study #2 Google App Engine
  • 95. Google App Engine Downtime Kudos
  • 96. Case Study #3 Atlassian
  • 108. Atlassian Downtime Bravo
  • 110. Downtime: Opportunity to Build Trust
  • 111. Downtime: Opportunity to Destroy Trust
  • 112. How To: Prepare for Downtime
  • 113. Something > Nothing
  • 114. Upside of Downtime Framework 1.0: Life is good → Oh crap → That sucked (over time)
  • 115. Upside of Downtime Framework 1.0: Prepare → Communicate → Explain (over time)
  • 119. Prepare Communicate Explain
  • 120. Prepare Communicate Explain: 1. Communication channel
  • 121. Prepare Communicate Explain: 1. Communication channel. Something is wrong → Can’t tell if it’s me or you → I’ll assume it’s you → You suck
  • 122. Prepare Communicate Explain: 1. Communication channel. Something is wrong → Can’t tell if it’s me or you → I’ll assume it’s you → You suck; I know it’s you → Tell me when you’re back → You suck a lot less
  • 131. Prepare Communicate Explain: 1. Communication channel: Easy to find
  • 132. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site
  • 133. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site; Real-time / automated
  • 134. 7 keys for public health dashboards: 1. Must show current status for each “service” 2. Data must be accurate and timely 3. Must be easy to find 4. Must provide details for events in real time 5. Provide historical uptime and performance data 6. Provide a way to be notified of status changes 7. Provide details on how the data is gathered Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
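A minimal sketch of the data a dashboard needs in order to satisfy keys 1 and 5 (current status per service, historical uptime). The names `Event`, `ServiceHealth`, `current_status`, and `uptime_percent` are hypothetical and not from the talk, and the sketch assumes events for one service never overlap.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    """One outage or degradation (hypothetical structure)."""
    started_at: datetime
    detail: str
    resolved_at: Optional[datetime] = None  # None while the event is ongoing


@dataclass
class ServiceHealth:
    """Health record for one service shown on the dashboard."""
    name: str
    events: List[Event] = field(default_factory=list)

    def current_status(self) -> str:
        # Key 1: show a current status for each service.
        ongoing = any(e.resolved_at is None for e in self.events)
        return "down" if ongoing else "up"

    def uptime_percent(self, window_start: datetime, window_end: datetime) -> float:
        # Key 5: historical uptime over a reporting window.
        # Assumes events do not overlap; overlapping events would be double counted.
        window = (window_end - window_start).total_seconds()
        down = 0.0
        for e in self.events:
            start = max(e.started_at, window_start)
            end = min(e.resolved_at or window_end, window_end)
            if end > start:
                down += (end - start).total_seconds()
        return 100.0 * (1.0 - down / window)
```

An ongoing event is counted as downtime up to the end of the window, so the reported uptime keeps dropping until someone marks the event resolved.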
  • 135. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site; Real-time / automated. 2. Process
  • 136. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site; Real-time / automated. 2. Process: Authority
  • 137. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site; Real-time / automated. 2. Process: Authority; Mean-Time-To-Communicate (MTTC)
  • 138. Prepare Communicate Explain: 1. Communication channel: Easy to find; Hosted off-site; Real-time / automated. 2. Process: Authority; Mean-Time-To-Communicate (MTTC); On-call/drills/escalations/etc.
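The slides name Mean-Time-To-Communicate (MTTC) without defining how to compute it. One plausible reading, assumed here, is the average delay between the start of an incident and the first public update about it. The function name and input shape below are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_communicate(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average delay from incident start to first public update.

    Each pair is (incident_started_at, first_public_update_at).
    This definition of MTTC is an assumption, not taken from the talk.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    delays = [first_update - started for started, first_update in incidents]
    # sum() needs a timedelta start value; dividing by an int yields a timedelta.
    return sum(delays, timedelta()) / len(delays)
```

Tracking this number alongside time-to-repair makes it visible when the fix was quick but users were left guessing.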
  • 140. Prepare Communicate Explain: 1. Communicate
  • 141. Prepare Communicate Explain: 1. Communicate: Use communication channel
  • 142. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC
  • 143. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC; Who/what is affected
  • 144. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC; Who/what is affected; When the incident started
  • 145. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC; Who/what is affected; When the incident started; ETA
  • 146. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC; Who/what is affected; When the incident started; ETA; Update regularly
  • 147. Prepare Communicate Explain: 1. Communicate: Use communication channel; MTTC; Who/what is affected; When the incident started; ETA; Update regularly. 2. Fix it!
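The checklist above (who/what is affected, when the incident started, ETA, regular updates) can be sketched as a small message template. `IncidentUpdate` and its fields are hypothetical names, and the UTC labels assume all timestamps are already in UTC.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IncidentUpdate:
    """One status update posted during an incident (illustrative only)."""
    affected: str                # who/what is affected
    started_at: datetime         # when the incident started (assumed UTC)
    eta: Optional[datetime]      # ETA to resolution, if known
    next_update_at: datetime     # commits to updating regularly

    def render(self) -> str:
        # Say so plainly when there is no ETA rather than leaving it out.
        eta = self.eta.strftime("%H:%M UTC") if self.eta else "not yet known"
        return "\n".join([
            f"Affected: {self.affected}",
            f"Started: {self.started_at.strftime('%Y-%m-%d %H:%M')} UTC",
            f"ETA to resolution: {eta}",
            f"Next update by: {self.next_update_at.strftime('%H:%M')} UTC",
        ])
```

Making `next_update_at` a required field is a deliberate choice in this sketch: every update promises the next one, which is what "update regularly" asks for.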
  • 148. Phew, close one!
  • 149. Prepare Communicate Explain: 1. Postmortem
  • 150. Prepare Communicate Explain: 1. Postmortem: Admit failure. Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/
  • 151. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human. Source: http://www.bureauofcommunication.com/compose/apology
  • 152. Prepare Communicate Explain: “We apologize for any inconvenience this may have caused”
  • 153. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time. Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
  • 154. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time; Who/what was impacted. Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/
  • 155. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong. Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html
  • 156. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned. Source: http://graysky.org/2010/02/downtime-postmortem/
  • 157. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned
  • 158. Prepare Communicate Explain: “I was completely overwhelmed by the amount of positive feedback and support I received.”
  • 159. Prepare Communicate Explain: 1. Postmortem: Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned. 2. Improve for the future
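As a rough illustration of the postmortem checklist above, a draft can be checked for missing sections before it is published. The section keys and the `missing_sections` helper are hypothetical; "sound like a human" is left out because it cannot be checked mechanically.

```python
from typing import Dict, List

# Section keys mirror the slide's checklist; the key names are assumptions.
REQUIRED_SECTIONS = [
    "admission",        # admit failure
    "start_time",       # start time and end time
    "end_time",
    "impact",           # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
    "improvements",     # improve for the future
]


def missing_sections(postmortem: Dict[str, str]) -> List[str]:
    """Return the checklist sections that are absent or blank in a draft."""
    return [
        key for key in REQUIRED_SECTIONS
        if not str(postmortem.get(key, "")).strip()
    ]
```

A draft that comes back with an empty list covers the structure only; whether it reads as a genuine apology is still a human judgment.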
  • 160. Prepare Communicate Explain “Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.” Source: http://news.ycombinator.com/item?id=1168493
  • 161. Prepare Communicate Explain Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
  • 162. Prepare Communicate Explain Be human
  • 163. Prepare Communicate Explain Be authentic
  • 164. Prepare Communicate Explain Be transparent
  • 165. Prepare Communicate Explain Accept responsibility
  • 166. Prepare Communicate Explain Learn and improve
  • 167. Prepare Communicate Explain Trust
  • 168. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time) 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly) 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned) 2. Learn and improve.
  • 169. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time) 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly) 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned) 2. Learn and improve. Be Prepared + Be Transparent + Be Human
  • 170. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time) 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly) 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned) 2. Learn and improve. Be Prepared + Be Transparent + Be Human = Trust
  • 171. Disclaimer: Don’t screw up too often
  • 173. Downtime Prisoner’s Dilemma. Columns: Transparent | Not Transparent. Rows: Caught | Not Caught.
  • 174. Downtime Prisoner’s Dilemma. Columns: Transparent | Not Transparent. Caught: (empty). Not Caught: Win.
  • 175. Downtime Prisoner’s Dilemma. Columns: Transparent | Not Transparent. Caught: Big Loss. Not Caught: Win.
  • 176. Downtime Prisoner’s Dilemma. Columns: Transparent | Not Transparent. Caught: Big Win | Big Loss. Not Caught: Win.
  • 177. Downtime Prisoner’s Dilemma. Columns: Transparent | Not Transparent. Caught: Big Win | Big Loss. Not Caught: Win | Win.
  • 179. Benefits: Gain trust; Reduce churn, increase loyalty; Reduce support costs; Ability to control the message; Competitive advantage; More time to focus on the actual problem; Reduce stress
  • 180. Change != Easy
  • 182. Keys to Adoption: Getting past a culture of “hide the problem”
  • 183. Keys to Adoption: Getting past a culture of “hide the problem”; Overriding commitment to want to improve
  • 184. Keys to Adoption: Getting past a culture of “hide the problem”; Overriding commitment to want to improve; Available resources to improve
  • 185. Keys to Adoption: Getting past a culture of “hide the problem”; Overriding commitment to want to improve; Available resources to improve; Pain
  • 186. Keys to Adoption: Getting past a culture of “hide the problem”; Overriding commitment to want to improve; Available resources to improve; Pain; Buy-in
  • 187. Product Management; Support; Engineering/Operations; Sales/Marketing
  • 188. Product Management (Default: Let’s wait for complaints); Support; Engineering/Operations; Sales/Marketing
  • 189. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support; Engineering/Operations; Sales/Marketing
  • 190. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work); Engineering/Operations; Sales/Marketing
  • 191. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations; Sales/Marketing
  • 192. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad); Sales/Marketing
  • 193. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing
  • 194. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know)
  • 195. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know. Reality: They’ll find out, better from us)
  • 198. Simple as that!
  • 199. Your site will still fail!
  • 200. “The measure of a society is how well it transforms pain and suffering into something worthwhile.” -- Friedrich Nietzsche
  • 201. “The measure of a company is how well it transforms pain of downtime into something worthwhile.” -- Lenny Rachitsky Source: Original quote inspired by Friedrich Nietzsche
  • 202. Bare minimum: Register a Twitter account
  • 203. Thank You. Slides: http://bit.ly/upside-of-downtime Lenny Rachitsky, @lennysan, http://www.transparentuptime.com/ Webmetrics/Neustar, @webmetrics, http://www.webmetrics.com/
  • 210. "Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities" -- Exploration and environmental impact plan Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion
  • 219. “Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.” -- Borrowed from William Shakespeare
  • 221. Making change 1. Find the bright spots - (this presentation has a bunch)
  • 222. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework)
  • 223. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.)
  • 224. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?)
  • 225. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small)
  • 226. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go)
  • 227. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process)
  • 228. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically)
  • 229. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) 9. Rally the herd - (get buy in, rest will follow)